Commit Graph

5423 Commits

Author SHA1 Message Date
Steve Loughran
1eeb9d9d67
HADOOP-17318. Support concurrent S3A commit jobs with same app attempt ID. (#2399)
See also [SPARK-33402]: Jobs launched in same second have duplicate MapReduce JobIDs

Contributed by Steve Loughran.

Change-Id: Iae65333cddc84692997aae5d902ad8765b45772a
2020-11-26 17:22:56 +00:00
Akira Ajisaka
db04195afd
HADOOP-17394. [JDK 11] Fix error in mvn package -Pdocs (#2488)
Reviewed-by: Takanobu Asanuma <tasanuma@apache.org>
(cherry picked from commit 2ce2198287)
2020-11-26 11:34:39 +09:00
Steve Loughran
1ef34d0819
HADOOP-17313. FileSystem.get to support slow-to-instantiate FS clients. (#2396)
This adds a semaphore to throttle the number of FileSystem instances which
can be created simultaneously, set in "fs.creation.parallel.count".

This is designed to reduce the impact of many threads in an application calling
FileSystem.get() on a filesystem which takes time to instantiate -for example
to an object where HTTPS connections are set up during initialization.
Many threads trying to do this may create spurious delays by conflicting
for access to synchronized blocks, when simply limiting the parallelism
diminishes the conflict, so speeds up all threads trying to access
the store.

The default value, 64, is larger than is likely to deliver any speedup -but
it does mean that there should be no adverse effects from the change.

If a service appears to be blocking on all threads initializing connections to
abfs, s3a or store, try a smaller (possibly significantly smaller) value.

Contributed by Steve Loughran.

Change-Id: I57161b026f28349e339dc8b9d74f6567a62ce196
2020-11-25 14:55:29 +00:00
Eric Payne
8459f1d955 HADOOP-17346. Fair call queue is defeated by abusive service principals. Contributed by Ahmed Hussein (ahussein). 2020-11-23 20:37:33 +00:00
Jim Brennan
e24a6b550e HADOOP-17367. Add InetAddress api to ProxyUsers.authorize (#2449). Contributed by Daryn Sharp and Ahmed Hussein 2020-11-19 21:26:47 +00:00
Steve Loughran
4687c25389 HADOOP-17244. S3A directory delete tombstones dir markers prematurely. (#2310)
This fixes the S3Guard/Directory Marker Retention integration so that when
fs.s3a.directory.marker.retention=keep, failures during multipart delete
are handled correctly, as are incremental deletes during
directory tree operations.

In both cases, when a directory marker with children is deleted from
S3, the directory entry in S3Guard is not deleted, because it is still
critical to representing the structure of the store.

Contributed by Steve Loughran.

Change-Id: I4ca133a23ea582cd42ec35dbf2dc85b286297d2f
2020-11-18 12:30:43 +00:00
Ahmed Hussein
df4edb99f7 HADOOP-17360. Log the remote address for authentication success (#2441)
Co-authored-by: ahussein <ahmed.hussein@verizonmedia.com>
(cherry picked from commit 1ea3f74246)
2020-11-16 21:48:37 +00:00
Ahmed Hussein
75ca0c0f23 HADOOP-17362. reduce RPC calls doing ls on HAR file (#2444). Contributed by Daryn Sharp and Ahmed Hussein
(cherry picked from commit ebe1d1fbf7)
2020-11-13 21:14:47 +00:00
Ahmed Hussein
23fe3bdab3 HADOOP-17358. Improve excessive reloading of Configurations (#2436)
Co-authored-by: ahussein <ahmed.hussein@verizonmedia.com>
(cherry picked from commit 71071e5c0f)
2020-11-12 10:35:28 -08:00
Doroszlai, Attila
47131cdf7c
HADOOP-17365. Contract test for renaming over existing file is too lenient (#2447)
Contributed by Attila Doroszlai.

Change-Id: I21c29256b52449b7fea335704b3afa02e39c6a39
2020-11-11 21:21:11 +00:00
Stephen Jung
0712505b59 HADOOP-17096. Fix ZStandardCompressor input buffer offset (#2104). Contributed by Stephen Jung (Stripe).
(cherry picked from commit 45434c93e8)
2020-11-10 11:41:21 -08:00
Steve Loughran
7cb5325dda HADOOP-17340. TestLdapGroupsMapping failing -string mismatch in exception validation. (#2427). Contributed by Steve Loughran. 2020-11-07 17:05:23 +05:30
hchaverr
043cca01b1 HDFS-15623. Respect configured values of rpc.engine (#2403) Contributed by Hector Chaverri.
(cherry picked from commit 6eacaffeea)
2020-11-06 14:31:31 -08:00
Eric Badger
c6fee0a2c8 HADOOP-17342. Creating a token identifier should not do kerberos name
resolution. Contributed by Jim Brennan.

(cherry picked from commit af389d9897)
2020-11-05 21:56:46 +00:00
Jim Brennan
41d58d190d Revert "HADOOP-17306. RawLocalFileSystem's lastModifiedTime() looses milli seconds in JDK < 10.b09 (#2387)"
This reverts commit e21b81276e.
2020-11-05 17:31:39 +00:00
Wei-Chiu Chuang
cfa0986d00 Revert "HADOOP-17255. JavaKeyStoreProvider fails to create a new key if the keystore is HDFS. (#2291)"
This reverts commit dd1634ec3b.
2020-11-04 16:18:23 -08:00
Akira Ajisaka
dd1634ec3b HADOOP-17255. JavaKeyStoreProvider fails to create a new key if the keystore is HDFS. (#2291)
Reviewed-by: Steve Loughran <stevel@cloudera.com>
Reviewed-by: Wei-Chiu Chuang <weichiu@apache.org>
(cherry picked from commit 7f5caca04c)
2020-11-03 11:22:48 -08:00
Sunil G
91a3d298b9 HADOOP-17329. mvn site commands fails due to MetricsSystemImpl changes. Contributed by Xiaoqiao He.
(cherry picked from commit f17e067d52)
2020-10-29 07:20:46 +05:30
Ayush Saxena
af5f90623c HADOOP-17328. LazyPersist Overwrite fails in direct write mode. (#2413)
(cherry picked from commit 872440610f)
2020-10-27 01:40:25 +09:00
Vinayakumar B
e21b81276e
HADOOP-17306. RawLocalFileSystem's lastModifiedTime() looses milli seconds in JDK < 10.b09 (#2387) 2020-10-23 11:34:14 +05:30
Takanobu Asanuma
0bb1f0df27 HDFS-15639. [JDK 11] Fix Javadoc errors in hadoop-hdfs-client. (#2394)
(cherry picked from commit 30f06e0c74)
2020-10-20 19:12:26 +09:00
Ayush Saxena
54c40cbf49
HADOOP-16878. FileUtil.copy() to throw IOException if the source and destination are the same (#2383)
Contributed by Gabor Bota.
2020-10-17 01:34:01 +05:30
Konstantin V Shvachko
b6423d2780 HDFS-15567. [SBN Read] HDFS should expose msync() API to allow downstream applications call it explicitly. Contributed by Konstantin V Shvachko.
(cherry picked from commit b3786d6c3c)
2020-10-12 17:38:42 -07:00
Jinglun
44ff4c1058
HADOOP-17021. Add concat fs command (#1993)
Contributed by Jinglun

Change-Id: Ia10ad2205ed0f3594c391ee78f7df4c3c31c796d
2020-10-08 10:36:40 +01:00
Mukund Thakur
475dba1ddf
HADOOP-17281 Implement FileSystem.listStatusIterator() in S3AFileSystem (#2354)
Contains HADOOP-17300: FileSystem.DirListingIterator.next() call should
return NoSuchElementException

Contributed by Mukund Thakur

Change-Id: I4e7e5c6e295525db9e2de6f416f32bbb81e146d3
2020-10-07 14:00:23 +01:00
Liang-Chi Hsieh
8f60a90688 HADOOP-17125. Use snappy-java in SnappyCodec (#2297)
This switches the SnappyCodec to use the java-snappy codec, rather than the native one.

To use the codec, snappy-java.jar (from org.xerial.snappy) needs to be on the classpath.

This comesin as an avro dependency,  so it is already on the hadoop-common classpath,
as well as in hadoop-common/lib.
The version used is now managed in the hadoop-project POM; initially 1.1.7.7

Contributed by DB Tsai and Liang-Chi Hsieh

Change-Id: Id52a404a0005480e68917cd17f0a27b7744aea4e
2020-10-06 17:15:17 +01:00
Karen Coppage
43c9959b3a
HADOOP-17267. Add debug-level logs in Filesystem.close() (#2321)
When a filesystem is closed, the FileSystem log will, at debug level,
log the method calling close/closeAll.

At trace level: the full calling stack.

Contributed by Karen Coppage.

Change-Id: I1444f065c171fd31d42b497c92ba4517969f67f0
2020-09-29 16:09:14 +01:00
Hui Fei
ed19f63998
HADOOP-17277. Correct spelling errors for separator (#2322)
Contributed by Hui Fei.

(cherry picked from commit 474fa80bfb)
2020-09-23 15:39:51 +09:00
crossfire
c3cb86ba42
HADOOP-17088. Failed to load XInclude files with relative path. (#2097)
Contributed by Yushi Hayasaka.

Change-Id: I8aad5143c34fb831bef0077f7b659643f8ae073a
2020-09-21 19:13:20 +01:00
David Tucker
75bc54a66d
HADOOP-15136. Correct typos in filesystem.md (#2314)
Contributed by David Tucker

Change-Id: I130e15d4f625a5b1b30967e6cfc1684079dd1f98
2020-09-18 18:31:15 +01:00
Uma Maheswara Rao G
2d9c5395ef HDFS-15578: Fix the rename issues with fallback fs enabled (#2305). Contributed by Uma Maheswara Rao G.
Co-authored-by: Uma Maheswara Rao G <umagangumalla@cloudera.com>
(cherry picked from commit e4cb0d3514)
2020-09-16 23:01:03 -07:00
Uma Maheswara Rao G
1195dac55e HDFS-15529: getChildFilesystems should include fallback fs as well (#2234). Contributed by Uma Maheswara Rao G.
(cherry picked from commit b3660d0147)
2020-09-12 20:48:59 -07:00
Uma Maheswara Rao G
bfa145dd7c HDFS-15532: listFiles on root/InternalDir will fail if fallback root has file. (#2298). Contributed by Uma Maheswara Rao G.
(cherry picked from commit d2779de3f5)
2020-09-12 20:43:34 -07:00
zz
2d5ca83078 HADOOP-15891. provide Regex Based Mount Point In Inode Tree (#2185). Contributed by Zhenzhao Wang.
Co-authored-by: Zhenzhao Wang <zhenzhaowang@gmail.com>
(cherry picked from commit 12a316cdf9)
2020-09-12 20:42:06 -07:00
Steve Loughran
aa80bcb1ec
Revert "HADOOP-17244. S3A directory delete tombstones dir markers prematurely. (#2280)"
This reverts commit 0c82eb0324.

Change-Id: I6bd100d9de19660b0f28ee0ab16faf747d6d9f05
2020-09-11 18:07:05 +01:00
Steve Loughran
0c82eb0324
HADOOP-17244. S3A directory delete tombstones dir markers prematurely. (#2280)
This changes directory tree deletion so that only files are incrementally deleted
from S3Guard after the objects are deleted; the directories are left alone
until metadataStore.deleteSubtree(path) is invoke.

This avoids directory tombstones being added above files/child directories,
which stop the treewalk and delete phase from working.

Also:

* Callback to delete objects splits files and dirs so that
any problems deleting the dirs doesn't trigger s3guard updates
* New statistic to measure #of objects deleted, alongside request count.
* Callback listFilesAndEmptyDirectories renamed listFilesAndDirectoryMarkers
  to clarify behavior.
* Test enhancements to replicate the failure and verify the fix

Contributed by Steve Loughran

Change-Id: I0e6ea2c35e487267033b1664228c8837279a35c7
2020-09-10 17:29:33 +01:00
Steve Loughran
262c575fab
HADOOP-17181. Handle transient stream read failures in FileSystem contract tests (#2286)
Contributed by Steve Loughran.

* Fixes AbstractContractSeekTest test to use readFully
* Doesn't do this to AbstractContractUnbufferTest test as it changes the test too much.
Instead just notes in the error that this may be transient

The issue is that read(buffer) doesn't guarantee that the buffer is filled, only that it will
read up to a point, and that may be just be the amount of data left in the TCP packet.
readFully corrects for this, but using it in the unbuffer test runs the risk that what
is tested for in terms of unbuffering doesn't actually get validated.

Change-Id: I046eadb69b80ba0aac468b354c82c4d510dc3699
2020-09-09 12:01:47 +01:00
Steve Loughran
1b9109d237
HDFS-15471. TestHDFSContractMultipartUploader failing (#2252)
Contributed by Steve Loughran
(Was: broken by Steve Loughran)

Change-Id: If6a82706f3ea6d802bc6da03c2a2ca734e30388f
2020-08-28 15:47:06 +01:00
sguggilam
fcb80c1ade
HADOOP-17159. Make UGI support forceful relogin from keytab ignoring the last login time (#2249)
Contributed by Sandeep Guggilam.

Signed-off-by: Mingliang Liu <liuml07@apache.org>
Signed-off-by: Steve Loughran <stevel@apache.org>
2020-08-26 23:49:31 -07:00
Mingliang Liu
ee7d214118
Revert "HADOOP-17159 Ability for forceful relogin in UserGroupInformation class (#2197)"
This reverts commit da129a67bb.
2020-08-26 11:22:46 -07:00
Uma Maheswara Rao G
ba0eca6a2c HDFS-15533: Provide DFS API compatible class, but use ViewFileSystemOverloadScheme inside. (#2229). Contributed by Uma Maheswara Rao G.
(cherry picked from commit dd013f2fdf)
2020-08-25 12:00:52 -07:00
sguggilam
da129a67bb
HADOOP-17159 Ability for forceful relogin in UserGroupInformation class (#2197)
Contributed by Sandeep Guggilam.

Signed-off-by: Mingliang Liu <liuml07@apache.org>
Signed-off-by: Steve Loughran <stevel@apache.org>
2020-08-24 23:40:56 -07:00
Joey
ce51048e8c HADOOP-16925. MetricsConfig incorrectly loads the configuration whose value is String list in the properties file (#1896)
Contributed by Jiayi Liu
2020-08-24 14:03:36 +01:00
S O'Donnell
033a8bdc4e HADOOP-17209. Erasure Coding: Native library memory leak. Contriubted by Sean Chow
(cherry picked from commit 17cd8a1b16)
2020-08-24 12:05:37 +01:00
Steve Loughran
49f8ae965e
HADOOP-13230. S3A to optionally retain directory markers.
This adds an option to disable "empty directory" marker deletion,
so avoid throttling and other scale problems.

This feature is *not* backwards compatible.
Consult the documentation and use with care.

Contributed by Steve Loughran.

Change-Id: I69a61e7584dc36e485d5e39ff25b1e3e559a1958
2020-08-15 20:19:49 +01:00
Uma Maheswara Rao G
99b120a06e HDFS-15515: mkdirs on fallback should throw IOE out instead of suppressing and returning false (#2205)
* HDFS-15515: mkdirs on fallback should throw IOE out instead of suppressing and returning false

* Used LambdaTestUtils#intercept in test
2020-08-13 14:32:19 +01:00
Akira Ajisaka
1354400e7c
HADOOP-17204. Fix typo in Hadoop KMS document. Contributed by Xieming Li.
(cherry picked from commit 141c62584b)
2020-08-12 16:09:15 +09:00
Gautham B A
b4a105a209
HADOOP-17196. Fix C/C++ standard warnings (#2208)
* Passing C/C++ standard flags -std is
  not cross-compiler friendly as not all
  compilers support all values.
* Thus, we need to make use of the
  appropriate flags provided by CMake in
  order to specify the C/C++ standards.

Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 909f1e82d3)
2020-08-11 16:35:41 +09:00
sguggilam
97dd1cb57e
HADOOP-17164. UGI loginUserFromKeytab doesn't set the last login time (#2178)
Contributed by Sandeep Guggilam.

Signed-off-by: Mingliang Liu <liuml07@apache.org>
Signed-off-by: Steve Loughran <stevel@apache.org>
2020-08-04 10:31:26 -07:00
Uma Maheswara Rao G
4fe491d10e HDFS-15478: When Empty mount points, we are assigning fallback link to self. But it should not use full URI for target fs. (#2160). Contributed by Uma Maheswara Rao G.
(cherry picked from commit ac9a07b51a)
2020-07-31 01:31:37 -07:00