Commit Graph

23778 Commits

Author SHA1 Message Date
Steve Loughran
56dee66770
HADOOP-16823. Large DeleteObject requests are their own Thundering Herd.
Contributed by Steve Loughran.

During S3A rename() and delete() calls, the list of objects to delete is
built up into batches of a thousand and then POSTed in a single large
DeleteObjects request.

But the IO capacity allowed on an S3 partition may be only 3500 writes
per second *and* each entry in that POST counts as a single write, so
one of those posts alone can trigger throttling on an already loaded
S3 directory tree. That can trigger backoff and retry with the same
thousand-entry post, and so recreate the exact same problem.

Fixes

* Page size for delete object requests is set in
  fs.s3a.bulk.delete.page.size; the default is 250.
* The property fs.s3a.experimental.aws.s3.throttling (default=true)
  can be set to false to disable throttle retry logic in the AWS
  client SDK, so that it is all handled in the S3A client. This
  gives more visibility into when operations are being throttled.
* Bulk delete throttling events are logged to the
  org.apache.hadoop.fs.s3a.throttled log at INFO; if this appears
  often then choose a smaller page size.
* The metric "store_io_throttled" adds the entire count of delete
  requests when a single DeleteObjects request is throttled.
* A new quantile, "store_io_throttle_rate" can track throttling
  load over time.
* DynamoDB metastore throttle resilience issues have also been
  identified and fixed. Note: the fs.s3a.experimental.aws.s3.throttling
  flag does not apply to DDB IO precisely because there may still be
  lurking issues there and it is safest to rely on the DynamoDB client
  SDK.
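
The paging described above can be sketched as follows. This is only an
illustration, not the S3A implementation: the class DeletePaging and its
paginate() method are hypothetical names, and the sketch shows nothing more
than how a list of keys splits into pages of at most
fs.s3a.bulk.delete.page.size entries, each of which would go out as its own
DeleteObjects request.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of the S3A code base. */
public class DeletePaging {

    /**
     * Split the keys into consecutive pages of at most pageSize entries.
     * The final page may be shorter than the others.
     */
    public static List<List<String>> paginate(List<String> keys, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        List<List<String>> pages = new ArrayList<>();
        for (int start = 0; start < keys.size(); start += pageSize) {
            int end = Math.min(start + pageSize, keys.size());
            // Copy the slice so each page is independent of the source list.
            pages.add(new ArrayList<>(keys.subList(start, end)));
        }
        return pages;
    }
}
```

With the default page size of 250, a thousand keys become four requests of
250 entries each, so a single throttled request and its retry account for
250 writes rather than a thousand.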

Change-Id: I00f85cdd94fc008864d060533f6bd4870263fd84
2020-02-13 19:09:49 +00:00
Szilard Nemeth
da99ac7e93 YARN-10137. UIv2 build is broken in trunk. Contributed by Adam Antal 2020-02-13 16:31:35 +01:00
Surendra Singh Lilhore
a98352ced1 HDFS-15086. Block scheduled counter never gets decremented if the block got deleted before replication. Contributed by hemanthboyina. 2020-02-13 16:57:41 +05:30
Szilard Nemeth
f1b1b332f5 YARN-10029. Add option to UIv2 to get container logs from the new JHS API. Contributed by Adam Antal 2020-02-13 12:08:54 +01:00
Prabhu Joseph
fe7d67a8a2 YARN-9521. Handle FileSystem close in ApiServiceClient
Contributed by kyungwan nam. Reviewed by Eric Yang.
2020-02-13 09:39:13 +05:30
Akira Ajisaka
0ddb5f0881
HDFS-13989. RBF: Add FSCK to the Router (#1832)
Co-authored-by: Inigo Goiri <inigoiri@apache.org>
2020-02-13 10:06:07 +09:00
Ayush Saxena
f09710bbb8 HDFS-15161. When evictableMmapped or evictable size is zero, do not throw NoSuchElementException in ShortCircuitCache#close(). Contributed by Lisheng Sun 2020-02-12 20:29:35 +05:30
Szilard Nemeth
8d6ff87c18 MAPREDUCE-7263. Remove obsolete validateTargetPath() from FrameworkUploader. Contributed by Marton Hudaky 2020-02-12 15:53:33 +01:00
Ayush Saxena
3df0adaaea HDFS-15127. RBF: Do not allow writes when a subcluster is unavailable for HASH_ALL mount points. Contributed by Inigo Goiri 2020-02-12 19:41:04 +05:30
Masatake Iwasaki
749e45dfdb
HADOOP-16856. cmake is missing in the CentOS 8 section of BUILDING.txt. (#1841) 2020-02-12 21:17:33 +09:00
Akira Ajisaka
9709afe67d
HADOOP-16849. start-build-env.sh behaves incorrectly when username is numeric only. Contributed by Jihyun Cho. 2020-02-12 14:06:23 +09:00
Kihwal Lee
9b8a78d97b HDFS-14758. Make lease hard limit configurable and reduce the default.
Contributed by hemanthboyina.
2020-02-11 12:40:00 -06:00
Prabhu Joseph
e637797211 YARN-10127. Remove setting App Ordering Policy to ParentQueue in FSQueueConverter
Contributed by Peter Bacsko.
2020-02-11 22:01:58 +05:30
Stephen O'Donnell
d7c136b9ed HDFS-15150. Introduce read write lock to Datanode. Contributed by Stephen O'Donnell.
Signed-off-by: Wei-Chiu Chuang <weichiu@apache.org>
2020-02-11 08:00:15 -08:00
Jan Hentschel
cc8ae59104
HADOOP-16851. Removed unused import in Configuration
Contributed by Jan Hentschel.
2020-02-11 11:51:45 +00:00
testfixer
d36cd37e60
HADOOP-16847. Test can fail if HashSet iterates in a different order.
Contributed by Testfixer
2020-02-11 11:22:07 +00:00
Masatake Iwasaki
d5467d299d HADOOP-16739. Fix native build failure of hadoop-pipes on CentOS 8. 2020-02-10 13:13:11 +09:00
Ayush Saxena
6191d4b4a0 HDFS-15158. The number of failed volumes mismatch with volumeFailures of Datanode metrics. Contributed by Yang Yun. 2020-02-09 23:32:22 +05:30
Sunil G
28f730b317 YARN-10109. Allow stop and convert from leaf to parent queue in a single Mutation API call. Contributed by Prabhu Joseph 2020-02-09 21:14:53 +05:30
Ayush Saxena
3f0a7cd17a YARN-9624. Use switch case for ProtoUtils#convertFromProtoFormat containerState. Contributed by Bilwa S T 2020-02-09 19:14:18 +05:30
Ayush Saxena
d23317b102 HDFS-15115. Namenode crash caused by NPE in BlockPlacementPolicyDefault when dynamically change logger to debug. Contributed by wangzhixiang 2020-02-08 10:33:57 +05:30
Ayush Saxena
23787e4bdd HDFS-15136. LOG flooding in secure mode when Cookies are not set in request header. Contributed by Renukaprasad C 2020-02-08 01:17:59 +05:30
dependabot[bot]
fafe78fea7
Bump checkstyle from 8.26 to 8.29 (#1828)
Bumps [checkstyle](https://github.com/checkstyle/checkstyle) from 8.26 to 8.29.
- [Release notes](https://github.com/checkstyle/checkstyle/releases)
- [Commits](https://github.com/checkstyle/checkstyle/compare/checkstyle-8.26...checkstyle-8.29)

Signed-off-by: dependabot[bot] <support@github.com>
2020-02-07 19:32:10 +09:00
Akira Ajisaka
3ebf505965
HADOOP-16834. Replace com.sun.istack.Nullable with javax.annotation.Nullable in DNS.java. Contributed by Xieming Li. 2020-02-07 19:30:06 +09:00
Vinayakumar B
7dac7e1d13
HADOOP-16596. [pb-upgrade] Use shaded protobuf classes from hadoop-thirdparty dependency (#1635). Contributed by Vinayakumar B. 2020-02-07 14:51:24 +05:30
bilaharith
5944d28130
HADOOP-16825: ITestAzureBlobFileSystemCheckAccess failing.
Contributed by Bilahari T H.
2020-02-06 18:48:00 +00:00
Sneha Vijayarajan
55f2421580
HADOOP-16845: Disable ITestAbfsClient.testContinuationTokenHavingEqualSign due to ADLS Gen2 service bug.
Contributed by Sneha Vijayarajan.
2020-02-06 18:41:06 +00:00
Mukund Thakur
146ca0f545
HADOOP-16832. S3Guard testing doc: Add required parameters for S3Guard testing in IDE. (#1822). Contributed by Mukund Thakur. 2020-02-06 15:13:25 +01:00
Szilard Nemeth
71b2c2ffe9 YARN-10101. Support listing of aggregated logs for containers belonging to an application attempt. Contributed by Adam Antal 2020-02-06 12:25:06 +01:00
Jonathan Hung
314e2f9d2e YARN-10116. Expose diagnostics in RMAppManager summary 2020-02-04 17:44:05 -08:00
Chen Liang
ce7b8b5634 HDFS-15148. dfs.namenode.send.qop.enabled should not apply to primary NN port. Contributed by Chen Liang. 2020-02-04 12:12:35 -08:00
Kihwal Lee
10a60fbe20 HDFS-12491. Support wildcard in CLASSPATH for libhdfs. Contributed by Muhammad Samir Khan. 2020-02-04 12:22:35 -06:00
Stephen O'Donnell
1e3a0b0d93 HDFS-7175. Client-side SocketTimeoutException during Fsck. Contributed by Stephen O'Donnell, Akira Ajisaka.
Signed-off-by: Wei-Chiu Chuang <weichiu@apache.org>
Co-authored-by: Akira Ajisaka <aajisaka@apache.org>
2020-01-31 16:13:02 -08:00
Giovanni Matteo Fumarola
bf8686f43f YARN-8982. [Router] Add locality policy. Contributed by Young Chen. 2020-01-30 16:59:36 -08:00
Szilard Nemeth
a7d72c523a YARN-10099. FS-CS converter: handle allow-undeclared-pools and user-as-default-queue properly and fix misc issues. Contributed by Peter Bacsko 2020-01-30 16:03:38 +01:00
Mustafa İman
5977360878
HADOOP-16801. S3Guard listFiles will not query S3 if all listings are authoritative (#1815). Contributed by Mustafa İman. 2020-01-30 11:16:51 +01:00
Akira Ajisaka
a5ef08b619
YARN-9743. [JDK11] TestTimelineWebServices.testContextFactory fails. (#1824) Contributed by Akira Ajisaka and Kinga Marton. 2020-01-30 14:10:31 +09:00
Kihwal Lee
799d4c1cf4 HDFS-15146. TestBalancerRPCDelay.testBalancerRPCDelay fails
intermittently. Contributed by Ahmed Hussein.
2020-01-29 11:00:27 -06:00
Eric E Payne
b897f6834b MAPREDUCE-7079: JobHistory#ServiceStop implementation is incorrect. Contributed by Ahmed Hussein (ahussein) 2020-01-29 16:54:45 +00:00
Szilard Nemeth
7f3e1e0c07 MAPREDUCE-7260. Cross origin request support for Job history server web UI. Contributed by Adam Antal 2020-01-29 14:42:52 +01:00
Prabhu Joseph
825db8fe2a YARN-10107. Fix GpuResourcePlugin#getNMResourceInfo to honor Auto Discovery Enabled
Contributed by Szilard Nemeth.
2020-01-29 13:30:00 +05:30
Eric Badger
e578e52aae YARN-10084. Allow inheritance of max app lifetime / default app lifetime. Contributed by Eric Payne. 2020-01-29 03:54:43 +00:00
Yufei Gu
1643cfdfbb YARN-10015. Correct the sample command in SLS README file. Contributed by Aihua Xu. 2020-01-28 17:47:49 -08:00
Chen Liang
483397c7f7 [SBN Read] Slow clients when Observer reads are enabled but there are no Observers on the cluster. Contributed by Chen Liang 2020-01-28 15:20:36 -08:00
Chen Liang
3e86807802 Revert "[SBN Read] Slow clients when Observer reads are enabled but there are no Observers on the cluster. Contributed by Chen Liang."
This reverts commit ff8ff0f7e5.
2020-01-28 15:19:47 -08:00
Chen Liang
ff8ff0f7e5 [SBN Read] Slow clients when Observer reads are enabled but there are no Observers on the cluster. Contributed by Chen Liang. 2020-01-28 15:14:58 -08:00
Inigo Goiri
1839c467f6 HDFS-13179. TestLazyPersistReplicaRecovery#testDnRestartWithSavedReplicas fails intermittently. Contributed by Ahmed Hussein. 2020-01-28 10:10:35 -08:00
Inigo Goiri
5abd0148eb YARN-9768. RM Renew Delegation token thread should timeout and retry. Contributed by Manikandan R. 2020-01-28 10:06:37 -08:00
Inigo Goiri
061421fc6d HDFS-15145. HttpFS: getAclStatus() returns permission as null. Contributed by hemanthboyina. 2020-01-28 10:04:38 -08:00
Prabhu Joseph
1ab9c692fa YARN-10022. RM Rest API to validate the CapacityScheduler Configuration change
Contributed by Kinga Marton.
2020-01-28 23:16:04 +05:30