hadoop/hadoop-tools
Steve Loughran e199da3fae
HADOOP-17833. Improve Magic Committer performance (#3289)
Speed up the magic committer. The key changes are:

* Writes under __magic always retain directory markers

* File creation under __magic skips all overwrite checks,
  including the LIST call intended to stop files being
  created over dirs.

* mkdirs under __magic probes the path for existence
  but does not look any further.

Extra parallelism in task and job commit directory scanning.
Use of createFile and openFile with parameters which allow
HEAD checks to be skipped.

The committer can write the summary _SUCCESS file to the
directory set in `fs.s3a.committer.summary.report.directory`,
using the job id as the filename. This directory can be in a
different file system/bucket if desired.

Also: HADOOP-15460. S3A FS to add `fs.s3a.create.performance`

Application code can set the createFile() option
`fs.s3a.create.performance` to true to disable the same
safety checks that are skipped when writing under magic
directories. Use with care.

The createFile option prefix `fs.s3a.create.header.`
can be used to add custom headers to S3 objects when
created.
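
As a rough illustration of the two createFile() options above, the
sketch below assembles the option keys an application might pass to
the builder. It uses only the JDK so that it runs without Hadoop on
the classpath. The class `CreateFileOptions`, its `build` method and
the header name `x-example` are hypothetical and not part of Hadoop;
only the key `fs.s3a.create.performance` and the prefix
`fs.s3a.create.header.` come from this commit message.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of Hadoop. */
final class CreateFileOptions {
  // Option key and prefix as named in the commit message.
  static final String PERFORMANCE = "fs.s3a.create.performance";
  static final String HEADER_PREFIX = "fs.s3a.create.header.";

  private CreateFileOptions() {
  }

  /**
   * Build the createFile() options as an ordered key/value map.
   * @param performance true to request that the safety checks are skipped
   * @param headers custom header names and values (names are assumptions)
   * @return option keys mapped to string values
   */
  static Map<String, String> build(boolean performance,
      Map<String, String> headers) {
    Map<String, String> options = new LinkedHashMap<>();
    options.put(PERFORMANCE, Boolean.toString(performance));
    for (Map.Entry<String, String> header : headers.entrySet()) {
      // Each custom header becomes an option under the header prefix.
      options.put(HEADER_PREFIX + header.getKey(), header.getValue());
    }
    return options;
  }
}
```

With Hadoop available, each entry could then be applied to the
builder returned by `FileSystem.createFile(path)` through
`opt(key, value)` before calling `build()`. How S3A maps the header
option names onto the stored object is not described here, so check
the S3A documentation before relying on it.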


Contributed by Steve Loughran.
2022-06-17 19:11:35 +01:00
..
hadoop-aliyun HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-archive-logs HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-archives HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-aws HADOOP-17833. Improve Magic Committer performance (#3289) 2022-06-17 19:11:35 +01:00
hadoop-azure HADOOP-16202. Enhanced openFile(): hadoop-azure changes. (#2584/4) 2022-04-24 17:33:05 +01:00
hadoop-azure-datalake HDFS-16453. Upgrade okhttp from 2.7.5 to 4.9.3 (#4229) 2022-05-21 02:53:14 +09:00
hadoop-datajoin HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-distcp HADOOP-18269. Misleading method name in DistCpOptions.(#4216) 2022-05-30 14:02:47 +01:00
hadoop-dynamometer HDFS-16522. Set Http and Ipc ports for Datanodes in MiniDFSCluster (#4108) 2022-04-06 18:17:02 +09:00
hadoop-extras HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-federation-balance HDFS-16256. Minor fix in HDFS Fedbalance document (#4192) 2022-05-02 08:08:12 +08:00
hadoop-fs2img HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-gridmix HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-kafka HADOOP-17753. Keep restrict-imports-enforcer-rule for Guava Lists in top level hadoop-main pom (#3087) 2021-06-11 12:15:52 +09:00
hadoop-openstack HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-pipes Preparing for 3.4.0 development 2020-03-29 23:24:25 +05:30
hadoop-resourceestimator HADOOP-15983. Use jersey-json that is built to use jackson2 (#3988) 2022-04-28 14:18:19 +09:00
hadoop-rumen HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-sls YARN-11102. Fix spotbugs error in hadoop-sls module. Contributed by Szilard Nemeth, Andras Gyori. 2022-04-01 18:24:37 +02:00
hadoop-streaming HADOOP-16202. Enhanced openFile(): mapreduce and YARN changes. (#2584/2) 2022-04-24 17:33:05 +01:00
hadoop-tools-dist HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
pom.xml HDFS-15346. FedBalance tool implementation. Contributed by Jinglun. 2020-06-18 13:33:25 +08:00