MAPREDUCE-7432. Make manifest committer default on abfs and gcs stores (#5378)

By default, the MapReduce manifest committer is now used for jobs working with abfs and gcs.
Hadoop MapReduce picks this up automatically; Spark needs additional configuration: read the docs
for the required steps.
Steve Loughran 2023-06-27 13:55:20 +01:00 committed by GitHub
parent 56ef05a9ca
commit 0d057e27c3


@ -2235,23 +2235,23 @@
     </description>
   </property>
-  <!-- not yet enabled by default.
+  <!-- use manifest committer for abfs URLs -->
   <property>
     <name>mapreduce.outputcommitter.factory.scheme.abfs</name>
     <value>org.apache.hadoop.fs.azurebfs.commit.AzureManifestCommitterFactory</value>
     <description>
-      The default committer factory for ABFS is for the manifest committer with
-      abfs-specific tuning.
+      The default committer factory for ABFS is the manifest committer with
+      abfs-specific recovery.
     </description>
   </property>
+  <!-- use manifest committer for gs URLs -->
   <property>
     <name>mapreduce.outputcommitter.factory.scheme.gs</name>
     <value>org.apache.hadoop.mapreduce.lib.output.committer.manifest.ManifestCommitterFactory</value>
     <description>
-      The default committer factory for google cloud storage is for the manifest committer.
+      The default committer factory for google cloud storage is the manifest committer.
     </description>
   </property>
-  -->
 </configuration>
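
Because these are per-scheme defaults, a deployment that wants the previous behaviour for one store can presumably override the same property in its own site configuration. The fragment below is a minimal sketch of that, not part of this commit. It assumes the override lives in a site file such as mapred-site.xml and that the classic factory class org.apache.hadoop.mapreduce.lib.output.FileOutputCommitterFactory is the desired fallback; check both against the committer documentation before relying on them.

```xml
<!-- Hypothetical site-level override (e.g. in mapred-site.xml), not part of this commit. -->
<configuration>
  <!-- Assumption: revert abfs URLs to the classic file output committer factory. -->
  <property>
    <name>mapreduce.outputcommitter.factory.scheme.abfs</name>
    <value>org.apache.hadoop.mapreduce.lib.output.FileOutputCommitterFactory</value>
  </property>
</configuration>
```

The gs scheme could be overridden the same way through mapreduce.outputcommitter.factory.scheme.gs.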