7bb09f1010
This change

1. sets the default value of fs.s3a.directory.marker.retention to "keep"
2. no longer prints a message when an S3A FS instance is instantiated with any option other than "delete".

Switching to marker retention improves performance on any S3 bucket, because there are no needless marker DELETE requests. This reduces write IOPS and removes any delays waiting for the DELETE calls to finish.

There are *very* significant improvements on versioned buckets, where tombstone markers slow down LIST operations: the more tombstones there are, the worse query planning gets.

Having versioning enabled on production stores is the foundation of any data protection strategy, so this has tangible benefits in production.

It is *not* compatible with older Hadoop releases; specifically

- Hadoop branch 2 < 2.10.2
- Any release of Hadoop 3.0.x and Hadoop 3.1.x
- Hadoop 3.2.0 and 3.2.1
- Hadoop 3.3.0

Incompatible releases have no problems reading data in stores where markers are retained, but can get confused when deleting or renaming directories.

If you are still using older versions to write data, and cannot yet upgrade, switch the option back to "delete".

Contributed by Steve Loughran
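The compatibility list above decides which value a deployment with several writers should use: "keep" is only safe if every release that writes to the bucket is marker-aware. The sketch below encodes that list as a version check. `MarkerRetentionCheck`, `isMarkerAware` and `retentionPolicyFor` are hypothetical names, not part of Hadoop. Two assumptions are not stated in the commit message and are marked in the code: releases newer than those listed (3.4.x and later) are treated as marker-aware, and anything older than branch 2 is treated as unsafe.

```java
import java.util.List;

/**
 * Hypothetical helper, not part of Hadoop: decides whether a Hadoop release
 * can safely write to a bucket where S3A directory markers are retained,
 * based on the list of incompatible releases in the commit message.
 */
final class MarkerRetentionCheck {

    private MarkerRetentionCheck() {
    }

    /** Parses the leading digits of a version component, e.g. "0-SNAPSHOT" -> 0. */
    private static int leadingInt(String part) {
        return Integer.parseInt(part.replaceFirst("\\D.*$", ""));
    }

    /** Returns true if the "major.minor.patch" release is marker-aware. */
    static boolean isMarkerAware(String version) {
        String[] parts = version.split("\\.");
        if (parts.length < 3) {
            throw new IllegalArgumentException("expected major.minor.patch: " + version);
        }
        int major = leadingInt(parts[0]);
        int minor = leadingInt(parts[1]);
        int patch = leadingInt(parts[2]);

        if (major < 2) {
            return false; // assumption: pre-branch-2 releases are treated as unsafe
        }
        if (major == 2) {
            // Branch 2 releases below 2.10.2 are incompatible.
            return minor > 10 || (minor == 10 && patch >= 2);
        }
        if (major == 3) {
            switch (minor) {
                case 0:
                case 1:
                    return false;          // every 3.0.x and 3.1.x release
                case 2:
                    return patch >= 2;     // 3.2.0 and 3.2.1 are incompatible
                case 3:
                    return patch >= 1;     // 3.3.0 is incompatible
                default:
                    return true;           // assumption: 3.4.x and later are marker-aware
            }
        }
        return true; // assumption: later major releases are marker-aware
    }

    /** Returns "keep" only if every writer is marker-aware, otherwise "delete". */
    static String retentionPolicyFor(List<String> writerVersions) {
        for (String version : writerVersions) {
            if (!isMarkerAware(version)) {
                return "delete";
            }
        }
        return "keep";
    }
}
```

For example, a bucket written by both 3.3.1 and 3.0.0 clients gets "delete" from `retentionPolicyFor`, while one written only by 3.3.1 and 3.2.2 clients gets "keep". The returned string is the value to set for fs.s3a.directory.marker.retention.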