hadoop/hadoop-tools
Steve Loughran 7bb09f1010
HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep" (#5689)
This 
1. changes the default value of fs.s3a.directory.marker.retention
   to "keep"
2. no longer prints a message when an S3A FS instance is
   instantiated with any option other than delete.

Switching to marker retention improves performance
on any S3 bucket as there are no needless marker DELETE requests,
leading to a reduction in write IOPS and in any delays waiting
for the DELETE call to finish.

There are *very* significant improvements on versioned buckets,
where tombstone markers slow down LIST operations: the more
tombstones there are, the worse query planning gets.

Having versioning enabled on production stores is the foundation
of any data protection strategy, so this has tangible benefits
in production.

It is *not* compatible with older Hadoop releases; specifically:
- Hadoop branch 2 < 2.10.2
- Any release of Hadoop 3.0.x and Hadoop 3.1.x
- Hadoop 3.2.0 and 3.2.1
- Hadoop 3.3.0

Incompatible releases have no problems reading data in stores
where markers are retained, but can get confused when deleting
or renaming directories.

If you are still using older versions to write data, and cannot
yet upgrade, switch the option back to "delete".
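
A minimal sketch of switching back, assuming the option is set
cluster-wide in core-site.xml (the choice of file is an assumption;
the property name and the "delete" value come from the text above):

```xml
<!-- Restore the previous behaviour: delete directory markers. -->
<!-- Assumed location: core-site.xml on every client that writes to the store. -->
<property>
  <name>fs.s3a.directory.marker.retention</name>
  <value>delete</value>
</property>
```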

Contributed by Steve Loughran
2023-06-08 12:12:29 +01:00
..
hadoop-aliyun HADOOP-18458: AliyunOSSBlockOutputStream to support heap/off-heap buffer before uploading data to OSS (#4912) 2023-03-28 14:27:01 +08:00
hadoop-archive-logs HADOOP-18206 Cleanup the commons-logging references and restrict its usage in future (#5315) 2023-02-14 03:24:06 +08:00
hadoop-archives HADOOP-18548. Hadoop Archive tool (HAR) should acquire delegation tokens from source and destination file systems (#5355) 2023-03-30 07:12:02 +08:00
hadoop-aws HADOOP-18752. Change fs.s3a.directory.marker.retention to "keep" (#5689) 2023-06-08 12:12:29 +01:00
hadoop-azure Revert "HADOOP-18207. Introduce hadoop-logging module (#5503)" 2023-06-05 09:34:40 +05:30
hadoop-azure-datalake HADOOP-18641. Cloud connector dependency and LICENSE fixup. (#5429) 2023-02-28 10:48:54 +00:00
hadoop-benchmark HADOOP-18507. VectorIO FileRange type to support a "reference" field (#5076) 2022-10-31 21:12:13 +00:00
hadoop-datajoin HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-distcp Revert "HADOOP-18207. Introduce hadoop-logging module (#5503)" 2023-06-05 09:34:40 +05:30
hadoop-dynamometer HADOOP-18359. Update commons-cli from 1.2 to 1.5. (#5095). Contributed by Shilun Fan. 2023-05-10 01:42:12 +05:30
hadoop-extras HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-federation-balance HDFS-16256. Minor fix in HDFS Fedbalance document (#4192) 2022-05-02 08:08:12 +08:00
hadoop-fs2img HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-gridmix HADOOP-18131. Upgrade maven enforcer plugin and relevant dependencies (#4000) 2022-03-08 17:27:04 +09:00
hadoop-kafka HADOOP-17753. Keep restrict-imports-enforcer-rule for Guava Lists in top level hadoop-main pom (#3087) 2021-06-11 12:15:52 +09:00
hadoop-openstack HADOOP-18442. Remove openstack support (#4855) 2022-10-06 11:49:38 +01:00
hadoop-pipes
hadoop-resourceestimator HADOOP-15983. Use jersey-json that is built to use jackson2 (#3988) 2022-04-28 14:18:19 +09:00
hadoop-rumen Revert "HADOOP-18207. Introduce hadoop-logging module (#5503)" 2023-06-05 09:34:40 +05:30
hadoop-sls YARN-10680. Revisit try blocks without catch blocks but having finally blocks. Contributed by Susheel Gupta 2022-10-15 21:51:08 +02:00
hadoop-streaming HADOOP-18359. Update commons-cli from 1.2 to 1.5. (#5095). Contributed by Shilun Fan. 2023-05-10 01:42:12 +05:30
hadoop-tools-dist HADOOP-18442. Remove openstack support (#4855) 2022-10-06 11:49:38 +01:00
pom.xml HADOOP-11867. Add a high-performance vectored read API. (#3904) 2022-06-22 17:29:32 +01:00