This
1. changes the default value of fs.s3a.directory.marker.retention
to "keep"
2. no longer prints a message when an S3A FS instances is
instantiated with any option other than delete.
Switching to marker retention improves performance
on any S3 bucket as there are no needless marker DELETE requests
-leading to a reduction in write IOPS and and any delays waiting
for the DELETE call to finish.
There are *very* significant improvements on versioned buckets,
where tombstone markers slow down LIST operations: the more
tombstones there are, the worse query planning gets.
Having versioning enabled on production stores is the foundation
of any data protection strategy, so this has tangible benefits
in production.
It is *not* compatible with older hadoop releases; specifically
- Hadoop branch 2 < 2.10.2
- Any release of Hadoop 3.0.x and Hadoop 3.1.x
- Hadoop 3.2.0 and 3.2.1
- Hadoop 3.3.0
Incompatible releases have no problems reading data in stores
where markers are retained, but can get confused when deleting
or renaming directories.
If you are still using older versions to write to data, and cannot
yet upgrade, switch the option back to "delete"
Contributed by Steve Loughran