hadoop/hadoop-tools
Mukund Thakur 47be1ab3b6
HADOOP-18679. Add API for bulk/paged delete of files (#6726)
Applications can create a BulkDelete instance from a
BulkDeleteSource. The BulkDelete interface provides
pageSize(), the maximum number of entries which can be
deleted in a single call, and a bulkDelete(Collection paths)
method which accepts a collection of up to pageSize() paths.
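
A minimal sketch of how a caller might split a long list of paths into
pages before invoking the API. The nested BulkDelete interface here is a
hypothetical stand-in that only mirrors the two methods named above; it
uses String paths, and its return type (a list of paths that failed) is
an assumption of this sketch, not the Hadoop signature. RecordingBulkDelete
and deleteAll are likewise illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

class PagedDeleteSketch {

  /** Hypothetical stand-in mirroring the pageSize()/bulkDelete() pair. */
  interface BulkDelete {
    int pageSize();

    // Assumed return shape: the paths which could not be deleted.
    List<String> bulkDelete(Collection<String> paths);
  }

  /** Deletes all paths, never passing more than pageSize() per call. */
  static List<String> deleteAll(BulkDelete op, List<String> paths) {
    int page = op.pageSize();
    if (page < 1) {
      throw new IllegalArgumentException("page size must be >= 1");
    }
    List<String> failures = new ArrayList<>();
    for (int start = 0; start < paths.size(); start += page) {
      int end = Math.min(start + page, paths.size());
      failures.addAll(op.bulkDelete(paths.subList(start, end)));
    }
    return failures;
  }

  /** Test double: counts calls and rejects oversized pages. */
  static class RecordingBulkDelete implements BulkDelete {
    final int size;
    int calls = 0;

    RecordingBulkDelete(int size) {
      this.size = size;
    }

    @Override
    public int pageSize() {
      return size;
    }

    @Override
    public List<String> bulkDelete(Collection<String> paths) {
      if (paths.size() > size) {
        throw new IllegalArgumentException("page too large: " + paths.size());
      }
      calls++;
      return new ArrayList<>(); // this double never reports a failure
    }
  }

  public static void main(String[] args) {
    RecordingBulkDelete op = new RecordingBulkDelete(2);
    List<String> failures =
        deleteAll(op, Arrays.asList("a", "b", "c", "d", "e"));
    System.out.println("pages=" + op.calls + " failures=" + failures.size());
  }
}
```

With five paths and a page size of 2, deleteAll issues three calls; the
same loop works unchanged when the page size is 1.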

This is optimized for object stores with bulk delete APIs;
the S3A connector will offer the page size of
fs.s3a.bulk.delete.page.size unless bulk delete has
been disabled.

Even with a page size of 1, the S3A implementation is
more efficient than delete(path),
as it performs no safety checks for the path being a directory
and no probes for the need to recreate directories.

The interface BulkDeleteSource is implemented by
all FileSystem implementations, with a page size
of 1 and mapped to delete(pathToDelete, false).
This means that callers do not need special-case
handling for object stores versus classic filesystems.
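
The fallback described above can be pictured roughly as follows. This is
a sketch under stated assumptions, not the Hadoop code: the BulkDelete
interface is a hypothetical stand-in with String paths, the deleteOne
callback stands in for delete(pathToDelete, false), and treating a false
return as a failure to report is a simplification made for this example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Predicate;

class SinglePathBulkDeleteSketch {

  /** Hypothetical stand-in mirroring the pageSize()/bulkDelete() pair. */
  interface BulkDelete {
    int pageSize();

    // Assumed return shape: the paths which could not be deleted.
    List<String> bulkDelete(Collection<String> paths);
  }

  /**
   * Builds a page-size-1 implementation on top of a single-path delete.
   * deleteOne is a hypothetical callback standing in for a
   * non-recursive delete of one path.
   */
  static BulkDelete fallback(Predicate<String> deleteOne) {
    return new BulkDelete() {
      @Override
      public int pageSize() {
        return 1;
      }

      @Override
      public List<String> bulkDelete(Collection<String> paths) {
        if (paths.size() > pageSize()) {
          throw new IllegalArgumentException(
              "at most " + pageSize() + " path per call");
        }
        List<String> failed = new ArrayList<>();
        for (String path : paths) {
          if (!deleteOne.test(path)) {
            failed.add(path); // simplification: false counts as a failure
          }
        }
        return failed;
      }
    };
  }
}
```

A caller that always respects pageSize() can use this object and a
store-specific bulk implementation interchangeably, which is the point
made in the paragraph above.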

To aid use through reflection APIs, the class
org.apache.hadoop.io.wrappedio.WrappedIO
has been created with "reflection friendly" methods.

Contributed by Mukund Thakur and Steve Loughran
2024-05-20 17:05:25 +01:00
hadoop-aliyun Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-archive-logs Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-archives HADOOP-19041. Use StandardCharsets in more places (#6449) 2024-03-28 23:17:18 -04:00
hadoop-aws HADOOP-18679. Add API for bulk/paged delete of files (#6726) 2024-05-20 17:05:25 +01:00
hadoop-azure HADOOP-18679. Add API for bulk/paged delete of files (#6726) 2024-05-20 17:05:25 +01:00
hadoop-azure-datalake Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-benchmark Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-compat-bench HADOOP-19085. Compatibility Benchmark over HCFS Implementations 2024-03-17 16:48:29 +08:00
hadoop-datajoin Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-distcp HDFS-17216. Distcp: When handle the small files, the bandwidth parameter will be invalid, fix this bug. (#6138) 2024-03-28 10:31:06 -04:00
hadoop-dynamometer Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-extras HADOOP-19024. Use bouncycastle jdk18 1.77 (#6410). Contributed 2024-03-30 19:58:12 +05:30
hadoop-federation-balance Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-fs2img HADOOP-19041. Use StandardCharsets in more places (#6449) 2024-03-28 23:17:18 -04:00
hadoop-gridmix HADOOP-19024. Use bouncycastle jdk18 1.77 (#6410). Contributed 2024-03-30 19:58:12 +05:30
hadoop-kafka Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-openstack Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-pipes Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-resourceestimator Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-rumen Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-sls Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
hadoop-streaming HADOOP-19024. Use bouncycastle jdk18 1.77 (#6410). Contributed 2024-03-30 19:58:12 +05:30
hadoop-tools-dist Preparing for 3.5.0 development (#6411) 2024-01-19 15:05:22 +08:00
pom.xml HADOOP-19085. Compatibility Benchmark over HCFS Implementations 2024-03-17 16:48:29 +08:00