6574f27fa3
Contributed by Steve Loughran.

This addresses two scale issues which have surfaced in large-scale benchmarks of the S3A Committers.

* Thread pools were not cleaned up. They now are, with tests.
* OOM on job commit for jobs with many thousands of tasks, each generating tens of (very large) files.

Instead of loading all pending commits into memory as a single list, the list of files to load is the sole list which is passed around; .pendingset files are loaded and processed in isolation, and reloaded if necessary for any abort/rollback operation.

The parallel commit/abort/revert operations now work at the .pendingset level, rather than that of individual pending commit files. The existing parallelized Tasks API is still used to commit those files, but with a null thread pool, so as to serialize the operations.

Change-Id: I5c8240cd31800eaa83d112358770ca0eb2bca797
||
---|---|---|
hadoop-aliyun | ||
hadoop-archive-logs | ||
hadoop-archives | ||
hadoop-aws | ||
hadoop-azure | ||
hadoop-azure-datalake | ||
hadoop-datajoin | ||
hadoop-distcp | ||
hadoop-dynamometer | ||
hadoop-extras | ||
hadoop-fs2img | ||
hadoop-gridmix | ||
hadoop-kafka | ||
hadoop-openstack | ||
hadoop-pipes | ||
hadoop-resourceestimator | ||
hadoop-rumen | ||
hadoop-sls | ||
hadoop-streaming | ||
hadoop-tools-dist | ||
pom.xml |
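
The commit message at the top of this listing describes passing around only the list of .pendingset files, loading each file in isolation, parallelizing across files, and committing the entries inside one file serially. The following is a minimal Java sketch of that idea, not the S3A committer code: the class `PendingSetCommitSketch`, the method `commitAll`, and the `loader` and `committer` callbacks are hypothetical names introduced for illustration, and the real implementation uses Hadoop's own Tasks API rather than a plain `ExecutorService`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;
import java.util.function.Function;

/** Hypothetical sketch; none of these names come from the Hadoop source. */
class PendingSetCommitSketch {

  /**
   * Commit every entry of every pending set named in pendingSetFiles.
   * Only the list of file names lives for the whole job; each file's
   * entries are loaded inside one worker, committed serially, then dropped.
   *
   * @return the number of individual commits performed
   */
  static int commitAll(List<String> pendingSetFiles,
                       Function<String, List<String>> loader,
                       Consumer<String> committer,
                       int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    AtomicInteger committed = new AtomicInteger();
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (String file : pendingSetFiles) {
        // Parallelism is per pending-set file, not per individual commit.
        futures.add(pool.submit(() -> {
          List<String> commits = loader.apply(file); // loaded in isolation
          for (String commit : commits) {            // serialized within a file
            committer.accept(commit);
            committed.incrementAndGet();
          }
        }));
      }
      for (Future<?> f : futures) {
        f.get(); // wait for completion and surface any worker failure
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted during commit", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("commit failed", e.getCause());
    } finally {
      pool.shutdown(); // always release the thread pool
    }
    return committed.get();
  }

  public static void main(String[] args) {
    // Stand-in for .pendingset files on storage: file name -> pending commits.
    Map<String, List<String>> store = new HashMap<>();
    store.put("task1.pendingset", Arrays.asList("part-0000", "part-0001"));
    store.put("task2.pendingset", Arrays.asList("part-0002"));
    Set<String> done = ConcurrentHashMap.newKeySet();
    int n = commitAll(new ArrayList<>(store.keySet()), store::get, done::add, 2);
    System.out.println("committed " + n + " files: " + done);
  }
}
```

In this sketch an abort or rollback would walk the same list of file names and call the loader again for each one, instead of keeping every loaded commit in memory, which mirrors the reload behaviour the commit message describes.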