hadoop/hadoop-tools/hadoop-aws
Steve Loughran 7de1ac0547
HADOOP-16798. S3A Committer thread pool shutdown problems. (#1963)
Contributed by Steve Loughran.

Fixes a condition which can cause job commit to fail if a task was
aborted less than 60s before job commit commenced: task abort shuts
down the thread pool, with a hard exit after 60s. Because the job
commit POST requests are scheduled through the same pool, they would
be interrupted and fail. Access to the pool is already synchronized,
but presumably the executor shutdown code is calling wait() and so
releasing locks.
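The failure mode above can be reproduced in miniature with a plain
java.util.concurrent pool. This is a hypothetical model, not the S3A
committer's code: the class and method names (SharedPoolShutdownDemo,
abortTask, slowPost) are invented for illustration, the timeouts are
shrunk from 60s to milliseconds, and giving job commit its own pool is
just one possible remedy, not necessarily what #1963 implements.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

/** Hypothetical model of a task abort shutting down a pool shared with job commit. */
public class SharedPoolShutdownDemo {

    /** Stand-in for a job-commit POST request that takes a while to complete. */
    static Callable<String> slowPost(long millis) {
        return () -> {
            Thread.sleep(millis); // throws InterruptedException if shutdownNow() fires
            return "committed";
        };
    }

    /** Stand-in for task abort: orderly shutdown, then a hard exit after the timeout. */
    static void abortTask(ExecutorService pool, long timeoutMillis) throws InterruptedException {
        pool.shutdown();
        if (!pool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS)) {
            pool.shutdownNow(); // interrupts every worker still running, including the POST
        }
    }

    /** The POST shares the pool that task abort shuts down. */
    public static String sharedPool() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<String> post = pool.submit(slowPost(5_000));
        abortTask(pool, 100); // abort's timeout expires long before the POST finishes
        try {
            return post.get();
        } catch (ExecutionException e) {
            return "failed: " + e.getCause().getClass().getSimpleName();
        }
    }

    /** Job commit uses its own pool, so task abort cannot interrupt it. */
    public static String separatePools() throws Exception {
        ExecutorService taskPool = Executors.newFixedThreadPool(2);
        ExecutorService commitPool = Executors.newFixedThreadPool(2);
        try {
            Future<String> post = commitPool.submit(slowPost(200));
            abortTask(taskPool, 100); // only affects the (idle) task pool
            return post.get();
        } finally {
            commitPool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("shared pool:    " + sharedPool());
        System.out.println("separate pools: " + separatePools());
    }
}
```

Running main should print "shared pool:    failed: InterruptedException"
followed by "separate pools: committed". Isolating the pools means a
hard exit in one operation can no longer reach work scheduled by another.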

Task abort is triggered from the AM when one attempt of a task succeeds
while other speculative attempts of that task are still running. Thus it
only surfaces when speculation is enabled and the final tasks are
speculating, which, given they are the stragglers, is not unheard of.

Note: this problem has never been seen in production; it has surfaced
in the hadoop-aws tests on a heavily overloaded desktop.

Change-Id: I3b433356d01fcc50d88b4353dbca018484984bc8
2020-06-30 10:52:56 +01:00
dev-support HADOOP-16697. Tune/audit S3A authoritative mode. 2020-01-10 11:11:56 +00:00
src HADOOP-16798. S3A Committer thread pool shutdown problems. (#1963) 2020-06-30 10:52:56 +01:00
pom.xml Preparing for 3.3.1 development 2020-04-30 13:33:42 +09:00