908 Commits

Author SHA1 Message Date
Viraj Jasani
53a530aa88 MAPREDUCE-7371. DistributedCache alternative APIs should not use DistributedCache APIs internally (#3855)
Contributed by Viraj Jasani
2022-06-22 13:13:05 +01:00
Steve Loughran
9ca4ac0af0
HADOOP-18305. Preparing for 3.3.4 release: branch-3.3 version => 3.3.9 (#4482)
Updating the hadoop version of branch-3.3 to 3.3.9-SNAPSHOT
pending agreement on what number its future release should take.

Using 3.3.9-SNAPSHOT puts space in for other incremental releases,
while avoiding creating JIRA release ordering and autocompletion
confusion the way adding a 3.3.10 or higher version would do.

Contributed by Steve Loughran
2022-06-22 13:09:50 +01:00
Steve Loughran
aeb2a2f860
HADOOP-17833. Improve Magic Committer performance (#3289) (#4470)
Speed up the magic committer, with the key changes being:

* Writes under __magic always retain directory markers

* File creation under __magic skips all overwrite checks,
  including the LIST call intended to stop files being
  created over dirs.
* mkdirs under __magic probes the path for existence
  but does not look any further.

Extra parallelism in task and job commit directory scanning.
Use of createFile and openFile with parameters which allow for
HEAD checks to be skipped.

The committer can write the summary _SUCCESS file to the path
`fs.s3a.committer.summary.report.directory`, which can be in a
different file system/bucket if desired, using the job id as
the filename.

Also: HADOOP-15460. S3A FS to add `fs.s3a.create.performance`

Application code can set the createFile() option
fs.s3a.create.performance to true to disable the same
safety checks that are skipped when writing under magic directories.
Use with care.

The createFile option prefix `fs.s3a.create.header.`
can be used to add custom headers to S3 objects when
created.

Contributed by Steve Loughran.
2022-06-21 10:49:37 +01:00
Ashutosh Gupta
4f860f8ac2 MAPREDUCE-7369. Fixed MapReduce tasks timing out when spends more time on MultipleOutputs#close (#4247)
Contributed by Ravuri Sushma sree.

Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
(cherry picked from commit 36c4be819f)

 Conflicts:
	hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java
2022-06-20 08:02:58 +00:00
Steve Loughran
e123de9f19
HADOOP-16202. Enhanced openFile(): mapreduce and YARN changes. (#2584/2)
These changes ensure that sequential files are opened with the
right read policy, and split start/end is passed in.

As well as offering opportunities for filesystem clients to
choose fetch/cache/seek policies, the settings ensure that
text files on an S3 bucket where the default policy
is "random" will still be processed efficiently.

This commit depends on the associated hadoop-common patch,
which must be committed first.

Contributed by Steve Loughran.

Change-Id: Ic6713fd752441cf42ebe8739d05c2293a5db9f94
2022-04-27 19:23:25 +01:00
Ashutosh Gupta
f4290055c6 MAPREDUCE-7246. In MapredAppMasterRest#Mapreduce_Application_Master_Info_API, updating the datatype of appId to "string". (#4223)
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit fb13c1e4a8)
2022-04-25 14:31:15 +09:00
Steve Loughran
44e662272f
HADOOP-18198. Preparing for 3.3.4 development
Change-Id: I2bf19beb541739af22fced38c2545f09c4e1bd53
2022-04-12 14:09:08 +01:00
Steve Loughran
1cc83f0f45
MAPREDUCE-7341. Add an intermediate manifest committer for Azure and GCS
This is a mapreduce/spark output committer optimized for
performance and correctness on Azure ADLS Gen 2 storage
(via the abfs connector) and Google Cloud Storage
(via the external gcs connector library).

* It is safe to use with HDFS, however it has not been optimized
for that use.
* It is *not* safe for use with S3, and will fail if an attempt
is made to do so.

Contributed by Steve Loughran

Change-Id: I6f3502e79c578b9fd1a8c1485f826784b5421fca
2022-03-17 11:46:41 +00:00
Chao Sun
e079fa6577 Preparing for 3.3.3 development
2021-11-16 16:02:34 -08:00
Chao Sun
9fd0832a99 Revert "MAPREDUCE-7303. Fix TestJobResourceUploader failures after HADOOP-16878. Contributed by Peter Bacsko."
This reverts commit c40f0f1eb3.
2021-09-23 15:04:26 -07:00
lzx404243
d2c02f5afc
MAPREDUCE-7311. Clear filesystem statistics after tests in TestTaskProgressReporter (#2500)
Co-authored-by: Zhengxi Li <zli89@illinois.edu>
(cherry picked from commit 6187f76f11)
2021-09-01 17:15:31 +09:00
Viraj Jasani
4825c7c28d MAPREDUCE-7354. Use empty array constant present in TaskCompletionEvent to avoid creating redundant objects (#3123)
Reviewed-by: Hui Fei <ferhui@apache.org>
Reviewed-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 6e11461eaa)
2021-06-21 16:47:37 +09:00
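
The pattern named in the MAPREDUCE-7354 title, reusing one shared empty-array constant instead of allocating a new zero-length array per call, can be sketched as follows. This is an illustration only, not the actual patch: `DemoEvent` and `toArray` are hypothetical names, and `DemoEvent` is a stand-in rather than Hadoop's `TaskCompletionEvent`.

```java
import java.util.List;

/** Hypothetical stand-in for an event type; not Hadoop's TaskCompletionEvent. */
final class DemoEvent {
  /** Shared zero-length array; safe to share because it has no elements to mutate. */
  static final DemoEvent[] EMPTY_ARRAY = new DemoEvent[0];

  /** Returns the shared constant for empty input rather than allocating a new array. */
  static DemoEvent[] toArray(List<DemoEvent> events) {
    if (events == null || events.isEmpty()) {
      return EMPTY_ARRAY;
    }
    // For non-empty input, toArray allocates an array of the right size.
    return events.toArray(EMPTY_ARRAY);
  }
}
```

Callers that receive the constant must treat it as read-only in spirit; since a zero-length array has no slots to overwrite, sharing one instance is safe.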
Wei-Chiu Chuang
fa4915fdbb
Preparing for 3.3.2 development
2021-05-19 21:52:37 +08:00
Eric Badger
930f384e30 MAPREDUCE-7302. Upgrading to JUnit 4.13 causes testcase TestFetcher.testCorruptedIFile() to fail. Contributed by Peter Bacsko. Reviewed by Akira Ajisaka.
(cherry picked from commit da93cd962c)
2021-04-23 18:42:50 +00:00
lichaojacobs
068f114066
MAPREDUCE-7329: HadoopPipes task may fail when linux kernel version change from 3.x to 4.x (#2775)
(cherry picked from commit 663ca14a76)
2021-04-09 12:00:38 +09:00
Surendra Singh Lilhore
e079aaa820 MAPREDUCE-7199. HsJobsBlock reuse JobACLsManager for checkAccess. Contributed by Bilwa S T
(cherry picked from commit a1b0697d37)
2021-04-02 21:31:45 +05:30
Jim Brennan
91d229bf35 MAPREDUCE-7325. Intermediate data encryption is broken in LocalJobRunner. Contributed by Ahmed Hussein
(cherry picked from commit ede490d131)
2021-03-22 18:44:41 +00:00
Jim Brennan
ad74038e02 MAPREDUCE-7322. revisiting TestMRIntermediateDataEncryption. Contributed by Ahmed Hussein.
(cherry picked from commit 299b8062f1)
2021-03-15 20:17:02 +00:00
Jungtaek Lim
ebdacedc83
MAPREDUCE-7317. Add latency information in FileOutputCommitter.mergePaths. (#2624)
Contributed by Jungtaek Lim.

Change-Id: Iaff2f55e5378c22ce8a92ae776f5aba3f0fc304e
2021-01-27 19:08:54 +00:00
Steve Loughran
5be450393c
MAPREDUCE-7315. LocatedFileStatusFetcher to collect/publish IOStatistics. (#2579)
Part of the HADOOP-16830 IOStatistics API feature.

If the source FileSystem's listing RemoteIterators
implement IOStatisticsSource, these are collected and served through
the IOStatisticsSource API. If they are not: getIOStatistics() returns
null.

Only the listing statistics are collected; FileSystem.globStatus() doesn't
provide any, so IO use there is not included in the aggregate results.

Contributed by Steve Loughran.

Change-Id: Iff1485297c2c7e181b54eaf1d2c4f80faeee7cfa
2021-01-14 13:20:38 +00:00
Ayush Saxena
8378ab9f92 HADOOP-17288. Use shaded guava from thirdparty. Contributed by Ayush Saxena. #2505
2020-12-10 05:50:55 +05:30
dengzh
abc87aef18
MAPREDUCE-7307. Potential thread leak in LocatedFileStatusFetcher. (#2469)
Contributed by Zhihua Deng.

Change-Id: Iee62539d02bd8f8a928171d8258e640487050a05
2020-11-23 16:33:41 +00:00
Peter Bacsko
ced08fd87f MAPREDUCE-7304. Enhance the map-reduce Job end notifier to be able to notify the given URL via a custom class. Contributed by Zoltan Erdmann
2020-11-20 13:14:49 +01:00
Akira Ajisaka
c40f0f1eb3
MAPREDUCE-7303. Fix TestJobResourceUploader failures after HADOOP-16878. Contributed by Peter Bacsko.
(cherry picked from commit 7bc305db5d)
2020-10-23 04:41:37 +09:00
zz
e5e91397de
MAPREDUCE-7294. Only application master should upload resource to Yarn Shared Cache (#2223)
Contributed by Zhenzhao Wang <zhenzhaowang@gmail.com>

Signed-off-by: Mingliang Liu <liuml07@apache.org>
2020-09-19 23:26:37 -07:00
ywheel
2efa28cb79
MAPREDUCE-7051. Fix typo in MultipleOutputFormat (#338)
(cherry picked from commit cf4eb75608)
2020-07-30 13:28:35 +09:00
Ahmed Hussein
5969922305 HADOOP-17101. Replace Guava Function with Java8+ Function
Signed-off-by: Jonathan Eagles <jeagles@gmail.com>
(cherry picked from commit 98fcffe93f)
2020-07-15 09:57:36 -05:00
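
The kind of substitution named in the HADOOP-17101 title can be sketched with the JDK alone. This is a hypothetical example, not code from the patch: `FunctionDemo`, `LENGTH`, and `lengths` are invented names, and the sketch only shows `java.util.function.Function` filling the role that Guava's `com.google.common.base.Function` would otherwise play.

```java
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical demo class; not part of Hadoop. */
final class FunctionDemo {
  /** A JDK Function, usable wherever a Guava Function<String, Integer> was declared. */
  static final Function<String, Integer> LENGTH = String::length;

  /** Applies the function to each element using the Java 8 streams API. */
  static List<Integer> lengths(List<String> words) {
    return words.stream().map(LENGTH).collect(Collectors.toList());
  }
}
```

Because the JDK interface is a functional interface, method references and lambdas can be passed directly, with no Guava types left in the signature.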
Eric Badger
890617c7ac Revert "MAPREDUCE-7277. IndexCache totalMemoryUsed differs from cache contents. Contributed by Jon Eagles (jeagles)."
This reverts commit 741fcf2c63.
2020-06-08 20:25:02 +00:00
Akira Ajisaka
dfa7f160a5
Preparing for 3.3.1 development
2020-04-30 13:33:42 +09:00
Eric E Payne
741fcf2c63 MAPREDUCE-7277. IndexCache totalMemoryUsed differs from cache contents. Contributed by Jon Eagles (jeagles).
(cherry picked from commit e2322e1117)
2020-04-27 19:34:38 +00:00
Eric E Payne
b397a3a875 MAPREDUCE-7272. TaskAttemptListenerImpl excessive log messages. Contributed by Ahmed Hussein (ahussein)
(cherry picked from commit 11d17417ce)
2020-04-13 18:51:00 +00:00
Wanqiang Ji
ea688631b0
MAPREDUCE-7237. Supports config the shuffle's path cache related parameters (#1397)
2020-03-16 11:28:36 +09:00
Sergey Pogorelov
b343e1533b MAPREDUCE-7255. Fix typo in MapReduce documentaion example (#1793)
2020-01-06 12:36:11 +09:00
Ahmed Hussein
ed302f1fed MAPREDUCE-7208. Tuning TaskRuntimeEstimator. (Ahmed Hussein via jeagles)
Signed-off-by: Jonathan Eagles <jeagles@gmail.com>
2019-11-05 14:55:20 -06:00
Steve Loughran
1921e94292
HADOOP-16458. LocatedFileStatusFetcher.getFileStatuses failing intermittently with S3
Contributed by Steve Loughran.

Includes
- S3A glob scans don't bother trying to resolve symlinks
- stack traces don't get lost in getFileStatuses() when exceptions are wrapped
- debug level logging of what is up in Globber
- Contains HADOOP-13373. Add S3A implementation of FSMainOperationsBaseTest.
- ITestRestrictedReadAccess tests incomplete read access to files.

This adds a builder API for constructing globbers which other stores can use
so that they too can skip symlink resolution when not needed.

Change-Id: I23bcdb2783d6bd77cf168fdc165b1b4b334d91c7
2019-10-01 18:11:05 +01:00
Daisuke Kobayashi
bc2d3a71d6 HADOOP-16549. Remove Unsupported SSL/TLS Versions from Docs/Properties. Contributed by Daisuke Kobayashi.
Signed-off-by: Wei-Chiu Chuang <weichiu@apache.org>
Reviewed-by: Akira Ajisaka <aajisaka@apache.org>
2019-09-10 10:51:47 +08:00
Szilard Nemeth
ac6c4f0b29 MAPREDUCE-7197. Fix order of actual and expected expression in assert statements. Contributed by Adam Antal
2019-08-12 13:54:28 +02:00
Szilard Nemeth
a7371a779c MAPREDUCE-7225: Fix broken current folder expansion during MR job start. Contributed by Peter Bacsko.
2019-08-01 13:01:30 +02:00
Mehul Garnara
c0a0c353e8
MAPREDUCE-6973. Fix comments on creating _SUCCESS file.
This closes #280

Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2019-07-26 21:21:26 +09:00
Wanqiang Ji
b417a4c854
MAPREDUCE-7214. Remove unused pieces related to mapreduce.job.userlog.retain.hours
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2019-06-11 18:40:35 +09:00
Akira Ajisaka
3ea4f41d9f
MAPREDUCE-6794. Remove unused properties from TTConfig.java
2019-06-07 10:27:41 +09:00
Wanqiang Ji
e7e30a5f8b
MAPREDUCE-7210. Replace mapreduce.job.counters.limit with mapreduce.job.counters.max in mapred-default.xml
This closes #878

Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2019-06-05 10:45:23 +09:00
Akira Ajisaka
afd844059c HADOOP-16331. Fix ASF License check in pom.xml
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
2019-05-29 17:25:13 +09:00
Akira Ajisaka
9f933e6446
HADOOP-16323. https everywhere in Maven settings.
2019-05-27 15:24:59 +09:00
Akira Ajisaka
5565f2c532
MAPREDUCE-7198. mapreduce.task.timeout=0 configuration used to disable timeout doesn't work.
2019-05-23 10:21:11 +09:00
Gabor Bota
d7979079ea HADOOP-16210. Update guava to 27.0-jre in hadoop-project trunk. Contributed by Gabor Bota.
2019-04-03 12:59:39 -06:00
David Mollitor
246ab77f28
HADOOP-16196. Path Parameterize Comparable.
Author: David Mollitor <david.mollitor@cloudera.com>
2019-03-22 10:26:24 +00:00
Steve Loughran
f365957c63
HADOOP-15229. Add FileSystem builder-based openFile() API to match createFile();
S3A to implement S3 Select through this API.

The new openFile() API is asynchronous, and implemented across FileSystem and FileContext.

The MapReduce V2 inputs are moved to this API, and you can actually set must/may
options to pass in.

This is more useful for setting things like s3a seek policy than for S3 Select,
as the existing input format/record readers can't handle S3 Select output where
the stream is shorter than the file length, and splitting plain text is suboptimal.
Future work is needed there.

In the meantime, any/all filesystem connectors are now free to add their own filesystem-specific
configuration parameters which can be set in jobs and used to set filesystem input stream
options (seek policy, retry, encryption secrets, etc).

Contributed by Steve Loughran
2019-02-05 11:51:02 +00:00
Akira Ajisaka
1129288cf5
HADOOP-14178. Move Mockito up to version 2.23.4. Contributed by Akira Ajisaka and Masatake Iwasaki.
2019-01-29 18:29:56 -08:00
Eric Yang
1ab69a9543 YARN-9221. Added flag to disable dynamic auxiliary service feature.
Contributed by Billie Rinaldi
2019-01-25 19:05:36 -05:00