786 Commits

Author SHA1 Message Date
skysiders
36bf54aba0
MAPREDUCE-7375. JobSubmissionFiles doesn't set the right permission after mkdirs (#4237)
Signed-off-by: Chris Nauroth <cnauroth@apache.org>
2023-01-12 13:48:29 -08:00
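A minimal sketch of the pattern the MAPREDUCE-7375 title points at. It uses plain `java.nio` rather than Hadoop's `FileSystem` API and assumes a POSIX filesystem. The mode requested when a directory is created is filtered through the process umask, so the sketch sets the permission again explicitly once the directory exists. `StagingDirSketch` and `mkdirsWithPermission` are hypothetical names, not Hadoop classes.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical helper illustrating "create, then set the permission explicitly".
final class StagingDirSketch {
    private StagingDirSketch() { }

    // Creates dir (and any missing parents), then forces its mode to perms,
    // e.g. "rwx------". The mode passed at creation time is masked by the
    // process umask, so it is re-applied with an explicit chmod-style call.
    static Path mkdirsWithPermission(Path dir, String perms) throws IOException {
        Set<PosixFilePermission> wanted = PosixFilePermissions.fromString(perms);
        Files.createDirectories(dir, PosixFilePermissions.asFileAttribute(wanted));
        Files.setPosixFilePermissions(dir, wanted);
        return dir;
    }
}
```

With a typical umask of 022, requesting `rwxrwx---` at creation alone would yield `rwxr-x---`. The explicit second call is what guarantees the requested mode.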
Ashutosh Gupta
a48e8c9beb
MAPREDUCE-5608. Replace and deprecate mapred.tasktracker.indexcache.mb (#5014)
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2022-11-14 11:07:40 +09:00
slfan1989
04b31d7ecf
MAPREDUCE-7390. Remove WhiteBox in mapreduce module. (#4462)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2022-11-14 10:45:20 +09:00
Szilard Nemeth
5bb11cecea
HADOOP-15327. Upgrade MR ShuffleHandler to use Netty4 #3259. Contributed by Szilard Nemeth.
2022-11-11 09:05:01 +01:00
wangteng13
388f2f182f
document fix for MAPREDUCE-7425 (#5090)
Reviewed-by: Ashutosh Gupta <ashutosh.gupta@st.niituniversity.in>
Signed-off-by: Chris Nauroth <cnauroth@apache.org>
2022-11-01 13:34:59 -07:00
PJ Fanning
aac87ffe76
MAPREDUCE-7411: use secure XML parsers in mapreduce modules (#4980)
Lockdown of parsers in hadoop-mapreduce.

Follow-on to HADOOP-18469. Add secure XML parser factories to XMLUtils

Contributed by P J Fanning
2022-10-21 14:02:11 +01:00
Ashutosh Gupta
062c50db6b
MAPREDUCE-7370. Parallelize MultipleOutputs#close call (#4248). Contributed by Ashutosh Gupta.
Reviewed-by: Akira Ajisaka <aajisaka@apache.org>
Signed-off-by: Chris Nauroth <cnauroth@apache.org>
2022-10-06 15:23:05 -07:00
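MAPREDUCE-7370 parallelizes `MultipleOutputs#close`. The following is a minimal, self-contained sketch of the general technique, not the Hadoop implementation. Every writer is closed on a fixed-size thread pool, all closes are awaited, and the first failure is rethrown with any later ones attached as suppressed exceptions. `ParallelCloseSketch` and its thread-count parameter are hypothetical.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: close many writers concurrently instead of one by one.
final class ParallelCloseSketch {
    private ParallelCloseSketch() { }

    static void closeAll(List<? extends Closeable> writers, int threads) throws IOException {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, threads));
        try {
            // Submit every close() so slow writers overlap instead of queueing.
            List<Future<?>> pending = new ArrayList<>();
            for (Closeable w : writers) {
                pending.add(pool.submit(() -> { w.close(); return null; }));
            }
            // Wait for all of them; keep the first failure, suppress the rest.
            IOException first = null;
            for (Future<?> f : pending) {
                try {
                    f.get();
                } catch (ExecutionException e) {
                    IOException io = e.getCause() instanceof IOException
                        ? (IOException) e.getCause()
                        : new IOException(e.getCause());
                    if (first == null) {
                        first = io;
                    } else {
                        first.addSuppressed(io);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IOException("Interrupted while closing writers", e);
                }
            }
            if (first != null) {
                throw first;
            }
        } finally {
            pool.shutdown();
        }
    }
}
```

Waiting for every future before rethrowing means one failing writer does not leave the others unclosed.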
9uapaw
84081a8cae
MAPREDUCE-7409. Make shuffle key length configurable. Contributed by Ashutosh Gupta.
2022-08-31 17:32:51 +02:00
Steve Loughran
de37fd37d6
MAPREDUCE-7403. manifest-committer dynamic partitioning support. (#4728)
Declares its compatibility with Spark's dynamic
output partitioning by having the stream capability
"mapreduce.job.committer.dynamic.partitioning"

Requires a Spark release with SPARK-40034, which
does the probing before deciding whether to
accept/reject instantiation with
dynamic partition overwrite set.

This feature can be declared as supported by
any other PathOutputCommitter implementations
whose algorithm and destination filesystem
are compatible.

None of the S3A committers are compatible.

The classic FileOutputCommitter is, but it
does not declare itself as such out of our fear
of changing that code. The Spark-side code
will automatically infer compatibility if
the created committer is of that class or
a subclass.

Contributed by Steve Loughran.
2022-08-24 11:18:19 +01:00
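The capability probe described above can be sketched as follows. To keep it self-contained, `Capabilities` is a local stand-in for Hadoop's `org.apache.hadoop.fs.StreamCapabilities` (whose method is `hasCapability(String)`), and `ClassicCommitter` is a hypothetical stand-in for `FileOutputCommitter`. The real check lives on the Spark side (SPARK-40034) and may differ in detail.

```java
// Hypothetical sketch of probing a committer for dynamic partitioning support.
final class DynamicPartitioningProbe {
    private DynamicPartitioningProbe() { }

    // Capability string quoted in the MAPREDUCE-7403 commit message.
    static final String DYNAMIC_PARTITIONING = "mapreduce.job.committer.dynamic.partitioning";

    // Local stand-in for StreamCapabilities, so no Hadoop jar is needed.
    interface Capabilities {
        boolean hasCapability(String capability);
    }

    // Stand-in for the classic committer: compatible, but declares nothing.
    static class ClassicCommitter { }

    // Accept a committer if it declares the capability, or if it is the
    // classic committer or a subclass (compatibility inferred from the class).
    static boolean supportsDynamicPartitioning(Object committer) {
        if (committer instanceof Capabilities
                && ((Capabilities) committer).hasCapability(DYNAMIC_PARTITIONING)) {
            return true;
        }
        return committer instanceof ClassicCommitter;
    }
}
```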
slfan1989
977f4b6165
MAPREDUCE-7385. Improve JobEndNotifier#httpNotification with recommended methods. (#4403). Contributed by fanshilun.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2022-08-09 00:59:03 +05:30
skysiders
9fe96238d2
MAPREDUCE-7372. MapReduce sets permission too late in copyJar method (#4026). Contributed by Zhang Dongsheng.
Reviewed-by: Steve Loughran <stevel@apache.org>
Signed-off-by: Chris Nauroth <cnauroth@apache.org>
2022-07-25 11:38:59 -07:00
Ashutosh Gupta
a432925f74
HADOOP-18321. Fix when to read an additional record from a BZip2 text file split (#4521)

Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
Reviewed by Akira Ajisaka.
2022-07-06 10:00:14 +05:30
slfan1989
073b8ea1d5
HADOOP-18284. Remove Unnecessary semicolon ';' (#4422). Contributed by fanshilun.
2022-06-29 15:20:41 +05:30
Christian Bartolomäus
ef36457b53
MAPREDUCE-7389. Fix typo in description of property (#4440). Contributed by Christian Bartolomaus.
2022-06-21 19:24:11 +05:30
Ashutosh Gupta
36c4be819f
MAPREDUCE-7369. Fixed MapReduce tasks timing out when they spend more time on MultipleOutputs#close (#4247)
Contributed by Ravuri Sushma sree.

Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
2022-06-20 17:01:01 +09:00
Steve Loughran
e199da3fae
HADOOP-17833. Improve Magic Committer performance (#3289)
Speed up the magic committer. The key changes are:

* Writes under __magic always retain directory markers.
* File creation under __magic skips all overwrite checks,
  including the LIST call intended to stop files being
  created over directories.
* mkdirs under __magic probes the path for existence
  but does not look any further.
* Extra parallelism in task and job commit directory scanning.
* createFile and openFile are called with parameters which allow
  HEAD checks to be skipped.

The committer can write the summary _SUCCESS file to the directory
set in `fs.s3a.committer.summary.report.directory`, which can be in a
different file system/bucket if desired, using the job id as
the filename.

Also: HADOOP-15460. S3A FS to add `fs.s3a.create.performance`

Application code can set the createFile() option
`fs.s3a.create.performance` to true to disable the same
safety checks that are skipped when writing under magic directories.
Use with care.

The createFile option prefix `fs.s3a.create.header.`
can be used to add custom headers to S3 objects when
created.


Contributed by Steve Loughran.
2022-06-17 19:11:35 +01:00
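The `fs.s3a.create.header.` prefix can be illustrated with a small, hypothetical helper that turns createFile() options into header name/value pairs by stripping the prefix. This is an assumption about the general shape of prefix-scoped options, not the S3A code; in Hadoop itself these options are passed through the createFile() builder. `CreateHeaderOptions` and `extractHeaders` are illustrative names only.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of prefix-scoped createFile() options.
final class CreateHeaderOptions {
    private CreateHeaderOptions() { }

    // Option prefix quoted in the HADOOP-17833 commit message.
    static final String HEADER_PREFIX = "fs.s3a.create.header.";

    // Keeps only options under the header prefix and strips the prefix,
    // leaving header-name -> value pairs. Other options (such as
    // fs.s3a.create.performance) and an empty header name are ignored.
    static Map<String, String> extractHeaders(Map<String, String> options) {
        Map<String, String> headers = new TreeMap<>();
        for (Map.Entry<String, String> e : options.entrySet()) {
            String key = e.getKey();
            if (key.startsWith(HEADER_PREFIX) && key.length() > HEADER_PREFIX.length()) {
                headers.put(key.substring(HEADER_PREFIX.length()), e.getValue());
            }
        }
        return headers;
    }
}
```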
Ashutosh Gupta
9c3330c22f
MAPREDUCE-7377. Remove unused imports in MapReduce project (#4299)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2022-05-14 01:34:19 +09:00
Ayush Saxena
665ada6d21
MAPREDUCE-7376. AggregateWordCount fetches wrong results. (#4257). Contributed by Ayush Saxena.
Reviewed-by: Steve Loughran <stevel@apache.org>
2022-05-09 22:56:14 +05:30
Steve Loughran
6999acf520
HADOOP-16202. Enhanced openFile(): mapreduce and YARN changes. (#2584/2)
These changes ensure that sequential files are opened with the
right read policy, and that the split start/end is passed in.

As well as offering opportunities for filesystem clients to
choose fetch/cache/seek policies, the settings ensure that
text files on an S3 bucket where the default policy
is "random" will still be processed efficiently.

This commit depends on the associated hadoop-common patch,
which must be committed first.

Contributed by Steve Loughran.

Change-Id: Ic6713fd752441cf42ebe8739d05c2293a5db9f94
2022-04-24 17:33:05 +01:00
Steve Loughran
7328c34ba5
MAPREDUCE-7341. Add an intermediate manifest committer for Azure and GCS
This is a mapreduce/spark output committer optimized for
performance and correctness on Azure ADLS Gen 2 storage
(via the abfs connector) and Google Cloud Storage
(via the external gcs connector library).

* It is safe to use with HDFS; however, it has not been optimized
for that use.
* It is *not* safe for use with S3, and will fail if an attempt
is made to do so.

Contributed by Steve Loughran

Change-Id: I6f3502e79c578b9fd1a8c1485f826784b5421fca
2022-03-17 11:24:13 +00:00
Viraj Jasani
08c803ea30
MAPREDUCE-7371. DistributedCache alternative APIs should not use DistributedCache APIs internally (#3855)
2022-01-09 00:18:10 +09:00
Stamatis Zampetakis
bface2ac6c
MAPREDUCE-7368. DBOutputFormat.DBRecordWriter#write must throw exception when it fails. (#3671). Contributed by Stamatis Zampetakis.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2021-12-08 16:40:11 +05:30
Viraj Jasani
215388beea
HADOOP-18022. Add restrict-imports-enforcer-rule for Guava Preconditions and remove remaining usages (#3712)
Reviewed-by: Akira Ajisaka <aajisaka@apache.org>
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
2021-11-29 17:37:30 +09:00
Viraj Jasani
b1ad4eab9a
HADOOP-17959. Replace Guava VisibleForTesting by Hadoop's own annotation in hadoop-cloud-storage-project and hadoop-mapreduce-project modules (#3537)
Reviewed-by: Ahmed Hussein <ahussein@apache.org>
2021-10-11 16:22:50 +09:00
Viraj Jasani
207c92753f
MAPREDUCE-7350. Replace Guava Lists usage by Hadoop's own Lists in hadoop-mapreduce-project (#3074)
2021-06-07 11:51:29 +09:00
Ayush Saxena
5404ab4bca
MAPREDUCE-7343. Increase the job name max length in mapred job -list. (#2995). Contributed by Ayush Saxena.
2021-05-14 00:15:33 +05:30
lichaojacobs
663ca14a76
MAPREDUCE-7329: HadoopPipes task may fail when Linux kernel version changes from 3.x to 4.x (#2775)
2021-04-09 11:58:53 +09:00
Jim Brennan
ede490d131
MAPREDUCE-7325. Intermediate data encryption is broken in LocalJobRunner. Contributed by Ahmed Hussein
2021-03-22 18:41:25 +00:00
Jim Brennan
299b8062f1
MAPREDUCE-7322. revisiting TestMRIntermediateDataEncryption. Contributed by Ahmed Hussein.
2021-03-15 20:13:17 +00:00
Jungtaek Lim
2a38ed0e0c
MAPREDUCE-7317. Add latency information in FileOutputCommitter.mergePaths. (#2624)
Contributed by Jungtaek Lim.
2021-01-27 19:08:08 +00:00
Steve Loughran
9b2956e254
MAPREDUCE-7315. LocatedFileStatusFetcher to collect/publish IOStatistics. (#2579)
Part of the HADOOP-16830 IOStatistics API feature.

If the source FileSystem's listing RemoteIterators
implement IOStatisticsSource, these are collected and served through
the IOStatisticsSource API. If they do not, getIOStatistics() returns
null.

Only the listing statistics are collected; FileSystem.globStatus() doesn't
provide any, so IO use there is not included in the aggregate results.

Contributed by Steve Loughran.
2020-12-31 16:02:10 +00:00
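A minimal sketch of the aggregation rule described above: statistics are merged from every listing iterator that exposes them, and the result is `null` when none do. `StatsSource` is a local stand-in for Hadoop's `IOStatisticsSource`, and counters are modeled as a plain `Map<String, Long>` rather than Hadoop's `IOStatistics` type. `CountingIterator` and the `list_entries` counter name are hypothetical.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of collecting statistics from listing iterators.
final class ListingStatsSketch {
    private ListingStatsSketch() { }

    // Local stand-in for IOStatisticsSource: counter name -> value.
    interface StatsSource {
        Map<String, Long> getCounters();
    }

    // Hypothetical listing iterator that counts the entries it has returned.
    static final class CountingIterator<T> implements Iterator<T>, StatsSource {
        private final Iterator<T> inner;
        private long entries;

        CountingIterator(Iterator<T> inner) {
            this.inner = inner;
        }

        @Override
        public boolean hasNext() {
            return inner.hasNext();
        }

        @Override
        public T next() {
            T item = inner.next();
            entries++;
            return item;
        }

        @Override
        public Map<String, Long> getCounters() {
            return Collections.singletonMap("list_entries", entries);
        }
    }

    // Sums counters across every iterator that exposes statistics.
    // Returns null if none of them do.
    static Map<String, Long> aggregate(List<? extends Iterator<?>> listings) {
        Map<String, Long> total = null;
        for (Iterator<?> it : listings) {
            if (it instanceof StatsSource) {
                if (total == null) {
                    total = new TreeMap<>();
                }
                for (Map.Entry<String, Long> e : ((StatsSource) it).getCounters().entrySet()) {
                    total.merge(e.getKey(), e.getValue(), Long::sum);
                }
            }
        }
        return total;
    }
}
```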
dengzh
f13c7b1b02
MAPREDUCE-7307. Potential thread leak in LocatedFileStatusFetcher. (#2469)
Contributed by Zhihua Deng.
2020-11-23 15:40:22 +00:00
Peter Bacsko
fb92aa4012
MAPREDUCE-7304. Enhance the map-reduce Job end notifier to be able to notify the given URL via a custom class. Contributed by Zoltan Erdmann
2020-11-20 13:13:51 +01:00
Ayush Saxena
1e3a6efcef
HADOOP-17288. Use shaded guava from thirdparty. (#2342). Contributed by Ayush Saxena.
2020-10-17 12:01:18 +05:30
zz
95dfc875d3
MAPREDUCE-7294. Only application master should upload resource to Yarn Shared Cache (#2223)
Contributed by Zhenzhao Wang <zhenzhaowang@gmail.com>

Signed-off-by: Mingliang Liu <liuml07@apache.org>
2020-09-19 23:10:05 -07:00
ywheel
cf4eb75608
MAPREDUCE-7051. Fix typo in MultipleOutputFormat (#338)
2020-07-30 13:01:22 +09:00
Eric Badger
fbb8775430
Revert "MAPREDUCE-7277. IndexCache totalMemoryUsed differs from cache contents. Contributed by Jon Eagles (jeagles)."
This reverts commit e2322e1117.
2020-06-08 20:35:27 +00:00
Eric E Payne
e2322e1117
MAPREDUCE-7277. IndexCache totalMemoryUsed differs from cache contents. Contributed by Jon Eagles (jeagles).
2020-04-27 19:10:00 +00:00
Surendra Singh Lilhore
a1b0697d37
MAPREDUCE-7199. HsJobsBlock reuse JobACLsManager for checkAccess. Contributed by Bilwa S T
2020-04-18 19:42:20 +05:30
Eric E Payne
11d17417ce
MAPREDUCE-7272. TaskAttemptListenerImpl excessive log messages. Contributed by Ahmed Hussein (ahussein)
2020-04-13 18:20:07 +00:00
Jason Lowe
c613296dc8
MAPREDUCE-7241. FileInputFormat listStatus with less memory footprint. Contributed by Zhihua Deng
2020-04-01 07:46:33 -05:00
Wanqiang Ji
ea688631b0
MAPREDUCE-7237. Supports config the shuffle's path cache related parameters (#1397)
2020-03-16 11:28:36 +09:00
Ahmed Hussein
ed302f1fed
MAPREDUCE-7208. Tuning TaskRuntimeEstimator. (Ahmed Hussein via jeagles)
Signed-off-by: Jonathan Eagles <jeagles@gmail.com>
2019-11-05 14:55:20 -06:00
Steve Loughran
1921e94292
HADOOP-16458. LocatedFileStatusFetcher.getFileStatuses failing intermittently with S3
Contributed by Steve Loughran.

Includes:
- S3A glob scans don't bother trying to resolve symlinks
- stack traces don't get lost in getFileStatuses() when exceptions are wrapped
- debug-level logging of what is happening in Globber
- Contains HADOOP-13373. Add S3A implementation of FSMainOperationsBaseTest.
- ITestRestrictedReadAccess tests incomplete read access to files.

This adds a builder API for constructing globbers which other stores can use
so that they too can skip symlink resolution when not needed.

Change-Id: I23bcdb2783d6bd77cf168fdc165b1b4b334d91c7
2019-10-01 18:11:05 +01:00
Szilard Nemeth
a7371a779c
MAPREDUCE-7225: Fix broken current folder expansion during MR job start. Contributed by Peter Bacsko.
2019-08-01 13:01:30 +02:00
Mehul Garnara
c0a0c353e8
MAPREDUCE-6973. Fix comments on creating _SUCCESS file.
This closes #280

Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2019-07-26 21:21:26 +09:00
Wanqiang Ji
b417a4c854
MAPREDUCE-7214. Remove unused pieces related to mapreduce.job.userlog.retain.hours
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2019-06-11 18:40:35 +09:00
Akira Ajisaka
3ea4f41d9f
MAPREDUCE-6794. Remove unused properties from TTConfig.java
2019-06-07 10:27:41 +09:00
Wanqiang Ji
e7e30a5f8b
MAPREDUCE-7210. Replace mapreduce.job.counters.limit with mapreduce.job.counters.max in mapred-default.xml
This closes #878

Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2019-06-05 10:45:23 +09:00
Akira Ajisaka
5565f2c532
MAPREDUCE-7198. mapreduce.task.timeout=0 configuration used to disable timeout doesn't work.
2019-05-23 10:21:11 +09:00
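The intended semantics behind MAPREDUCE-7198 can be captured in a one-method sketch, assuming timestamps in milliseconds: a timeout of 0 disables the check, so no amount of elapsed time counts as a timeout. `TaskTimeoutSketch` is hypothetical, and treating negative values as "disabled" is this sketch's own choice rather than documented Hadoop behaviour.

```java
// Hypothetical sketch of a progress-timeout check where 0 means "disabled".
final class TaskTimeoutSketch {
    private TaskTimeoutSketch() { }

    // Returns true if a task whose last progress report was at lastProgressMs
    // should be treated as timed out at nowMs. A timeoutMs of 0 (or, in this
    // sketch, any non-positive value) disables the check entirely.
    static boolean isTimedOut(long timeoutMs, long lastProgressMs, long nowMs) {
        if (timeoutMs <= 0) {
            return false;
        }
        return nowMs - lastProgressMs > timeoutMs;
    }
}
```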