2086 Commits

Author SHA1 Message Date
Dongjoon Hyun
20d073cb2c
HADOOP-18718. Fix several maven build warnings (#5592). Contributed by Dongjoon Hyun.
Reviewed-by: Gautham B A <gautham.bangalore@gmail.com>
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
(cherry picked from commit fb16e00da0e7c50cba8bf238fad7d09281717848)

Conflicts:
	hadoop-tools/hadoop-federation-balance/pom.xml
2023-06-12 10:40:41 -07:00
Steve Loughran
936e9e15d0
MAPREDUCE-7435. Manifest Committer OOM on abfs (#5519)
This modifies the manifest committer so that the list of files
to rename is passed between stages as a file of
writeable entries on the local filesystem.

The map of directories to create is still passed in memory;
this map is built across all tasks, so even if many tasks
created files, if they all write into the same set of directories
the memory needed is O(directories) with the
task count not a factor.

The _SUCCESS file reports on heap size through gauges.
This should give a warning if there are problems.

Contributed by Steve Loughran
2023-06-12 13:43:43 +01:00
Steve Loughran
ab594ec77e
HADOOP-18724. Open file fails with NumberFormatException for S3AFileSystem (#5611)
This:

1. Adds optLong, optDouble, mustLong and mustDouble
   methods to the FSBuilder interface to let callers explicitly
   passin long and double arguments.
2. The opt() and must() builder calls which take float/double values
   now only set long values instead, so as to avoid problems
   related to overloaded methods resulting in a ".0" being appended
   to a long value.
3. All of the relevant opt/must calls in the hadoop codebase move to
   the new methods
4. And the s3a code is resilient to parse errors in is numeric options
   -it will downgrade to the default.

This is nominally incompatible, but the floating-point builder methods
were never used: nothing currently expects floating point numbers.

For anyone who wants to safely set numeric builder options across all compatible
releases, convert the number to a string and then use the opt(String, String)
and must(String, String) methods.

Contributed by Steve Loughran
2023-05-16 13:41:17 +01:00
Wei-Chiu Chuang
99312bdfdb
HADOOP-18671. Add recoverLease(), setSafeMode(), isFileClosed() as interfaces to hadoop-common (#5553) (#5619)
* HADOOP-18671. Add recoverLease(), setSafeMode(), isFileClosed() as interfaces to hadoop-common (#5553)

The HDFS lease APIs have been replicated as interfaces in hadoop-common so other filesystems can
also implement them.  Applications which use the leasing APIs should migrate to the new
interface where possible.

Contributed by Stephen Wu

(cherry picked from commit 0e46388474f52cd38578d946c26ba762be05ef1f)

 Conflicts:
	hadoop-hdfs-project/hadoop-hdfs-client/src/main/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
	hadoop-hdfs-project/hadoop-hdfs-rbf/src/test/java/org/apache/hadoop/hdfs/server/federation/router/TestRouterRpc.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestDFSUpgrade.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestEncryptionZones.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/TestViewDistributedFileSystem.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImageWithAcl.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestFSImageWithSnapshot.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/TestNameNodeRetryCacheMetrics.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestFSImageWithOrderedSnapshotDeletion.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestOrderedSnapshotDeletion.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/server/namenode/snapshot/TestSnapshotDeletion.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TestOfflineImageViewer.java
	hadoop-hdfs-project/hadoop-hdfs/src/test/java/org/apache/hadoop/hdfs/tools/offlineImageViewer/TestOfflineImageViewerForErasureCodingPolicy.java

Co-authored-by: Tak Lon (Stephen) Wu <taklwu@apache.org>
2023-05-09 06:20:56 +08:00
Sebastian Baunsgaard
919c3f615b
HADOOP-18660. Filesystem Spelling Mistake (#5475).
Contributed by Sebastian Baunsgaard.

Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2023-04-25 19:59:54 +01:00
Steve Loughran
0b56be3ca4
MAPREDUCE-7437. MR Fetcher class to use an AtomicInteger to generate IDs. (#5579)
...as until now it wasn't thread safe

Contributed by Steve Loughran
2023-04-25 19:56:18 +01:00
skysiders
eef2fdcc29 MAPREDUCE-7375 JobSubmissionFiles don't set right permission after mkdirs (#4237)
Signed-off-by: Chris Nauroth <cnauroth@apache.org>
(cherry picked from commit 36bf54aba0fefa0f3e94d94f836ab054d31ec5c9)
2023-01-12 21:49:15 +00:00
Ashutosh Gupta
7b84f6458b
HADOOP-18484. Upgrade hsqldb to v2.7.1 to mitigate CVE-2022-41853 (#5101) 2022-11-04 11:00:17 +01:00
wangteng13
4da1cad680 document fix for MAPREDUCE-7425 (#5090)
Reviewed-by: Ashutosh Gupta <ashutosh.gupta@st.niituniversity.in>
Signed-off-by: Chris Nauroth <cnauroth@apache.org>
(cherry picked from commit 388f2f182f1f21ca0e3931db8b125f327f77880b)
2022-11-01 20:36:17 +00:00
PJ Fanning
bd276092b0 MAPREDUCE-7411: use secure XML parsers in mapreduce modules (#4980)
Lockdown of parsers in hadoop-mapreduce.

Follow-on to HADOOP-18469. Add secure XML parser factories to XMLUtils

Contributed by P J Fanning
2022-10-26 11:04:29 +01:00
Ashutosh Gupta
725cd90712 MAPREDUCE-7370. Parallelize MultipleOutputs#close call (#4248). Contributed by Ashutosh Gupta.
Reviewed-by: Akira Ajisaka <aajisaka@apache.org>
Signed-off-by: Chris Nauroth <cnauroth@apache.org>
(cherry picked from commit 062c50db6bebfabae38aec9e17be2483a11c3f7f)
2022-10-06 23:14:38 +00:00
Ashutosh Gupta
3af155ceeb HADOOP-18400. Fix file split duplicating records from a succeeding split when reading BZip2 text files (#4732)
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 30c36ef25a335bc123fdae90b3366e582ad1b37a)
2022-09-19 13:45:47 +09:00
sreeb-msft
5f3bc4340e
HADOOP-18408. ABFS: ITestAbfsManifestCommitProtocol fails on nonHNS configuration (#4758)
ITestAbfsManifestCommitProtocol  to set requireRenameResilience to false for nonHNS configuration

Contributed by Sree Bhattacharyya
2022-09-02 12:34:43 +01:00
Steve Loughran
1168abc704
MAPREDUCE-7403. manifest-committer dynamic partitioning support. (#4728)
Declares its compatibility with Spark's dynamic
output partitioning by having the stream capability
"mapreduce.job.committer.dynamic.partitioning"

Requires a Spark release with SPARK-40034, which
does the probing before deciding whether to
accept/rejecting instantiation with
dynamic partition overwrite set

This feature can be declared as supported by
any other PathOutputCommitter implementations
whose algorithm and destination filesystem
are compatible.

None of the S3A committers are compatible.

The classic FileOutputCommitter is, but it
does not declare itself as such out of our fear
of changing that code. The Spark-side code
will automatically infer compatibility if
the created committer is of that class or
a subclass.

Contributed by Steve Loughran.
2022-08-24 11:19:05 +01:00
Ashutosh Gupta
29ea8ceb49 HADOOP-18390. Fix out of sync import for HADOOP-18321 (#4694)
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit bd0f9a46e19c5138865d7e1bfded59b5673b615f)
2022-08-07 16:06:09 +09:00
Ashutosh Gupta
3c339a11ec HADOOP-18321.Fix when to read an additional record from a BZip2 text file split (#4521)
* HADOOP-18321.Fix when to read an additional record from a BZip2 text file split

Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com> and Reviewed by Akira Ajisaka.
(cherry picked from commit a432925f74b93d05b4dfdd1831bfbabbf4466a80)
2022-08-06 21:53:48 +09:00
skysiders
1d2a60f623 MAPREDUCE-7372 MapReduce set permission too late in copyJar method (#4026). Contributed by Zhang Dongsheng.
Reviewed-by: Steve Loughran <stevel@apache.org>
Signed-off-by: Chris Nauroth <cnauroth@apache.org>
(cherry picked from commit 9fe96238d2cf9f32cd36888098bccc5a4cfe1723)
2022-07-25 18:39:48 +00:00
PJ Fanning
6733ba56b8
HADOOP-18332. Remove rs-api dependency by downgrading jackson to 2.12.7. (#4552)
This downgrades jackson from the version switched to in 
HADOOP-18033 (2.13.0), to Jackson 2.12.7.
This removes the dependency on javax.ws.rs-api,
so avoiding runtime problems with applications using
jersey-core v1 and/or jsr311-api.

The 2.12.7 release still contains the fix for CVE-2020-36518.

Contributed by PJ Fanning
2022-07-16 18:18:52 +01:00
Steve Loughran
fb4e8172a0
MAPREDUCE-7391. TestLocalDistributedCacheManager failing after HADOOP-16202 (#4472)
Fixing a mockito-based test which broke when HADOOP-16202
changed the methods being invoked.

Contributed by Steve Loughran
2022-06-22 13:13:24 +01:00
Viraj Jasani
53a530aa88 MAPREDUCE-7371. DistributedCache alternative APIs should not use DistributedCache APIs internally (#3855)
Contributed by Viraj Jasani
2022-06-22 13:13:05 +01:00
Steve Loughran
9ca4ac0af0
HADOOP-18305. Preparing for 3.3.4 release: branch-3.3 version => 3.3.9 (#4482)
Updating the hadoop version of branch-3.3 to 3.3.9-SNAPSHOT
pending agreement on what number its future release should take.

Using 3.3.9-SNAPSHOT puts space in for other incremental releases,
while avoiding creating JIRA release ordering and autocompletion
confusion the way adding a 3.3.10 or higher version would do.

Contributed by Steve Loughran
2022-06-22 13:09:50 +01:00
Steve Loughran
aeb2a2f860
HADOOP-17833. Improve Magic Committer performance (#3289) (#4470)
Speed up the magic committer with key changes being

* Writes under __magic always retain directory markers

* File creation under __magic skips all overwrite checks,
  including the LIST call intended to stop files being
        created over dirs.
* mkdirs under __magic probes the path for existence
  but does not look any further.

Extra parallelism in task and job commit directory scanning
Use of createFile and openFile with parameters which all for
HEAD checks to be skipped.

The committer can write the summary _SUCCESS file to the path
`fs.s3a.committer.summary.report.directory`, which can be in a
different file system/bucket if desired, using the job id as
the filename.

Also: HADOOP-15460. S3A FS to add `fs.s3a.create.performance`

Application code can set the createFile() option
fs.s3a.create.performance to true to disable the same
safety checks when writing under magic directories.
Use with care.

The createFile option prefix `fs.s3a.create.header.`
can be used to add custom headers to S3 objects when
created.

Contributed by Steve Loughran.
2022-06-21 10:49:37 +01:00
Ashutosh Gupta
4f860f8ac2 MAPREDUCE-7369. Fixed MapReduce tasks timing out when spends more time on MultipleOutputs#close (#4247)
Contributed by Ravuri Sushma sree.

Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
(cherry picked from commit 36c4be819ff9de471ff29c8affdf22fa4bf225bc)

 Conflicts:
	hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-app/src/main/java/org/apache/hadoop/mapred/TaskAttemptListenerImpl.java
2022-06-20 08:02:58 +00:00
slfan1989
43f4a0e92d
MAPREDUCE-7387. Fix TestJHSSecurity#testDelegationToken AssertionError due to HDFS-16563 (#4428). Contributed by fanshilun.
Signed-off-by: Ayush Saxena <ayushsaxena@apache.org>
2022-06-20 12:16:33 +05:30
Steve Loughran
e123de9f19
HADOOP-16202. Enhanced openFile(): mapreduce and YARN changes. (#2584/2)
These changes ensure that sequential files are opened with the
right read policy, and split start/end is passed in.

As well as offering opportunities for filesystem clients to
choose fetch/cache/seek policies, the settings ensure that
processing text files on an s3 bucket where the default policy
is "random" will still be processed efficiently.

This commit depends on the associated hadoop-common patch,
which must be committed first.

Contributed by Steve Loughran.

Change-Id: Ic6713fd752441cf42ebe8739d05c2293a5db9f94
2022-04-27 19:23:25 +01:00
Viraj Jasani
bb13e228bc
HADOOP-17956. Replace all default Charset usage with UTF-8 (#3529)
Change-Id: I0094a84619ce19acf340d8dd1040cfe9bd88184e
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2022-04-27 10:30:07 +01:00
Ashutosh Gupta
f4290055c6 MAPREDUCE-7246. In MapredAppMasterRest#Mapreduce_Application_Master_Info_API, updating the datatype of appId to "string". (#4223)
Co-authored-by: Ashutosh Gupta <ashugpt@amazon.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit fb13c1e4a8a945328781fa533b5bf22786b830f1)
2022-04-25 14:31:15 +09:00
Steve Loughran
44e662272f
HADOOP-18198. Preparing for 3.3.4 development
Change-Id: I2bf19beb541739af22fced38c2545f09c4e1bd53
2022-04-12 14:09:08 +01:00
Masatake Iwasaki
160b6d106d
HADOOP-18088. Replace log4j 1.x with reload4j. (#4052)
Co-authored-by: Wei-Chiu Chuang <weichiu@apache.org>
2022-04-07 08:33:13 +09:00
Kengo Seki
85843f2158 MAPREDUCE-7373. Building MapReduce NativeTask fails on Fedora 34+ (#4120)
(cherry picked from commit dc4a680da8bcacf152cc8638d86dd171a7901245)
2022-03-30 13:49:45 +00:00
Steve Loughran
1cc83f0f45
MAPREDUCE-7341. Add an intermediate manifest committer for Azure and GCS
This is a mapreduce/spark output committer optimized for
performance and correctness on Azure ADLS Gen 2 storage
(via the abfs connector) and Google Cloud Storage
(via the external gcs connector library).

* It is safe to use with HDFS, however it has not been optimized
for that use.
* It is *not* safe for use with S3, and will fail if an attempt
is made to do so.

Contributed by Steve Loughran

Change-Id: I6f3502e79c578b9fd1a8c1485f826784b5421fca
2022-03-17 11:46:41 +00:00
Viraj Jasani
b0c1158829
HADOOP-18033. Upgrade fasterxml Jackson to 2.13.0 (#3764)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2021-12-13 13:52:44 +09:00
Chao Sun
e079fa6577 Preparing for 3.3.3 development 2021-11-16 16:02:34 -08:00
Viraj Jasani
77ee5a4266
HADOOP-17950. Provide replacement for deprecated APIs of commons-io IOUtils (#3515)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 8071dbb9c6b4a654d5e1e7c8e3b4d2ca1a736d53)
2021-10-07 11:00:19 +09:00
Chao Sun
9fd0832a99 Revert "MAPREDUCE-7303. Fix TestJobResourceUploader failures after HADOOP-16878. Contributed by Peter Bacsko."
This reverts commit c40f0f1eb34df64449472f698d1d68355f109e9b.
2021-09-23 15:04:26 -07:00
lzx404243
d2c02f5afc
MAPREDUCE-7311. Clear filesystem statistics after tests in TestTaskProgressReporter (#2500)
Co-authored-by: Zhengxi Li <zli89@illinois.edu>
(cherry picked from commit 6187f76f11e885023c99382f0fd18f8fea8bcbb6)
2021-09-01 17:15:31 +09:00
lzx404243
4a93ca78f9
MAPREDUCE-7342. Stop RMService in TestClientRedirect.testRedirect() (#2968)
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 7b5be74228cb4a68d11f52ac061829b70a4f3144)
2021-08-30 08:41:46 +09:00
Masatake Iwasaki
3645a13586 HADOOP-14922. Build of Mapreduce Native Task module fails with unknown opcode "bswap". Contributed by Anup Halarnkar.
(cherry picked from commit 0d59500e8cf89e38cd0f0e45553e4557e5358a8b)
2021-08-25 01:54:36 +00:00
jenny
b8a8821735
MAPREDUCE-7258. HistoryServerRest.html#Task_Counters_API, modify the jobTaskCounters's itemName from taskcounterGroup to taskCounterGroup (#1808)
Co-authored-by: chenjuanni <chenjuanni@inspur.com>
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit e31169c8641384fa4b45d8a27c4b7a152c1ea35b)
2021-08-02 15:39:38 +09:00
Eric Payne
e395711164 MAPREDUCE-7353: Mapreduce job fails when NM is stopped. Contributed by Bilwa S T (BilwaST)
(cherry picked from commit 7581413156da396db218e36a966c5749589b31a7)
2021-07-07 20:57:32 +00:00
Jim Brennan
75f8198aa8 YARN-10824. Title not set for JHS and NM webpages. Contributed by Bilwa S T.
(cherry picked from commit 7c7d02edbd6c17ee8ae2c4bf75e87adace059b76)
2021-06-25 20:36:41 +00:00
Viraj Jasani
4825c7c28d MAPREDUCE-7354. Use empty array constant present in TaskCompletionEvent to avoid creating redundant objects (#3123)
Reviewed-by: Hui Fei <ferhui@apache.org>
Reviewed-by: Akira Ajisaka <aajisaka@apache.org>
(cherry picked from commit 6e11461eaa7dfd6e8da5d1882fb8405f983bbcb9)
2021-06-21 16:47:37 +09:00
Akira Ajisaka
e14d00469a
MAPREDUCE-7348. TestFrameworkUploader#testNativeIO fails. (#3053)
Reviewed-by: Hui Fei <ferhui@apache.org>
(cherry picked from commit 8a489ce78e05cffc2de8923dc5ac7361a72250a8)
2021-05-26 15:48:51 +09:00
Wei-Chiu Chuang
86c28f0639
Revert "HADOOP-17669. Backport HADOOP-17079, HADOOP-17505 to branch-3.3 (#2959)"
This reverts commit 4ffe5eb1ddb4aa260134005335a003ed6d270685.
2021-05-24 17:37:18 +08:00
Wei-Chiu Chuang
fa4915fdbb
Preparing for 3.3.2 development 2021-05-19 21:52:37 +08:00
Wei-Chiu Chuang
4ffe5eb1dd
HADOOP-17669. Backport HADOOP-17079, HADOOP-17505 to branch-3.3 (#2959)
* HADOOP-17079. Optimize UGI#getGroups by adding UGI#getGroupsSet.

Co-authored-by: Wei-Chiu Chuang <weichiu@apache.org>
Change-Id: I0f31409923ece24a82dfba4c4610d8a38c52d9fb

* HADOOP-17505. public interface GroupMappingServiceProvider needs default impl for getGroupsSet() (#2661). Contributed by Vinayakumar B.

(cherry picked from commit c4c0683dff577a91ca978939e182ec0fee65b7c3)

Co-authored-by: Xiaoyu Yao <xyao@apache.org>
Co-authored-by: Vinayakumar B <vinayakumarb@apache.org>
2021-05-17 18:57:46 -07:00
Eric Badger
930f384e30 MAPREDUCE-7302. Upgrading to JUnit 4.13 causes testcase TestFetcher.testCorruptedIFile() to fail. Contributed by Peter Bacsko. Reviewed by Akira Ajisaka.
(cherry picked from commit da93cd962c8d19874d09360726628cdde8595a7d)
2021-04-23 18:42:50 +00:00
lichaojacobs
068f114066
MAPREDUCE-7329: HadoopPipes task may fail when linux kernel version change from 3.x to 4.x (#2775)
(cherry picked from commit 663ca14a769bd8fa124c1aff4ac6630491dbb425)
2021-04-09 12:00:38 +09:00
Surendra Singh Lilhore
e079aaa820 MAPREDUCE-7199. HsJobsBlock reuse JobACLsManager for checkAccess. Contributed by Bilwa S T
(cherry picked from commit a1b0697d379d33223ec1a46dfef31d6d226169bb)
2021-04-02 21:31:45 +05:30
Surendra Singh Lilhore
c70f5eb8fa MAPREDUCE-6826. Job fails with InvalidStateTransitonException: Invalid event: JOB_TASK_COMPLETED at SUCCEEDED/COMMITTING. Contributed by Bilwa S T.
(cherry picked from commit d4e36409d40d9f0783234a3b98394962ae0da87e)
2021-03-31 21:35:06 +05:30