Mukund Thakur
819159fa06
HDFS-14788. Use dynamic regex filter to ignore copy of source files in Distcp.
...
Contributed by Mukund Thakur.
Change-Id: I781387ddce95ee300c12a160dc9a0f7d602403c3
2020-01-06 19:10:39 +00:00
Steve Loughran
b6dc00f481
HADOOP-16775. DistCp reuses the same temp file within the task for different files.
...
Contributed by Amir Shenavandeh.
This avoids overwrite consistency issues with S3 and other stores, though
given S3's copy operation is O(data), you are still best off using -direct
when distcp-ing to it.
Change-Id: I8dc9f048ad0cc57ff01543b849da1ce4eaadf8c3
2020-01-02 15:36:33 +00:00
aasha
fccccc9703
HDFS-14869 Copy renamed files which are not excluded anymore by filter (#1530)
2019-12-06 17:41:25 +05:30
pingsutw
14cd969b6e
HADOOP-16512. [hadoop-tools] Fix order of actual and expected expression in assert statements
...
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2019-10-07 16:38:08 +09:00
Mukund Thakur
51c64b357d
HDFS-13660. DistCp job fails when new data is appended in the file while the DistCp copy job is running
...
This uses the length of the file known at the start of the copy to determine the amount of data to copy.
* If a file is appended to during the copy, the original bytes are copied.
* If a file is truncated during a copy, or the attempt to read the data fails with a truncated stream,
distcp will now fail. Until now these failures were not detected.
Contributed by Mukund Thakur.
Change-Id: I576a49d951fa48d37a45a7e4c82c47488aa8e884
2019-09-24 11:23:24 +01:00
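The behaviour described in HDFS-13660 above (copy only the length recorded at the start, ignore later appends, fail on truncation) can be illustrated with a minimal sketch. This is not DistCp's actual Java implementation; the function name `copy_bounded`, its parameters, and the use of `EOFError` are assumptions made purely for illustration.

```python
import io


def copy_bounded(source, sink, expected_length, buffer_size=8192):
    """Copy exactly expected_length bytes from source to sink.

    Hypothetical helper: expected_length stands in for the file length
    recorded when the copy was planned. Bytes appended to the source
    afterwards are never read; a source that ends early raises EOFError.
    """
    remaining = expected_length
    while remaining > 0:
        # Never request more than the bytes still owed, so appended data is ignored.
        chunk = source.read(min(buffer_size, remaining))
        if not chunk:
            # The stream ended before the recorded length was reached: truncation.
            raise EOFError(
                f"source ended {remaining} bytes short of {expected_length}"
            )
        sink.write(chunk)
        remaining -= len(chunk)
    return expected_length


if __name__ == "__main__":
    # The source holds 8 bytes, but only the 5 known at "listing time" are copied.
    src = io.BytesIO(b"abcdefgh")
    dst = io.BytesIO()
    copy_bounded(src, dst, 5)
    print(dst.getvalue())
```

Bounding every read by the remaining byte count is what makes an append harmless, while an empty read before the count reaches zero is what surfaces a truncation as an error rather than a silently short copy.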
KAI XIE
c765584eb2
HADOOP-16158. DistCp to support checksum validation when copy blocks in parallel (#919)
...
* DistCp to support checksum validation when copy blocks in parallel
* address review comments
* add checksums comparison test for combine mode
2019-08-18 18:46:31 -07:00
Ayush Saxena
e60f5e2572
HADOOP-16440. Distcp can not preserve timestamp with -delete option. Contributed by ludun.
2019-07-20 13:11:14 +05:30
Steve Loughran
19a001826f
Revert "HDFS-9913. DistCp to add -useTrash to move deleted files to Trash."
...
Reverting due to test failures if ~/.Trash not present during test setup.
This reverts commit ee3115f488.
Change-Id: Icbeeb261570b9131ff99d765ac0945c335b26658
2019-07-17 13:13:24 +01:00
Shen Yinjie
ee3115f488
HDFS-9913. DistCp to add -useTrash to move deleted files to Trash.
...
Contributed by Shen Yinjie.
Change-Id: I03ac7d22ab1054f8e5de4aa7552909c734438f4a
2019-07-17 11:50:46 +01:00
Takanobu Asanuma
98d2065643
HDFS-12564. Add the documents of swebhdfs configurations on the client side. Contributed by Takanobu Asanuma.
...
Signed-off-by: Wei-Chiu Chuang <weichiu@apache.org>
2019-06-20 20:17:24 -07:00
Akira Ajisaka
afd844059c
HADOOP-16331. Fix ASF License check in pom.xml
...
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
2019-05-29 17:25:13 +09:00
Akira Ajisaka
9f933e6446
HADOOP-16323. https everywhere in Maven settings.
2019-05-27 15:24:59 +09:00
Andrew Olson
c15b3bca86
HADOOP-16294: Enable access to input options by DistCp subclasses.
...
Adding a protected-scope getter for the DistCpOptions, so that a subclass does
not need to save its own copy of the inputOptions supplied to its constructor,
if it wishes to override the createInputFileListing method with logic similar
to the original implementation, i.e. calling CopyListing#buildListing with a path and input options.
Author: Andrew Olson
2019-05-16 16:11:12 +02:00
Giovanni Matteo Fumarola
7a3188d054
HADOOP-16282. Avoid FileStream to improve performance. Contributed by Ayush Saxena.
2019-05-02 12:58:42 -07:00
Masatake Iwasaki
bbdbc7a9a1
HADOOP-14544. DistCp documentation for command line options is misaligned. Contributed by Masatake Iwasaki.
2019-04-12 11:52:18 +09:00
Siyao Meng
ce4bafdf44
HADOOP-16037. DistCp: Document usage of Sync (-diff option) in detail.
...
Contributed by Siyao Meng
2019-03-26 18:42:54 +00:00
Andrew Olson
faba3591d3
HADOOP-16147. Allow CopyListing sequence file keys and values to be more easily customized.
...
Author: Andrew Olson
2019-03-22 10:35:30 +00:00
Ranith Sardar
546c5d70ef
HADOOP-16032. Distcp It should clear sub directory ACL before applying new ACL on.
2019-02-07 21:48:07 +00:00
Andrew Olson
de804e53b9
HADOOP-15281. Distcp to add no-rename copy option.
...
Contributed by Andrew Olson.
2019-02-07 10:07:22 +00:00
Akira Ajisaka
1129288cf5
HADOOP-14178. Move Mockito up to version 2.23.4. Contributed by Akira Ajisaka and Masatake Iwasaki.
2019-01-29 18:29:56 -08:00
Giovanni Matteo Fumarola
fb8932a727
HADOOP-16029. Consecutive StringBuilder.append can be reused. Contributed by Ayush Saxena.
2019-01-11 10:54:49 -08:00
Kai Xie
188bebbe7e
HADOOP-16018. DistCp won't reassemble chunks when blocks per chunk > 0.
...
Contributed by Kai Xie.
2019-01-08 11:57:57 +00:00
Akira Ajisaka
7f78397036
Revert "HADOOP-14556. S3A to support Delegation Tokens."
...
This reverts commit d7152332b3.
2019-01-08 14:51:30 +09:00
Steve Loughran
d7152332b3
HADOOP-14556. S3A to support Delegation Tokens.
...
Contributed by Steve Loughran.
2019-01-07 13:18:03 +00:00
Arpit Agarwal
914b0cf15f
HADOOP-12558. distcp documentation is woefully out of date. Contributed by Dinesh Chitlangia.
2018-11-15 13:58:13 -08:00
Ted Yu
e2cecb681e
HADOOP-15850. CopyCommitter#concatFileChunks should check that the blocks per chunk is not 0. Contributed by Ted Yu.
...
Signed-off-by: Wei-Chiu Chuang <weichiu@apache.org>
2018-10-19 13:21:06 -07:00
Steve Loughran
e36ae9639f
HADOOP-15831. Include modificationTime in the toString method of CopyListingFileStatus.
...
Contributed by Ted Yu.
2018-10-12 09:59:19 +01:00
Sunil G
58fa96b697
Changed version in trunk to 3.3.0-SNAPSHOT.
2018-10-02 22:41:41 +05:30
Surendra Singh Lilhore
96c4575d73
HDFS-13805. Journal Nodes should allow to format non-empty directories with -force option. Contributed by Surendra Singh Lilhore.
2018-08-24 08:14:57 +05:30
Akira Ajisaka
3e3963b035
HADOOP-15552. Move logging APIs over to slf4j in hadoop-tools - Part2. Contributed by Ian Pickering.
2018-08-16 00:31:59 +09:00
Steve Loughran
ca8b80bf59
HADOOP-15384. distcp numListstatusThreads option doesn't get to -delete scan.
...
Contributed by Steve Loughran.
2018-07-10 10:43:59 +01:00
Akira Ajisaka
2b2399d623
HADOOP-15495. Upgrade commons-lang version to 3.7 in hadoop-common-project and hadoop-tools. Contributed by Takanobu Asanuma.
2018-06-28 14:37:22 +09:00
Xiao Chen
7c9cdad6d0
HDFS-13056. Expose file-level composite CRCs in HDFS which are comparable across different instances/layouts. Contributed by Dennis Huo.
2018-04-10 21:31:48 -07:00
Steve Loughran
1976e0066e
HADOOP-15209. DistCp to eliminate needless deletion of files under already-deleted directories.
...
Contributed by Steve Loughran.
2018-03-15 18:05:14 +00:00
Chris Douglas
45cccadd2e
HDFS-12780. Fix spelling mistake in DistCpUtils.java. Contributed by Jianfei Jiang
2018-03-13 11:08:11 -07:00
Steve Loughran
7ef4d942dd
HADOOP-15273. distcp can't handle remote stores with different checksum algorithms.
...
Contributed by Steve Loughran.
2018-03-08 11:24:06 +00:00
Steve Loughran
3bd6b1fd85
HADOOP-15292. Distcp's use of pread is slowing it down.
...
Contributed by Virajith Jalaparti.
2018-03-08 11:15:46 +00:00
fang zhenyi
4d4dde5112
HADOOP-15223. Replace Collections.EMPTY* with empty* when available
...
Signed-off-by: Akira Ajisaka <aajisaka@apache.org>
2018-02-18 22:19:39 +09:00
Wangda Tan
60f9e60b3b
Preparing for 3.2.0 development
...
Change-Id: I6d0e01f3d665d26573ef2b957add1cf0cddf7938
2018-02-11 11:17:38 +08:00
Anu Engineer
4304fcd5bd
HDFS-12990. Change default NameNode RPC port back to 8020. Contributed by Xiao Chen.
2018-02-06 13:43:45 -08:00
Arpit Agarwal
d4e13a4647
HADOOP-15198. Correct the spelling in CopyFilter.java. Contributed by Mukul Kumar Singh.
2018-02-02 11:37:51 -08:00
Surendra Singh Lilhore
00129c5314
HDFS-12833. Distcp : Update the usage of delete option for dependency with update and overwrite option. Contributed by usharani.
2017-12-12 00:28:02 +05:30
Akira Ajisaka
cc3f3eca40
MAPREDUCE-6999. Fix typo onf in DynamicInputChunk.java. Contributed by fang zhenyi.
2017-11-02 18:32:24 +09:00
Steve Loughran
f36cbc8475
HADOOP-14942. DistCp#cleanup() should check whether jobFS is null.
...
Contributed by Andras Bokor.
2017-10-20 22:27:04 +01:00
ChenSammi
e0b3c644e1
HDFS-12414. Ensure to use CLI command to enable/disable erasure coding policy. Contributed by Sammi Chen
2017-09-14 09:15:29 +08:00
Xiaoyu Yao
63720ef574
HADOOP-14839. DistCp log output should contain copied and deleted files and directories. Contributed by Yiqun Lin.
2017-09-05 23:34:55 -07:00
Andrew Wang
0d419c984f
Preparing for 3.1.0 development
2017-09-01 11:53:48 -07:00
Andrew Wang
f29a0fc288
HDFS-12303. Change default EC cell size to 1MB for better performance. Contributed by Wei Zhou.
2017-08-25 14:14:23 -07:00
Andrew Wang
dd7916d3cd
HDFS-12250. Reduce usage of FsPermissionExtension in unit tests. Contributed by Chris Douglas.
2017-08-17 09:35:36 -07:00
Sean Mackrory
1a1bf6b7d0
HADOOP-13595. Rework hadoop_usage to be broken up by clients/daemons/etc. Contributed by Allen Wittenauer.
2017-08-02 12:25:05 -06:00
Wei-Chiu Chuang
44350fdf49
HADOOP-14557. Document HADOOP-8143 (Change distcp to have -pb on by default). Contributed by Bharat Viswanadham.
2017-07-20 18:23:13 -07:00
Andrew Wang
af2773f609
Updating version for 3.0.0-beta1 development
2017-06-29 17:57:40 -07:00
Jason Lowe
dd65eea74b
HADOOP-8143. Change distcp to have -pb on by default. Contributed by Mithun Radhakrishnan
2017-06-20 09:53:47 -05:00
Andrew Wang
16ad896d5c
Update maven version for 3.0.0-alpha4 development
2017-05-26 14:09:44 -07:00
Sunil G
b6f66b0da1
YARN-6584. Correct license headers in hadoop-common, hdfs, yarn and mapreduce. Contributed by Yeliang Cang.
2017-05-22 14:10:06 +05:30
Yongjun Zhang
b4adc8392c
HADOOP-14407. DistCp - Introduce a configurable copy buffer size. (Omkar Aradhya K S via Yongjun Zhang)
2017-05-18 15:35:22 -07:00
Mingliang Liu
26172a94d6
HADOOP-14267. Make DistCpOptions immutable. Contributed by Mingliang Liu
2017-03-31 20:04:26 -07:00
Yongjun Zhang
bf3fb585aa
HADOOP-11794. Enable distcp to copy blocks in parallel. Contributed by Yongjun Zhang, Wei-Chiu Chuang, Xiao Chen, Rosie Li.
2017-03-30 17:38:56 -07:00
Yongjun Zhang
144f1cf765
Revert "HADOOP-11794. Enable distcp to copy blocks in parallel. Contributed by Yongjun Zhang, Wei-Chiu Chuang, Xiao Chen."
...
This reverts commit 064c8b25ec.
2017-03-30 17:38:18 -07:00
Yongjun Zhang
064c8b25ec
HADOOP-11794. Enable distcp to copy blocks in parallel. Contributed by Yongjun Zhang, Wei-Chiu Chuang, Xiao Chen.
2017-03-30 17:01:15 -07:00
Wei-Chiu Chuang
8c591b8d19
HDFS-10974. Document replication factor for EC files. Contributed by Yiqun Lin.
2017-03-30 11:16:05 -07:00
Andrew Wang
0e6f8e4bc6
HDFS-10971. Distcp should not copy replication factor if source file is erasure coded. Contributed by Manoj Govindassamy.
2017-03-28 22:14:03 -07:00
Yongjun Zhang
d235dcdf0b
HADOOP-14127. Add log4j configuration to enable logging in hadoop-distcp's tests. (Xiao Chen via Yongjun Zhang)
2017-02-27 20:42:13 -08:00
Andrew Wang
5d8b80ea9b
Preparing for 3.0.0-alpha3 development
2017-01-19 15:50:07 -08:00
Steve Loughran
ed33ce11dd
HADOOP-13496. Include file lengths in Mismatch in length error for distcp. Contributed by Ted Yu
...
(cherry picked from commit 77401bd5fcca5127c9908156971eeec468371f47)
2017-01-19 11:25:40 +00:00
Chris Nauroth
4c8f9e1302
HDFS-9483. Documentation does not cover use of "swebhdfs" as URL scheme for SSL-secured WebHDFS. Contributed by Surendra Singh Lilhore.
2017-01-05 15:04:47 -08:00
Akira Ajisaka
209e805430
HADOOP-13506. Redundant groupid warning in child projects. Contributed by Kai Sasaki.
2016-11-28 14:34:57 +09:00
Mingliang Liu
beb70fed4f
HADOOP-13655. document object store use with fs shell and distcp. Contributed by Steve Loughran
...
This closes #131
2016-11-22 13:12:23 -08:00
Mingliang Liu
5af572b644
HADOOP-13427. Eliminate needless uses of FileSystem#{exists(), isFile(), isDirectory()}. Contributed by Steve Loughran and Mingliang Liu
2016-11-15 10:57:00 -08:00
Masatake Iwasaki
0bdd263d82
HADOOP-13017. Implementations of InputStream.read(buffer, offset, bytes) to exit 0 if bytes==0. Contributed by Steve Loughran.
2016-10-27 15:46:59 +09:00
Yongjun Zhang
0f0c15f7a5
HDFS-11040. Add documentation for HDFS-9820 distcp improvement. Contributed by Yongjun Zhang.
2016-10-25 12:25:40 -07:00
Yongjun Zhang
3a60573039
Revert "Fix HDFS-11040"
...
This reverts commit 54c1815790.
2016-10-25 12:25:02 -07:00
Yongjun Zhang
54c1815790
Fix HDFS-11040
2016-10-25 12:19:34 -07:00
Chris Douglas
a1a0281e12
HADOOP-13626. Remove distcp dependency on FileStatus serialization
2016-10-24 12:46:54 -07:00
Yongjun Zhang
8650cc84f2
HDFS-9820. Improve distcp to support efficient restore to an earlier snapshot. Contributed by Yongjun Zhang.
2016-10-19 17:37:54 -07:00
Xiao Chen
efdf810cf9
HADOOP-7352. FileSystem#listStatus should throw IOE upon access error. Contributed by John Zhuge.
2016-10-18 18:18:43 -07:00
Yongjun Zhang
0bc6d37f3c
Revert "HDFS-9820. Improve distcp to support efficient restore to an earlier snapshot. Contributed by Yongjun Zhang."
...
This reverts commit 412c4c9a34.
2016-10-17 22:47:37 -07:00
Yongjun Zhang
412c4c9a34
HDFS-9820. Improve distcp to support efficient restore to an earlier snapshot. Contributed by Yongjun Zhang.
2016-10-17 11:04:42 -07:00
Jing Zhao
0a85d07983
HADOOP-13024. Distcp with -delete feature on raw data not implemented. Contributed by Mavin Martin.
2016-10-13 13:24:54 -07:00
Brahma Reddy Battula
e17a4970be
HDFS-9885. Correct the distcp counters name while displaying counters. Contributed by Surendra Singh Lilhore
2016-09-27 10:45:12 +05:30
Steve Loughran
e5ef51e717
HADOOP-13643. Math error in AbstractContractDistCpTest. Contributed by Aaron Fabbri.
2016-09-23 10:01:30 +01:00
Chris Nauroth
98bdb51397
HADOOP-13169. Randomize file list in SimpleCopyListing. Contributed by Rajesh Balamohan.
2016-09-19 15:16:47 -07:00
Allen Wittenauer
58ed4fa544
HADOOP-13341. Deprecate HADOOP_SERVERNAME_OPTS; replace with (command)_(subcommand)_OPTS
...
This commit includes the following changes:
HADOOP-13356. Add a function to handle command_subcommand_OPTS
HADOOP-13355. Handle HADOOP_CLIENT_OPTS in a function
HADOOP-13554. Add an equivalent of hadoop_subcmd_opts for secure opts
HADOOP-13562. Change hadoop_subcommand_opts to use only uppercase
HADOOP-13358. Modify HDFS to use hadoop_subcommand_opts
HADOOP-13357. Modify common to use hadoop_subcommand_opts
HADOOP-13359. Modify YARN to use hadoop_subcommand_opts
HADOOP-13361. Modify hadoop_verify_user to be consistent with hadoop_subcommand_opts (ie more granularity)
HADOOP-13564. modify mapred to use hadoop_subcommand_opts
HADOOP-13563. hadoop_subcommand_opts should print name not actual content during debug
HADOOP-13360. Documentation for HADOOP_subcommand_OPTS
This closes apache/hadoop#126
2016-09-12 11:10:00 -07:00
Ravi Prakash
9faccd1046
HADOOP-13587. distcp.map.bandwidth.mb is overwritten even when -bandwidth flag isn't set. Contributed by Zoran Dimitrijevic
2016-09-12 08:26:08 -07:00
Andrew Wang
da456ffd62
Preparing for 3.0.0-alpha2 development
2016-07-15 19:04:17 -07:00
Andrew Wang
f292624bd8
HDFS-10300. TestDistCpSystem should share MiniDFSCluster. Contributed by John Zhuge.
2016-07-11 18:06:28 -07:00
Yongjun Zhang
8113855b3a
HDFS-10396. Using -diff option with DistCp may get "Comparison method violates its general contract" exception. Contributed by Yongjun Zhang.
2016-06-28 23:15:13 -07:00
Allen Wittenauer
422c73a865
HADOOP-13034. Log message about input options in distcp lacks some items (Takashi Ohnishi via aw)
2016-06-28 07:21:04 -07:00
Yongjun Zhang
cfb860dee7
HADOOP-13199. Add doc for distcp -filters. (John Zhuge via Yongjun Zhang)
2016-05-26 23:30:31 -07:00
Steve Loughran
c918286b17
HADOOP-13145 In DistCp, prevent unnecessary getFileStatus call when not preserving metadata. Contributed by Chris Nauroth.
2016-05-20 12:21:59 +01:00
Jing Zhao
03788d3015
HDFS-10397. Distcp should ignore -delete option if -diff option is provided instead of exiting. Contributed by Mingliang Liu.
2016-05-17 15:46:30 -07:00
Steve Loughran
c69a649257
HADOOP-13163 Reuse pre-computed filestatus in Distcp-CopyMapper (Rajesh Balamohan via stevel)
2016-05-17 13:00:18 +01:00
Allen Wittenauer
730bc746f9
HADOOP-12930. Dynamic subcommands for hadoop shell scripts (aw)
...
This commit contains the following JIRA issues:
HADOOP-12931. bin/hadoop work for dynamic subcommands
HADOOP-12932. bin/yarn work for dynamic subcommands
HADOOP-12933. bin/hdfs work for dynamic subcommands
HADOOP-12934. bin/mapred work for dynamic subcommands
HADOOP-12935. API documentation for dynamic subcommands
HADOOP-12936. modify hadoop-tools to take advantage of dynamic subcommands
HADOOP-13086. enable daemonization of dynamic commands
HADOOP-13087. env var doc update for dynamic commands
HADOOP-13088. fix shellprofiles in hadoop-tools to allow replacement
HADOOP-13089. hadoop distcp adds client opts twice when dynamic
HADOOP-13094. hadoop-common unit tests for dynamic commands
HADOOP-13095. hadoop-hdfs unit tests for dynamic commands
HADOOP-13107. clean up how rumen is executed
HADOOP-13108. dynamic subcommands need a way to manipulate arguments
HADOOP-13110. add a streaming subcommand to mapred
HADOOP-13111. convert hadoop gridmix to be dynamic
HADOOP-13115. dynamic subcommand docs should talk about exit vs. continue program flow
HADOOP-13117. clarify daemonization and security vars for dynamic commands
HADOOP-13120. add a --debug message when dynamic commands have been used
HADOOP-13121. rename sub-project shellprofiles to match the rest of Hadoop
HADOOP-13129. fix typo in dynamic subcommand docs
HADOOP-13151. Underscores should be escaped in dynamic subcommands document
HADOOP-13153. fix typo in debug statement for dynamic subcommands
2016-05-16 17:54:45 -07:00
Chris Nauroth
b9685e85d5
HADOOP-13148. TestDistCpViewFs to include IOExceptions in test error reports. Contributed by Steve Loughran.
2016-05-16 11:53:17 -07:00
Andrew Wang
3c5c57af28
HADOOP-13142. Change project version from 3.0.0 to 3.0.0-alpha1.
2016-05-12 18:27:28 -07:00
Andrew Wang
ca5613af91
Revert "Update project version to 3.0.0-alpha1-SNAPSHOT."
...
This reverts commit 6b53802cba.
2016-05-12 15:32:45 -07:00
Andrew Wang
6b53802cba
Update project version to 3.0.0-alpha1-SNAPSHOT.
2016-05-12 11:05:05 -07:00
Jing Zhao
af942585a1
HADOOP-12469. distcp should not ignore the ignoreFailures option. Contributed by Mingliang Liu.
2016-05-04 10:23:04 -07:00
Yongjun Zhang
959a28dd12
HDFS-10313. Distcp need to enforce the order of snapshot names passed to -diff. (Lin Yiqun via Yongjun Zhang)
2016-04-26 16:08:03 -07:00
Akira Ajisaka
02c51c27d9
HDFS-10298. Document the usage of distcp -diff option. Contributed by Takashi Ohnishi.
2016-04-25 22:33:09 +09:00