hadoop

Author	SHA1	Message	Date
Steve Loughran	9221704f85	HADOOP-16490. Avoid/handle cached 404s during S3A file creation. Contributed by Steve Loughran. This patch avoids issuing any HEAD path request when creating a file with overwrite=true, so 404s will not end up in the S3 load balancers unless someone calls getFileStatus/exists/isFile in their own code. The Hadoop FsShell CommandWithDestination class is modified to not register uncreated files for deleteOnExit(), because that calls exists() and so can place the 404 in the cache, even after S3A is patched to not do it itself. Because S3Guard knows when a file should be present, it adds a special FileNotFound retry policy independently configurable from other retry policies; it is also exponential, but with different parameters. This is because every HEAD request will refresh any 404 cached in the S3 Load Balancers. It's not enough to retry: we have to have a suitable gap between attempts to (hopefully) ensure any cached entry will be gone. The options and values are: fs.s3a.s3guard.consistency.retry.interval: 2s; fs.s3a.s3guard.consistency.retry.limit: 7. The S3A copy() method used during rename() raises a RemoteFileChangedException which is not caught and so is not downgraded to false. Thus: when a rename is unrecoverable, this fact is propagated. Copy operations without S3Guard lack the confidence that the file exists, so they don't retry the same way: they fail fast with a different error message. However, because create(path, overwrite=false) no longer does HEAD path, we can at least be confident that S3A itself is not creating those cached 404 markers. Change-Id: Ia7807faad8b9a8546836cb19f816cccf17cca26d	2019-09-11 16:46:25 +01:00
Steve Loughran	61b2df2331	HADOOP-16470. Make last AWS credential provider in default auth chain EC2ContainerCredentialsProviderWrapper. Contributed by Steve Loughran. Contains HADOOP-16471. Restore (documented) fs.s3a.SharedInstanceProfileCredentialsProvider. Change-Id: I06b99b57459cac80bf743c5c54f04e59bb54c2f8	2019-08-22 17:27:56 +01:00
Steve Loughran	e25a5c2eab	HADOOP-16499. S3A retry policy to be exponential (#1246). Contributed by Steve Loughran.	2019-08-09 15:52:37 +02:00
Felipe Lopes	bca86bd289	HADOOP-16469. Update committers.md. Contributed by Felipe Lopes. Change-Id: I5c05b878bde073aeb45bf22340183893f85269e1	2019-07-30 12:47:55 +01:00
Sean Mackrory	7f1b76ca35	HADOOP-13868. [s3a] New default for S3A multi-part configuration (#1125)	2019-07-19 09:49:59 -06:00
lqjaclee	cd967c75a7	HADOOP-15847. S3Guard testConcurrentTableCreations to set R/W capacity == 0. Contributed by lqjaclee. Change-Id: I4a4d5b29f2677c188799479e4db38f07fa0591d1	2019-07-19 14:46:55 +01:00
Josh Rosen	d545f9c290	HADOOP-16437 documentation typo fix: fs.s3a.experimental.input.fadvise. Fix fs.s3a.experimental.fadvise to fs.s3a.experimental.input.fadvise. Contributed by: Josh Rosen	2019-07-18 23:19:38 +01:00
Steve Loughran	b15ef7dc3d	HADOOP-16384: S3A: Avoid inconsistencies between DDB and S3. Contributed by Steve Loughran. Contains: - HADOOP-16397. Hadoop S3Guard Prune command to support a -tombstone option. - HADOOP-16406. ITestDynamoDBMetadataStore.testProvisionTable times out intermittently. This patch doesn't fix the underlying problem, but it * changes some tests to clean up better * does a lot more logging of operations against DDB, if enabled * adds an entry point to dump the state of the metastore and s3 tables (precursor to fsck) * adds a purge entry point to help clean up after a test run has got a store into a mess * s3guard prune command adds -tombstone option to only clear tombstones. The outcome is that tests should pass consistently and, if problems occur, we have better diagnostics. Change-Id: I3eca3f5529d7f6fec398c0ff0472919f08f054eb	2019-07-12 13:02:25 +01:00
Steve Loughran	6a3433bffd	HADOOP-16357. TeraSort Job failing on S3 DirectoryStagingCommitter: destination path exists. Contributed by Steve Loughran. This patch * changes the default for the staging committer to append, as we get for the classic FileOutputFormat committer * adds a check for the dest path being a file not a dir * adds tests for this * changes AbstractCommitTerasortIT to not use the simple parser, so it fails if the file is present. Change-Id: Id53742958ed1cf321ff96c9063505d64f3254f53	2019-07-11 18:15:34 +01:00
Sean Mackrory	34747c373f	HADOOP-16396. Allow authoritative mode on a subdirectory. (#1043)	2019-07-03 12:04:47 -06:00
Steve Loughran	e02eb24e0a	HADOOP-15183. S3Guard store becomes inconsistent after partial failure of rename. Contributed by Steve Loughran. Change-Id: I825b0bc36be960475d2d259b1cdab45ae1bb78eb	2019-06-20 09:56:40 +01:00
Gabor Bota	f9cc9e1621	HADOOP-16279. S3Guard: Implement time-based (TTL) expiry for entries (and tombstones). Contributed by Gabor Bota. Change-Id: I73a2d2861901dedfe7a0e783b310fbb95e7c1af9	2019-06-16 17:05:01 +01:00
Steve Loughran	4e38dafde4	HADOOP-15563. S3Guard to support creating on-demand DDB tables. Contributed by Steve Loughran Change-Id: I2262b5b9f52e42ded8ed6f50fd39756f96e77087	2019-06-07 18:26:10 +01:00
Steve Loughran	ec26c431f9	HADOOP-16117. Update AWS SDK to 1.11.563. Contributed by Steve Loughran. Change-Id: I7c46ed2a6378e1370f567acf4cdcfeb93e43fa13	2019-06-06 10:08:18 +01:00
Ben Roling	a36274d699	HADOOP-16085. S3Guard: use object version or etags to protect against inconsistent read after replace/overwrite. Contributed by Ben Roling. S3Guard will now track the etag of uploaded files and, if an S3 bucket is versioned, the object version. You can then control how to react to a mismatch between the data in the DynamoDB table and that in the store: warn, fail, or, when using versions, return the original value. This adds two new columns to the table: etag and version. This is transparent to older S3A clients, but when such clients add/update data to the S3Guard table, they will not add these values. As a result, the etag/version checks will not work with files uploaded by older clients. For a consistent experience, upgrade all clients to use the latest Hadoop version.	2019-05-19 22:29:54 +01:00
Ben Roling	0af4011580	HADOOP-16221. S3Guard: add option to fail operation on metadata write failure.	2019-04-30 11:53:26 +01:00
Ben Roling	e1c5ddf2aa	HADOOP-16252. Add prefix to dynamo tables in tests. Contributed by Ben Roling.	2019-04-24 14:55:58 +01:00
Steve Loughran	cf4efcab3b	HADOOP-16118. S3Guard to support on-demand DDB tables. This is the first step for on-demand operations: things recognize when they are using on-demand tables, as do the tests. Contributed by Steve Loughran.	2019-04-11 17:12:12 -07:00
Gabor Bota	b5db238383	HADOOP-15999. S3Guard: Better support for out-of-band operations. Author: Gabor Bota	2019-03-28 15:59:25 +00:00
Adam Antal	c0427c84dd	HADOOP-16124. Extend documentation in testing.md about S3 endpoint constants. Contributed by Adam Antal.	2019-03-18 19:13:13 +00:00
Ben Roling	6fa229891e	HADOOP-15625. S3A input stream to use etags/version number to detect changed source files. Author: Ben Roling <ben.roling@gmail.com> Initial patch from Brahma Reddy Battula.	2019-03-13 20:37:11 +00:00
Adam Antal	1e0ae6ed15	HADOOP-15843. s3guard bucket-info command to not print a stack trace on bucket-not-found. Contributed by Adam Antal. (Revised patch applied after stevel committed the wrong one; that has been reverted)	2019-02-19 11:33:02 +00:00
Steve Loughran	920a89627d	Revert "HADOOP-15843. s3guard bucket-info command to not print a stack trace on bucket-not-found." This reverts commit `c4a00d1ad3`.	2019-02-18 14:57:22 +00:00
Steve Loughran	f365957c63	HADOOP-15229. Add FileSystem builder-based openFile() API to match createFile(); S3A to implement S3 Select through this API. The new openFile() API is asynchronous, and implemented across FileSystem and FileContext. The MapReduce V2 inputs are moved to this API, and you can actually set must/may options to pass in. This is more useful for setting things like s3a seek policy than for S3 select, as the existing input format/record readers can't handle S3 select output where the stream is shorter than the file length, and splitting plain text is suboptimal. Future work is needed there. In the meantime, any/all filesystem connectors are now free to add their own filesystem-specific configuration parameters which can be set in jobs and used to set filesystem input stream options (seek policy, retry, encryption secrets, etc). Contributed by Steve Loughran	2019-02-05 11:51:02 +00:00
Akira Ajisaka	3c60303ac5	HADOOP-16065. -Ddynamodb should be -Ddynamo in AWS SDK testing document.	2019-01-25 10:27:59 +09:00
Steve Loughran	6d0bffe17e	HADOOP-14556. S3A to support Delegation Tokens. Contributed by Steve Loughran and Daryn Sharp.	2019-01-14 17:59:27 +00:00
Adam Antal	c4a00d1ad3	HADOOP-15843. s3guard bucket-info command to not print a stack trace on bucket-not-found. Contributed by Adam Antal.	2019-01-14 17:27:00 +00:00
Sean Mackrory	3420e26ae5	HADOOP-16027. [DOC] Effective use of FS instances during S3A integration tests. Contributed by Gabor Bota.	2019-01-09 10:57:58 -07:00
Akira Ajisaka	7f78397036	Revert "HADOOP-14556. S3A to support Delegation Tokens." This reverts commit `d7152332b3`.	2019-01-08 14:51:30 +09:00
Steve Loughran	d7152332b3	HADOOP-14556. S3A to support Delegation Tokens. Contributed by Steve Loughran.	2019-01-07 13:18:03 +00:00
Sean Mackrory	c35de95a22	HADOOP-15987. ITestDynamoDBMetadataStore should check if table configured properly. Contributed by Gabor Bota.	2018-12-11 08:29:39 -07:00
Sean Mackrory	3ff8580f22	HADOOP-15428. s3guard bucket-info will create s3guard table if FS is set to do this automatically. (Contributed by Gabor Bota)	2018-12-10 14:03:08 -07:00
Akira Ajisaka	66b1335bb3	HADOOP-15926. Document upgrading the section in NOTICE.txt when upgrading the version of AWS SDK. Contributed by Dinesh Chitlangia.	2018-11-15 16:30:24 +09:00
Aaron Fabbri	046b8768af	HADOOP-15621 S3Guard: Implement time-based (TTL) expiry for Authoritative Directory Listing. Contributed by Gabor Bota	2018-10-02 21:22:49 -07:00
Steve Loughran	d7c0a08a1c	HADOOP-15426 Make S3guard client resilient to DDB throttle events and network failures (Contributed by Steve Loughran)	2018-09-12 21:04:49 -07:00
Aaron Fabbri	d32a8d5d58	HADOOP-14734 add option to tag DDB table(s) created. (Contributed by Gabor Bota and Abe Fine)	2018-09-12 16:36:01 -07:00
Mingliang Liu	87f63b6479	HADOOP-14833. Remove s3a user:secret authentication. Contributed by Steve Loughran	2018-09-11 17:18:42 -07:00
Steve Loughran	5a0babf765	HADOOP-15107. Stabilize/tune S3A committers; review correctness & docs. Contributed by Steve Loughran.	2018-08-30 14:49:53 +01:00
Aaron Fabbri	d7232857d8	HADOOP-14154 Persist isAuthoritative bit in DynamoDBMetaStore (Contributed by Gabor Bota)	2018-08-17 10:15:39 -07:00
Steve Loughran	0e832e7a74	HADOOP-15642. Update aws-sdk version to 1.11.375. Contributed by Steve Loughran.	2018-08-16 09:58:46 -07:00
Steve Loughran	da9a39eed1	HADOOP-15583. Stabilize S3A Assumed Role support. Contributed by Steve Loughran.	2018-08-08 22:57:24 -07:00
Sean Mackrory	7862f1523f	HADOOP-15400. Improve S3Guard documentation on Authoritative Mode implementation. (Contributed by Gabor Bota)	2018-08-07 20:13:09 -06:00
Yiqun Lin	1312f9ae4c	HADOOP-15391. Add missing css file in hadoop-aws, hadoop-aliyun, hadoop-azure and hadoop-azure-datalake modules.	2018-04-18 16:04:00 +08:00
Aaron Fabbri	ea3849f0cc	HADOOP-14759 S3GuardTool prune to prune specific bucket entries. Contributed by Gabor Bota.	2018-04-05 20:23:17 -07:00
Sean Mackrory	7ce6b41509	HADOOP-15332. Fix typos in hadoop-aws markdown docs. Contributed by Gabor Bota.	2018-03-20 21:12:20 -07:00
Steve Loughran	dd05871b8b	HADOOP-15297. Make S3A etag => checksum feature optional. Contributed by Steve Loughran.	2018-03-12 14:01:42 +00:00
Steve Loughran	8110d6a0d5	HADOOP-13761. S3Guard: implement retries for DDB failures and throttling; translate exceptions. Contributed by Aaron Fabbri.	2018-03-05 14:06:20 +00:00
Steve Loughran	7ac88244c5	HADOOP-14507. Extend per-bucket secret key config with explicit getPassword() on fs.s3a.$bucket.secret.key. Contributed by Steve Loughran.	2018-02-16 16:37:06 +00:00
Steve Loughran	9a013b255f	HADOOP-15176. Enhance IAM Assumed Role support in S3A client. Contributed by Steve Loughran (cherry picked from commit 96c047fbb98c2378eed9693a724d4cbbd03c00fd)	2018-02-15 15:57:10 +00:00
Steve Loughran	b27ab7dd81	HADOOP-15076. Enhance S3A troubleshooting documents and add a performance document. Contributed by Steve Loughran. (cherry picked from commit c761e658f6594c4e519ed39ef36669de2c5cee15)	2018-02-15 14:57:56 +00:00


122 Commits
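The HADOOP-16490 entry describes an exponential FileNotFound retry policy for S3Guard, configured with fs.s3a.s3guard.consistency.retry.interval (2s) and fs.s3a.s3guard.consistency.retry.limit (7), and stresses that the gap between attempts matters as much as the count. The minimal Python sketch below shows what such a schedule looks like under a simplifying assumption: each retry doubles the previous wait, starting from the interval, with no random jitter and no cap. Hadoop's actual retry-policy code may randomize or bound the delays, and `backoff_schedule` is a hypothetical helper for illustration, not part of Hadoop.

```python
from typing import List


def backoff_schedule(interval_seconds: float, limit: int) -> List[float]:
    """Return the delay (in seconds) before each retry attempt.

    Simplified model (an assumption, not Hadoop's exact formula): the first
    retry waits `interval_seconds`, and each later retry doubles the previous
    wait. No random jitter and no upper cap are applied.
    """
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return [interval_seconds * (2 ** attempt) for attempt in range(limit)]


if __name__ == "__main__":
    # Values quoted in the HADOOP-16490 commit message:
    #   fs.s3a.s3guard.consistency.retry.interval = 2s
    #   fs.s3a.s3guard.consistency.retry.limit    = 7
    delays = backoff_schedule(2.0, 7)
    for attempt, delay in enumerate(delays, start=1):
        print(f"retry {attempt}: wait {delay:g}s")
    print(f"total worst-case wait: {sum(delays):g}s")
```

Under this model, seven retries starting at 2s wait 2, 4, 8, 16, 32, 64 and 128 seconds, about four minutes in total before giving up. That illustrates the trade-off the commit message describes: widely spaced attempts give a cached 404 a chance to expire, at the cost of a long worst case when the file really is missing.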