Commit Graph

208 Commits

Author SHA1 Message Date
Ben Roling
a36274d699
HADOOP-16085. S3Guard: use object version or etags to protect against inconsistent read after replace/overwrite.
Contributed by Ben Roling.

S3Guard will now track the etag of uploaded files and, if an S3
bucket is versioned, the object version.

You can then control how to react to a mismatch between the data
in the DynamoDB table and that in the store: warn, fail, or, when
using versions, return the original value.

This adds two new columns to the table: etag and version.
This is transparent to older S3A clients, but when such clients
add/update data to the S3Guard table, they will not add these values.
As a result, the etag/version checks will not work with files uploaded by older clients.

For a consistent experience, upgrade all clients to use the latest Hadoop version.
2019-05-19 22:29:54 +01:00
Ben Roling
0af4011580
HADOOP-16221. S3Guard: add option to fail operation on metadata write failure. 2019-04-30 11:53:26 +01:00
Ben Roling
e1c5ddf2aa
HADOOP-16252. Add prefix to dynamo tables in tests.
Contributed by Ben Roling.
2019-04-24 14:55:58 +01:00
Steve Loughran
cf4efcab3b
HADOOP-16118. S3Guard to support on-demand DDB tables.
This is the first step for on-demand operations: things recognize when they are using on-demand tables,
as do the tests.

Contributed by Steve Loughran.
2019-04-11 17:12:12 -07:00
Gabor Bota
b5db238383
HADOOP-15999. S3Guard: Better support for out-of-band operations.
Author: Gabor Bota
2019-03-28 15:59:25 +00:00
Adam Antal
c0427c84dd
HADOOP-16124. Extend documentation in testing.md about S3 endpoint constants.
Contributed by Adam Antal.
2019-03-18 19:13:13 +00:00
Ben Roling
6fa229891e
HADOOP-15625. S3A input stream to use etags/version number to detect changed source files.
Author: Ben Roling <ben.roling@gmail.com>

Initial patch from Brahma Reddy Battula.
2019-03-13 20:37:11 +00:00
Adam Antal
1e0ae6ed15
HADOOP-15843. s3guard bucket-info command to not print a stack trace on bucket-not-found.
Contributed by Adam Antal.

(Revised patch applied after stevel committed the wrong one; that has been reverted)
2019-02-19 11:33:02 +00:00
Steve Loughran
920a89627d
Revert "HADOOP-15843. s3guard bucket-info command to not print a stack trace on bucket-not-found."
This reverts commit c4a00d1ad3.
2019-02-18 14:57:22 +00:00
Steve Loughran
f365957c63
HADOOP-15229. Add FileSystem builder-based openFile() API to match createFile();
S3A to implement S3 Select through this API.

The new openFile() API is asynchronous, and implemented across FileSystem and FileContext.

The MapReduce V2 inputs are moved to this API, and you can actually set must/may
options to pass in.

This is more useful for setting things like s3a seek policy than for S3 select,
as the existing input format/record readers can't handle S3 select output where
the stream is shorter than the file length, and splitting plain text is suboptimal.
Future work is needed there.

In the meantime, any/all filesystem connectors are now free to add their own filesystem-specific
configuration parameters which can be set in jobs and used to set filesystem input stream
options (seek policy, retry, encryption secrets, etc).

Contributed by Steve Loughran
2019-02-05 11:51:02 +00:00
Akira Ajisaka
3c60303ac5
HADOOP-16065. -Ddynamodb should be -Ddynamo in AWS SDK testing document. 2019-01-25 10:27:59 +09:00
Steve Loughran
6d0bffe17e
HADOOP-14556. S3A to support Delegation Tokens.
Contributed by Steve Loughran and Daryn Sharp.
2019-01-14 17:59:27 +00:00
Adam Antal
c4a00d1ad3
HADOOP-15843. s3guard bucket-info command to not print a stack trace on bucket-not-found.
Contributed by Adam Antal.
2019-01-14 17:27:00 +00:00
Sean Mackrory
3420e26ae5 HADOOP-16027. [DOC] Effective use of FS instances during S3A integration tests. Contributed by Gabor Bota. 2019-01-09 10:57:58 -07:00
Akira Ajisaka
7f78397036
Revert "HADOOP-14556. S3A to support Delegation Tokens."
This reverts commit d7152332b3.
2019-01-08 14:51:30 +09:00
Steve Loughran
d7152332b3
HADOOP-14556. S3A to support Delegation Tokens.
Contributed by Steve Loughran.
2019-01-07 13:18:03 +00:00
Sean Mackrory
c35de95a22 HADOOP-15987. ITestDynamoDBMetadataStore should check if table configured properly. Contributed by Gabor Bota. 2018-12-11 08:29:39 -07:00
Sean Mackrory
3ff8580f22 HADOOP-15428. s3guard bucket-info will create s3guard table if FS is set to do this automatically. (Contributed by Gabor Bota) 2018-12-10 14:03:08 -07:00
Akira Ajisaka
66b1335bb3
HADOOP-15926. Document upgrading the section in NOTICE.txt when upgrading the version of AWS SDK. Contributed by Dinesh Chitlangia. 2018-11-15 16:30:24 +09:00
Aaron Fabbri
046b8768af
HADOOP-15621 S3Guard: Implement time-based (TTL) expiry for Authoritative Directory Listing. Contributed by Gabor Bota 2018-10-02 21:22:49 -07:00
Steve Loughran
d7c0a08a1c
HADOOP-15426 Make S3Guard client resilient to DDB throttle events and network failures (Contributed by Steve Loughran) 2018-09-12 21:04:49 -07:00
Aaron Fabbri
d32a8d5d58
HADOOP-14734 add option to tag DDB table(s) created. (Contributed by Gabor Bota and Abe Fine) 2018-09-12 16:36:01 -07:00
Mingliang Liu
87f63b6479 HADOOP-14833. Remove s3a user:secret authentication. Contributed by Steve Loughran 2018-09-11 17:18:42 -07:00
Steve Loughran
5a0babf765
HADOOP-15107. Stabilize/tune S3A committers; review correctness & docs.
Contributed by Steve Loughran.
2018-08-30 14:49:53 +01:00
Aaron Fabbri
d7232857d8
HADOOP-14154 Persist isAuthoritative bit in DynamoDBMetaStore (Contributed by Gabor Bota) 2018-08-17 10:15:39 -07:00
Steve Loughran
0e832e7a74
HADOOP-15642. Update aws-sdk version to 1.11.375.
Contributed by Steve Loughran.
2018-08-16 09:58:46 -07:00
Steve Loughran
da9a39eed1
HADOOP-15583. Stabilize S3A Assumed Role support.
Contributed by Steve Loughran.
2018-08-08 22:57:24 -07:00
Sean Mackrory
7862f1523f HADOOP-15400. Improve S3Guard documentation on Authoritative Mode implementation. (Contributed by Gabor Bota) 2018-08-07 20:13:09 -06:00
Yiqun Lin
1312f9ae4c HADOOP-15391. Add missing css file in hadoop-aws, hadoop-aliyun, hadoop-azure and hadoop-azure-datalake modules. 2018-04-18 16:04:00 +08:00
Aaron Fabbri
ea3849f0cc
HADOOP-14759 S3GuardTool prune to prune specific bucket entries. Contributed by Gabor Bota. 2018-04-05 20:23:17 -07:00
Sean Mackrory
7ce6b41509 HADOOP-15332. Fix typos in hadoop-aws markdown docs. Contributed by Gabor Bota. 2018-03-20 21:12:20 -07:00
Steve Loughran
dd05871b8b HADOOP-15297. Make S3A etag => checksum feature optional.
Contributed by Steve Loughran.
2018-03-12 14:01:42 +00:00
Steve Loughran
8110d6a0d5 HADOOP-13761. S3Guard: implement retries for DDB failures and throttling; translate exceptions.
Contributed by Aaron Fabbri.
2018-03-05 14:06:20 +00:00
Steve Loughran
7ac88244c5 HADOOP-14507. Extend per-bucket secret key config with explicit getPassword() on fs.s3a.$bucket.secret.key.
Contributed by Steve Loughran.
2018-02-16 16:37:06 +00:00
Steve Loughran
9a013b255f HADOOP-15176. Enhance IAM Assumed Role support in S3A client.
Contributed by Steve Loughran

(cherry picked from commit 96c047fbb98c2378eed9693a724d4cbbd03c00fd)
2018-02-15 15:57:10 +00:00
Steve Loughran
b27ab7dd81 HADOOP-15076. Enhance S3A troubleshooting documents and add a performance document.
Contributed by Steve Loughran.

(cherry picked from commit c761e658f6594c4e519ed39ef36669de2c5cee15)
2018-02-15 14:57:56 +00:00
Steve Loughran
1093a73689 HADOOP-13974. S3Guard CLI to support list/purge of pending multipart commits.
Contributed by Aaron Fabbri
2018-01-18 13:13:58 +00:00
Steve Loughran
f274fe33ea Revert "HADOOP-13974. S3Guard CLI to support list/purge of pending multipart commits."
This reverts commit 35ad9b1dd2.
2018-01-18 12:35:57 +00:00
Aaron Fabbri
268ab4e027
HADOOP-15141 Support IAM Assumed roles in S3A. Contributed by Steve Loughran. 2018-01-17 00:05:24 -08:00
Steve Loughran
1a09da7400 HADOOP-15163. Fix S3ACommitter documentation
Contributed by Alessandro Andrioni.

(cherry picked from commit 100e8a1ae1d930dde084af7d1281e491c7f124ec)
2018-01-10 15:37:07 +00:00
Steve Loughran
1ba491ff90 HADOOP-14965. S3a input stream "normal" fadvise mode to be adaptive 2017-12-20 18:25:33 +00:00
Steve Loughran
35ad9b1dd2 HADOOP-13974. S3Guard CLI to support list/purge of pending multipart commits.
Contributed by Aaron Fabbri
2017-12-18 21:19:06 +00:00
Aaron Fabbri
6555af81a2
HADOOP-14475 Metrics of S3A don't print out when enabled. Contributed by Younger and Sean Mackrory. 2017-12-05 11:06:32 -08:00
Steve Loughran
3150c019ae HADOOP-15071 S3a troubleshooting docs to add a couple more failure modes.
Contributed by Steve Loughran
2017-12-05 15:05:41 +00:00
Steve Loughran
de8b6ca5ef HADOOP-13786 Add S3A committer for zero-rename commits to S3 endpoints.
Contributed by Steve Loughran and Ryan Blue.
2017-11-22 15:28:12 +00:00
Aaron Fabbri
47011d7dd3
HADOOP-14220 Enhance S3GuardTool with bucket-info and set-capacity commands, tests. Contributed by Steve Loughran 2017-09-25 15:59:38 -07:00
Aaron Fabbri
49467165a5
HADOOP-14738 Remove S3N and obsolete bits of S3A; rework docs. Contributed by Steve Loughran. 2017-09-14 14:10:48 -07:00
Steve Loughran
5bbca80428
HADOOP-13421. Switch to v2 of the S3 List Objects API in S3A.
Contributed by Aaron Fabbri
2017-09-08 12:07:02 +01:00
John Zhuge
50506e90a8 HADOOP-14103. Sort out hadoop-aws contract-test-options.xml. Contributed by John Zhuge. 2017-09-05 23:26:57 -07:00
Steve Loughran
621b43e254
HADOOP-13345 S3Guard: Improved Consistency for S3A.
Contributed by: Chris Nauroth, Aaron Fabbri, Mingliang Liu, Lei (Eddy) Xu,
Sean Mackrory, Steve Loughran and others.
2017-09-01 14:13:41 +01:00
Steve Loughran
ee243e5289
HADOOP-14190. Add more on S3 regions to the s3a documentation.
Contributed by Steve Loughran
2017-06-28 10:22:13 +01:00
John Zhuge
6c6a7a5962 HADOOP-14464. hadoop-aws doc header warning #5 line wrapped. Contributed by John Zhuge. 2017-05-28 22:25:00 -07:00
Steve Loughran
ba70225cf6
HADOOP-11572. s3a delete() operation fails during a concurrent delete of child entries.
Contributed by Steve Loughran.

(cherry picked from commit 2ac5aab8d725f761a9f9723471a4426f6b5d78c4)
2017-05-18 15:44:39 +01:00
Steve Loughran
5f934f8386
HADOOP-14305 S3A SSE tests won't run in parallel: Bad request in directory GetFileStatus.
Contributed by Steve Moist.
2017-04-24 20:33:19 +01:00
Mingliang Liu
667966c13c HADOOP-14324. Refine S3 server-side-encryption key as encryption secret; improve error reporting and diagnostics. Contributed by Steve Loughran 2017-04-20 17:13:36 -07:00
Chris Nauroth
b8305e6d06 HADOOP-14248. Retire SharedInstanceProfileCredentialsProvider in trunk. Contributed by Mingliang Liu. 2017-04-12 10:02:13 -07:00
Mingliang Liu
5faa949b78 HADOOP-14268. Fix markdown itemization in hadoop-aws documents. Contributed by Akira Ajisaka 2017-04-03 11:07:14 -07:00
Akira Ajisaka
0d053eeb30
HADOOP-14256. [S3A DOC] Correct the format for "Seoul" example. Contributed by Brahma Reddy Battula. 2017-03-30 18:11:50 +09:00
Steve Loughran
4f4250fbcc HADOOP-14099 Split S3 testing documentation out into its own file. Contributed by Steve Loughran. 2017-02-22 11:43:48 +00:00
Steve Loughran
3a2e30fa9f HADOOP-14092. Typo in hadoop-aws index.md. Contributed by John Zhuge
(cherry picked from commit b1c1f05b1dc997906390d653dfafb4f0d7e193c4)
2017-02-18 18:17:11 +00:00
Mingliang Liu
bdad8b7b97 HADOOP-14019. Fix some typos in the s3a docs. Contributed by Steve Loughran 2017-02-16 16:41:31 -08:00
Lei Xu
839b690ed5 HADOOP-13075. Add support for SSE-KMS and SSE-C in s3a filesystem. (Steve Moist via lei) 2017-02-11 13:59:03 -08:00
Steve Loughran
e648b6e138 HADOOP-13336 S3A to support per-bucket configuration. Contributed by Steve Loughran 2017-01-11 17:25:15 +00:00
Mingliang Liu
c6a3923245 HADOOP-13871. ITestS3AInputStreamPerformance.testTimeToOpenAndReadWholeFileBlocks performance awful. Contributed by Steve Loughran 2016-12-12 14:55:34 -08:00
Steve Loughran
a1761a841e HADOOP-13680. fs.s3a.readahead.range to use getLongBytes. Contributed by Abhishek Modi. 2016-10-31 20:54:46 +00:00
Chris Nauroth
309a43925c HADOOP-13309. Document S3A known limitations in file ownership and permission model. Contributed by Chris Nauroth. 2016-10-25 09:03:03 -07:00
Chris Nauroth
d8fa1cfa67 HADOOP-13727. S3A: Reduce high number of connections to EC2 Instance Metadata Service caused by InstanceProfileCredentialsProvider. Contributed by Chris Nauroth. 2016-10-24 21:22:34 -07:00
Steve Loughran
6c348c5691 HADOOP-13560. S3ABlockOutputStream to support huge (many GB) file writes. Contributed by Steve Loughran 2016-10-18 21:16:02 +01:00
Chris Nauroth
88b9444a81 HADOOP-13674. S3A can provide a more detailed error message when accessing a bucket through an incorrect S3 endpoint. Contributed by Chris Nauroth. 2016-10-04 10:36:58 -07:00
Mingliang Liu
96142efa2d HADOOP-13621. s3:// should have been fully cut off from trunk. Contributed by Mingliang Liu. 2016-09-17 22:07:46 -07:00
Steve Loughran
4b6d795f28 HADOOP-13540 improve section on troubleshooting s3a auth problems. Contributed by Steve Loughran 2016-09-09 18:55:32 +01:00
Chris Nauroth
6f9c346e57 HADOOP-13446. Support running isolated unit tests separate from AWS integration tests. Contributed by Chris Nauroth. 2016-08-23 07:18:49 -07:00
Chris Nauroth
763f0497bb HADOOP-13252. Tune S3A provider plugin mechanism. Contributed by Steve Loughran. 2016-08-19 10:48:10 -07:00
Steve Loughran
040c185d62 HADOOP-13405 doc for fs.s3a.acl.default indicates incorrect values. Contributed by Shen Yinjie 2016-08-18 14:36:55 +01:00
Chris Nauroth
3808876c73 HADOOP-13324. s3a tests don't authenticate with S3 frankfurt (or other V4 auth only endpoints). Contributed by Steve Loughran. 2016-08-16 17:05:52 -07:00
Steve Loughran
37362c2f92 HADOOP-13212 Provide an option to set the socket buffers in S3AFileSystem (Rajesh Balamohan) 2016-07-20 13:42:51 +01:00
Steve Loughran
96fa0f848b HADOOP-12709 Cut s3:// from trunk. Contributed by Mingliang Liu. 2016-06-29 16:04:50 +01:00
Steve Loughran
4ee3543625 HADOOP-13203 S3A: Support fadvise "random" mode for high performance readPositioned() reads. Contributed by Rajesh Balamohan and stevel. 2016-06-22 15:45:25 +01:00
Chris Nauroth
127d2c7281 HADOOP-13241. document s3a better. Contributed by Steve Loughran. 2016-06-16 11:18:02 -07:00
Ravi Prakash
4aefe119a0 HADOOP-3733. "s3x:" URLs break when Secret Key contains a slash, even if encoded. Contributed by Steve Loughran. 2016-06-16 11:13:35 -07:00
Steve Loughran
31ffaf76f2 HADOOP-12537 S3A to support Amazon STS temporary credentials. Contributed by Sean Mackrory. 2016-06-09 21:00:47 +01:00
Steve Loughran
656c460c0e HADOOP-13237: s3a initialization against public bucket fails if caller lacks any credentials. Contributed by Chris Nauroth 2016-06-09 17:28:49 +01:00
Steve Loughran
a3f78d8fa8 HADOOP-12807 S3AFileSystem should read AWS credentials from environment variables. Contributed by Tobin Baker. 2016-06-06 23:42:36 +02:00
Chris Nauroth
c58a59f708 HADOOP-13171. Add StorageStatistics to S3A; instrument some more operations. Contributed by Steve Loughran. 2016-06-03 08:55:33 -07:00
Chris Nauroth
16b1cc7af9 HADOOP-13131. Add tests to verify that S3A supports SSE-S3 encryption. Contributed by Steve Loughran. 2016-06-01 14:49:22 -07:00
Steve Loughran
757050ff35 HADOOP-12723 S3A: Add ability to plug in any AWSCredentialsProvider. Contributed by Steven Wong. 2016-05-20 13:52:15 +01:00
Steve Loughran
c918286b17 HADOOP-13145 In DistCp, prevent unnecessary getFileStatus call when not preserving metadata. Contributed by Chris Nauroth. 2016-05-20 12:21:59 +01:00
Steve Loughran
dd3a8bed0a HADOOP-13113 Enable parallel test execution for hadoop-aws. Chris Nauroth via stevel 2016-05-13 10:47:12 +01:00
Steve Loughran
27c4e90efc HADOOP-13028 add low level counter metrics for S3A; use in read performance tests. contributed by: stevel
patch includes
HADOOP-12844 Recover when S3A fails on IOException in read()
HADOOP-13058 S3A FS fails during init against a read-only FS if multipart purge
HADOOP-13047 S3a Forward seek in stream length to be configurable
2016-05-12 19:24:20 +01:00
Steve Loughran
def2a6d385 HADOOP-13122 Customize User-Agent header sent in HTTP requests by S3A. Chris Nauroth via stevel. 2016-05-12 13:57:35 +01:00
Steve Loughran
025219b12f HADOOP-12982 Document missing S3A and S3 properties. (Wei-Chiu Chuang via stevel) 2016-05-10 21:37:22 +01:00
Steve Loughran
19f0f9608e HADOOP-12891. S3AFileSystem should configure Multipart Copy threshold and chunk size. (Andrew Olson via stevel) 2016-04-22 11:25:03 +01:00
Steve Loughran
df18b6e984 HADOOP-12963 Allow using path style addressing for accessing the s3 endpoint. (Stephen Montgomery via stevel) 2016-04-14 12:44:55 +01:00
Harsh J
256c82fe29 HADOOP-11687. Ignore x-* and response headers when copying an Amazon S3 object. Contributed by Aaron Peterson and harsh. 2016-04-01 14:18:10 +05:30
Allen Wittenauer
738155063e HADOOP-12857. rework hadoop-tools (aw) 2016-03-23 13:46:38 -07:00
cnauroth
8ab7658025 HADOOP-11031. Design Document for Credential Provider API. Contributed by Larry McCay. 2016-02-18 14:06:38 -08:00
Steve Loughran
29ae258013 HADOOP-12292. Make use of DeleteObjects optional. (Thomas Demoor via stevel) 2016-02-06 15:05:16 +00:00
Lei Xu
126705f67e HADOOP-11262. Enable YARN to use S3A. (Pieter Reuse via lei) 2016-01-12 12:19:53 -08:00
Lei Xu
bff7c90a56 HADOOP-11684. S3a to use thread pool that blocks clients. (Thomas Demoor and Aaron Fabbri via lei) 2015-11-05 18:35:15 -08:00
Lei Xu
6ab2d19f5c HADOOP-12346. Increase some default timeouts / retries for S3a connector. (Sean Mackrory via Lei (Eddy) Xu) 2015-08-29 09:59:30 -07:00
Lei Xu
d5403747b5 HADOOP-12269. Update aws-sdk dependency to 1.10.6 (Thomas Demoor via Lei (Eddy) Xu) 2015-08-04 18:51:52 -07:00
Steve Loughran
64443490d7 HADOOP-11670. Regression: s3a auth setup broken. (Adam Budde via stevel) 2015-03-08 11:22:16 -07:00
Steve Loughran
15b7076ad5 HADOOP-11183. Memory-based S3AOutputstream. (Thomas Demoor via stevel) 2015-03-03 16:18:51 -08:00
Akira Ajisaka
1a625b8158 HADOOP-11480. Typo in hadoop-aws/index.md uses wrong scheme for test.fs.s3.name. Contributed by Ted Yu. 2015-02-24 17:11:46 -08:00
Steve Loughran
00b80958d8 HADOOP-11521. Make connection timeout configurable in s3a. (Thomas Demoor via stevel) 2015-02-17 20:06:27 +00:00
Steve Loughran
78a7e8d3a6 HADOOP-11522. Update S3A Documentation. (Thomas Demoor via stevel) 2015-02-17 18:15:02 +00:00
Harsh J
ffc75d6ebe HADOOP-11488. Difference in default connection timeout for S3A FS. Contributed by Daisuke Kobayashi. 2015-02-01 00:17:04 +05:30
cnauroth
9458cd5bce HADOOP-11394. hadoop-aws documentation missing. Contributed by Chris Nauroth. 2014-12-12 23:29:11 -08:00