62 Commits

Author SHA1 Message Date
Steve Loughran
240b25310e
HADOOP-17271. S3A connector to support IOStatistics. (#2580)
S3A connector to support the IOStatistics API of HADOOP-16830.

This is a major rework of the S3A Statistics collection to:

* Embrace the IOStatistics APIs
* Move from direct references of S3AInstrumentation statistics
  collectors to interface/implementation classes in new packages.
* Ubiquitous support of IOStatistics, including:
  S3AFileSystem, input and output streams, RemoteIterator instances
  provided in list calls.
* Adoption of new statistic names from hadoop-common

In addition to all existing statistics, the connector now
records the min/max/mean durations of HTTP GET and HEAD
requests, and those of LIST operations.

Contributed by Steve Loughran.

Change-Id: I182d34b6ac39e017a8b4a221dad8e930882b39cf
2021-01-14 13:21:01 +00:00
Steve Loughran
49f8ae965e
HADOOP-13230. S3A to optionally retain directory markers.
This adds an option to disable "empty directory" marker deletion,
to avoid throttling and other scale problems.

This feature is *not* backwards compatible.
Consult the documentation and use with care.

Contributed by Steve Loughran.

Change-Id: I69a61e7584dc36e485d5e39ff25b1e3e559a1958
2020-08-15 20:19:49 +01:00
Akira Ajisaka
dfa7f160a5
Preparing for 3.3.1 development 2020-04-30 13:33:42 +09:00
Sahil Takiar
f206b736f0
HADOOP-16346. Stabilize S3A OpenSSL support.
Introduces `openssl` as an option for `fs.s3a.ssl.channel.mode`.
The new option is documented and marked as experimental.

For details on how to use this, consult the performance document
in the s3a documentation.

This patch is the successor to HADOOP-16050 "S3A SSL connections
should use OpenSSL", which was reverted because of
incompatibilities between the Wildfly OpenSSL client and the AWS
HTTPS servers (HADOOP-16347). With the Wildfly release moved up
to 1.0.7.Final (HADOOP-16405) everything should now work.

Related issues:

* HADOOP-15669. ABFS: Improve HTTPS Performance
* HADOOP-16050: S3A SSL connections should use OpenSSL
* HADOOP-16371: Option to disable GCM for SSL connections when running on Java 8
* HADOOP-16405: Upgrade Wildfly Openssl version to 1.0.7.Final

Contributed by Sahil Takiar

Change-Id: I80a4bc5051519f186b7383b2c1cea140be42444e
2020-01-21 16:37:51 +00:00
Steve Loughran
f44abc3e11
HADOOP-16207 Improved S3A MR tests.
Contributed by Steve Loughran.

Replaces the committer-specific terasort and MR test jobs with parameterization
of the (now single) tests and use of file:// over hdfs:// as the cluster FS.

The parameterization ensures that only one of the specific committer tests
runs at a time, so overloads of the test machines are less likely and the
suites can be pulled back into the parallel phase.

There's also more detailed validation of the stage outputs of the terasorting;
if one test fails the rest are all skipped. This and the fact that job
output is stored under target/yarn-${timestamp} mean failures should
be more debuggable.

Change-Id: Iefa370ba73c6419496e6e69dd6673d00f37ff095
2019-10-04 14:12:31 +01:00
Steve Loughran
b15ef7dc3d
HADOOP-16384: S3A: Avoid inconsistencies between DDB and S3.
Contributed by Steve Loughran

Contains

- HADOOP-16397. Hadoop S3Guard Prune command to support a -tombstone option.
- HADOOP-16406. ITestDynamoDBMetadataStore.testProvisionTable times out intermittently

This patch doesn't fix the underlying problem, but it:

* changes some tests to clean up better
* does a lot more logging of operations against DDB, if enabled
* adds an entry point to dump the state of the metastore and s3 tables (precursor to fsck)
* adds a purge entry point to help clean up after a test run has got a store into a mess
* s3guard prune command adds -tombstone option to only clear tombstones

The outcome is that tests should pass consistently and if problems occur we have better diagnostics.

Change-Id: I3eca3f5529d7f6fec398c0ff0472919f08f054eb
2019-07-12 13:02:25 +01:00
Steve Loughran
e02eb24e0a
HADOOP-15183. S3Guard store becomes inconsistent after partial failure of rename.
Contributed by Steve Loughran.

Change-Id: I825b0bc36be960475d2d259b1cdab45ae1bb78eb
2019-06-20 09:56:40 +01:00
Steve Loughran
309501c6fa
Revert "HADOOP-16050: s3a SSL connections should use OpenSSL"
This reverts commit b067f8acaa.

Change-Id: I584b050a56c0e6f70b11fa3f7db00d5ac46e7dd8
2019-06-05 13:54:55 +01:00
Akira Ajisaka
afd844059c HADOOP-16331. Fix ASF License check in pom.xml
Signed-off-by: Takanobu Asanuma <tasanuma@apache.org>
2019-05-29 17:25:13 +09:00
Steve Loughran
0c73dba3a6
HADOOP-16332. Remove S3A dependency on http core.
Contributed by Steve Loughran.

Change-Id: I53209c993a405fefdb5e1b692d5a56d027d3b845
2019-05-28 22:50:37 +01:00
Akira Ajisaka
9f933e6446
HADOOP-16323. https everywhere in Maven settings. 2019-05-27 15:24:59 +09:00
Ben Roling
a36274d699
HADOOP-16085. S3Guard: use object version or etags to protect against inconsistent read after replace/overwrite.
Contributed by Ben Roling.

S3Guard will now track the etag of uploaded files and, if an S3
bucket is versioned, the object version.

You can then control how to react to a mismatch between the data
in the DynamoDB table and that in the store: warn, fail, or, when
using versions, return the original value.

This adds two new columns to the table: etag and version.
This is transparent to older S3A clients, but when such clients
add/update data to the S3Guard table, they will not add these values.
As a result, the etag/version checks will not work with files uploaded by older clients.

For a consistent experience, upgrade all clients to use the latest Hadoop version.
2019-05-19 22:29:54 +01:00
Sahil Takiar
b067f8acaa HADOOP-16050: s3a SSL connections should use OpenSSL
(cherry picked from commit aebf229c175dfa19fff3b31e9e67596f6c6124fa)
2019-05-16 08:57:54 -06:00
Steve Loughran
9f1c017f44
HADOOP-16058. S3A tests to include Terasort.
Contributed by Steve Loughran.

This includes
 - HADOOP-15890. Some S3A committer tests don't match ITest* pattern; don't run in maven
 - MAPREDUCE-7090. BigMapOutput example doesn't work with paths off cluster fs
 - MAPREDUCE-7091. Terasort on S3A to switch to new committers
 - MAPREDUCE-7092. MR examples to work better against cloud stores
2019-03-21 11:15:37 +00:00
Akira Ajisaka
1129288cf5
HADOOP-14178. Move Mockito up to version 2.23.4. Contributed by Akira Ajisaka and Masatake Iwasaki. 2019-01-29 18:29:56 -08:00
Steve Loughran
6d0bffe17e
HADOOP-14556. S3A to support Delegation Tokens.
Contributed by Steve Loughran and Daryn Sharp.
2019-01-14 17:59:27 +00:00
Akira Ajisaka
7f78397036
Revert "HADOOP-14556. S3A to support Delegation Tokens."
This reverts commit d7152332b3.
2019-01-08 14:51:30 +09:00
Steve Loughran
d7152332b3
HADOOP-14556. S3A to support Delegation Tokens.
Contributed by Steve Loughran.
2019-01-07 13:18:03 +00:00
Steve Loughran
a668f8e6c6
HADOOP-16015. Add bouncycastle jars to hadoop-aws as test dependencies.
Contributed by Steve Loughran.
2018-12-20 18:09:01 +00:00
Sunil G
58fa96b697 Changed version in trunk to 3.3.0-SNAPSHOT. 2018-10-02 22:41:41 +05:30
Steve Loughran
d7c0a08a1c
HADOOP-15426 Make S3guard client resilient to DDB throttle events and network failures (Contributed by Steve Loughran) 2018-09-12 21:04:49 -07:00
Sean Mackrory
b089a06793 HADOOP-14918. Remove the Local Dynamo DB test option. Contributed by Gabor Bota. 2018-06-20 16:45:08 -06:00
Chris Douglas
45d1b0fdcc HADOOP-14696. parallel tests don't work for Windows. Contributed by Allen Wittenauer 2018-03-12 20:05:39 -07:00
Wangda Tan
60f9e60b3b Preparing for 3.2.0 development
Change-Id: I6d0e01f3d665d26573ef2b957add1cf0cddf7938
2018-02-11 11:17:38 +08:00
Steve Loughran
de8b6ca5ef HADOOP-13786 Add S3A committer for zero-rename commits to S3 endpoints.
Contributed by Steve Loughran and Ryan Blue.
2017-11-22 15:28:12 +00:00
Akira Ajisaka
6903cf096e HADOOP-13514. Upgrade maven surefire plugin to 2.20.1
Signed-off-by: Allen Wittenauer <aw@apache.org>
2017-11-19 12:39:37 -08:00
Aaron Fabbri
49467165a5
HADOOP-14738 Remove S3N and obsolete bits of S3A; rework docs. Contributed by Steve Loughran. 2017-09-14 14:10:48 -07:00
Andrew Wang
0d419c984f Preparing for 3.1.0 development 2017-09-01 11:53:48 -07:00
Steve Loughran
621b43e254
HADOOP-13345 S3Guard: Improved Consistency for S3A.
Contributed by: Chris Nauroth, Aaron Fabbri, Mingliang Liu, Lei (Eddy) Xu,
Sean Mackrory, Steve Loughran and others.
2017-09-01 14:13:41 +01:00
Steve Loughran
7fc324aabd
HADOOP-14126. Remove jackson, joda and other transient aws SDK dependencies from hadoop-aws.
Contributed by Steve Loughran

(cherry picked from commit ced547d5f0dbea571cbc472c5f55fe89d5900a6f)
2017-08-04 11:09:08 +01:00
Andrew Wang
af2773f609 Updating version for 3.0.0-beta1 development 2017-06-29 17:57:40 -07:00
Andrew Wang
16ad896d5c Update maven version for 3.0.0-alpha4 development 2017-05-26 14:09:44 -07:00
Akira Ajisaka
0d5c8ed8e0
HADOOP-14401. maven-project-info-reports-plugin can be removed. Contributed by Andras Bokor. 2017-05-11 16:37:32 -05:00
Steve Loughran
5f934f8386
HADOOP-14305 S3A SSE tests won't run in parallel: Bad request in directory GetFileStatus.
Contributed by Steve Moist.
2017-04-24 20:33:19 +01:00
Steve Loughran
6b015d00c9
HADOOP-14321. explicitly exclude s3a root dir ITests from parallel runs.
Contributed by Steve Loughran
2017-04-19 10:21:44 +01:00
Akira Ajisaka
f597f4c43e
HADOOP-14087. S3A typo in pom.xml test exclusions. Contributed by Aaron Fabbri. 2017-03-07 15:14:55 +09:00
Akira Ajisaka
258342e76c HADOOP-14118. move jets3t into a dependency on hadoop-aws JAR. 2017-02-28 13:47:44 +09:00
Mingliang Liu
658702efff HADOOP-14040. Use shaded aws-sdk uber-JAR 1.11.86. Contributed by Steve Loughran and Sean Mackrory 2017-02-16 16:51:03 -08:00
Lei Xu
839b690ed5 HADOOP-13075. Add support for SSE-KMS and SSE-C in s3a filesystem. (Steve Moist via lei) 2017-02-11 13:59:03 -08:00
Andrew Wang
5d8b80ea9b Preparing for 3.0.0-alpha3 development 2017-01-19 15:50:07 -08:00
Mingliang Liu
af791b774b HADOOP-13050. Upgrade to AWS SDK 1.11+. Contributed by Chris Nauroth and Steve Loughran 2016-11-22 14:00:35 -08:00
Chris Nauroth
9cad3e2350 HADOOP-13614. Purge some superfluous/obsolete S3 FS tests that are slowing test runs down. Contributed by Steve Loughran. 2016-10-26 08:27:26 -07:00
Steve Loughran
6c348c5691 HADOOP-13560. S3ABlockOutputStream to support huge (many GB) file writes. Contributed by Steve Loughran 2016-10-18 21:16:02 +01:00
Chris Nauroth
69620f9559 HADOOP-13692. hadoop-aws should declare explicit dependency on Jackson 2 jars to prevent classpath conflicts. Contributed by Chris Nauroth. 2016-10-07 11:41:19 -07:00
Steve Loughran
7fdfcd8a6c HADOOP-13541 explicitly declare the Joda time version S3A depends on. Contributed by Steve Loughran 2016-09-07 12:25:23 +01:00
Chris Nauroth
d152557cf7 HADOOP-13447. Refactor S3AFileSystem to support introduction of separate metadata repository and tests. Contributed by Chris Nauroth. 2016-09-06 09:36:21 -07:00
Chris Nauroth
6f9c346e57 HADOOP-13446. Support running isolated unit tests separate from AWS integration tests. Contributed by Chris Nauroth. 2016-08-23 07:18:49 -07:00
Andrew Wang
da456ffd62 Preparing for 3.0.0-alpha2 development 2016-07-15 19:04:17 -07:00
Steve Loughran
31ffaf76f2 HADOOP-12537 S3A to support Amazon STS temporary credentials. Contributed by Sean Mackrory. 2016-06-09 21:00:47 +01:00
Steve Loughran
c918286b17 HADOOP-13145 In DistCp, prevent unnecessary getFileStatus call when not preserving metadata. Contributed by Chris Nauroth. 2016-05-20 12:21:59 +01:00