hadoop

Author	SHA1	Message	Date
Steve Loughran	4c55adbb6b	HADOOP-19205. S3A: initialization/close slower than with v1 SDK (#6892) Adds new ClientManager interface/implementation which provides on-demand creation of synchronous and asynchronous s3 clients, s3 transfer manager, and in close() terminates these. S3A FS is modified to * Create a ClientManagerImpl instance and pass down to its S3Store. * Use the same ClientManager interface against S3Store to demand-create the services. * Only create the async client as part of the transfer manager creation, which will take place during the first rename() operation. * Statistics on client creation count and duration are recorded. + Statistics on the time to initialize and shutdown the S3A FS are collected in IOStatistics for reporting. Adds to hadoop common class LazyAtomicReference<T> implements CallableRaisingIOE<T>, Supplier<T> and subclass LazyAutoCloseableReference<T extends AutoCloseable> extends LazyAtomicReference<T> implements AutoCloseable. These evaluate the Supplier<T>/CallableRaisingIOE<T> they were constructed with on the first (successful) read of the value. Any exception raised during this operation will be rethrown, and on future evaluations the same operation retried. These classes implement the Supplier and CallableRaisingIOE interfaces so can actually be used to implement lazy function evaluation as Haskell and some other functional languages do. LazyAutoCloseableReference is AutoCloseable; its close() method will close the inner reference if it is set. This class is used in ClientManagerImpl for the lazy S3 client creation and closure. Contributed by Steve Loughran.	2024-07-05 16:38:37 +01:00
Steve Loughran	81d90fd65b	HADOOP-18073. S3A: Upgrade AWS SDK to V2 (#5995) This patch migrates the S3A connector to use the V2 AWS SDK. This is a significant change at the source code level. Any applications using the internal extension/override points in the filesystem connector are likely to break. This includes but is not limited to: - Code invoking methods on the S3AFileSystem class which used classes from the V1 SDK. - The ability to define the factory for the `AmazonS3` client, and to retrieve it from the S3AFileSystem. There is a new factory API and a special interface S3AInternals to access a limited set of internal classes and operations. - Delegation token and auditing extensions. - Classes trying to integrate with the AWS SDK. All standard V1 credential providers listed in the option fs.s3a.aws.credentials.provider will be automatically remapped to their V2 equivalent. Other V1 Credential Providers are supported, but only if the V1 SDK is added back to the classpath. The SDK Signing plugin has changed; all v1 signers are incompatible. There is no support for the S3 "v2" signing algorithm. Finally, the aws-sdk-bundle JAR has been replaced by the shaded V2 equivalent, "bundle.jar", which is now exported by the hadoop-aws module. Consult the document aws_sdk_upgrade for the full details. Contributed by Ahmar Suhail + some bits by Steve Loughran	2023-09-11 14:30:25 +01:00
Viraj Jasani	648071e197	HADOOP-18466. Limit the findbugs suppression IS2_INCONSISTENT_SYNC to S3AFileSystem field (#4926) Follow-on to HADOOP-18455. Contributed by Viraj Jasani	2022-09-26 18:56:58 +01:00
Viraj Jasani	084b68e380	HADOOP-18455. S3A prefetching executor should be closed (#4879) Follow-on patch to HADOOP-18186. Contributed by Viraj Jasani	2022-09-22 00:22:41 +05:30
Steve Loughran	e0cd0a82e0	HADOOP-16202. Enhanced openFile(): hadoop-aws changes. (#2584/3) S3A input stream support for the few fs.option.openfile settings. As well as supporting the read policy option and values, if the file length is declared in fs.option.openfile.length then no HEAD request will be issued when opening a file. This can cut a few tens of milliseconds off the operation. The patch adds a new openfile parameter/FS configuration option fs.s3a.input.async.drain.threshold (default: 16000). It declares the number of bytes remaining in the http input stream above which any operation to read and discard the rest of the stream, "draining", is executed asynchronously. This asynchronous draining offers some performance benefit on seek-heavy file IO. Contributed by Steve Loughran. Change-Id: I9b0626bbe635e9fd97ac0f463f5e7167e0111e39	2022-04-24 17:33:05 +01:00
Steve Loughran	14ba19af06	HADOOP-17409. Remove s3guard from S3A module (#3534) Completely removes S3Guard support from the S3A codebase. If the connector is configured to use any metastore other than the null and local stores (i.e. DynamoDB is selected) the s3a client will raise an exception and refuse to initialize. This is to ensure that there is no mix of S3Guard enabled and disabled deployments with the same configuration but different hadoop releases; it must be turned off completely. The "hadoop s3guard" command has been retained, but the supported subcommands have been reduced to those which are not purely S3Guard related: "bucket-info" and "uploads". This is a major change in terms of the number of files changed; before cherry picking subsequent s3a patches into older releases, this patch will probably need backporting first. Goodbye S3Guard, your work is done. Time to die. Contributed by Steve Loughran.	2022-01-17 18:08:57 +00:00
Chao Sun	176bd88890	HADOOP-16080. hadoop-aws does not work with hadoop-client-api. (#2522) Contributed by Chao Sun. (Cherry-picked via PR #2575)	2021-03-09 20:01:29 +00:00
Steve Loughran	617af28e80	HADOOP-17271. S3A connector to support IOStatistics. (#2580) S3A connector to support the IOStatistics API of HADOOP-16830. This is a major rework of the S3A Statistics collection to * Embrace the IOStatistics APIs * Move from direct references of S3AInstrumentation statistics collectors to interface/implementation classes in new packages. * Ubiquitous support of IOStatistics, including: S3AFileSystem, input and output streams, RemoteIterator instances provided in list calls. * Adoption of new statistic names from hadoop-common Regarding statistic collection, as well as all existing statistics, the connector now records min/max/mean durations of HTTP GET and HEAD requests, and those of LIST operations. Contributed by Steve Loughran.	2020-12-31 21:55:39 +00:00
Steve Loughran	49df838995	HADOOP-16697. Tune/audit S3A authoritative mode. Contains: HADOOP-16474. S3Guard ProgressiveRenameTracker to mark destination directory as authoritative on success. HADOOP-16684. S3guard bucket info to list a bit more about authoritative paths. HADOOP-16722. S3GuardTool to support FilterFileSystem. This patch improves the marking of newly created/imported directory trees in S3Guard DynamoDB tables as authoritative. Specific changes: * Renamed directories are marked as authoritative if the entire operation succeeded (HADOOP-16474). * When updating parent table entries as part of any table write, there's no overwriting of their authoritative flag. s3guard import changes: * new -verbose flag to print out what is going on. * The "s3guard import" command lets you declare that a directory tree is to be marked as authoritative hadoop s3guard import -authoritative -verbose s3a://bucket/path When importing a listing and a file is found, the import tool queries the metastore and only updates the entry if the file is different from before, where different == new timestamp, etag, or length. S3Guard can get timestamp differences due to clock skew in PUT operations. As the recursive list performed by the import command doesn't retrieve the versionID, the existing entry may in fact be more complete. When updating an existing entry due to clock skew the existing version ID is propagated to the new entry (note: the etags must match; this is needed to deal with inconsistent listings). There is a new s3guard command to audit a s3guard bucket/path's authoritative state: hadoop s3guard authoritative -check-config s3a://bucket/path This is primarily for testing/auditing. The s3guard bucket-info command also provides some more details on the authoritative state of a store (HADOOP-16684). Change-Id: I58001341c04f6f3597fcb4fcb1581ccefeb77d91	2020-01-10 11:11:56 +00:00
Steve Loughran	f365957c63	HADOOP-15229. Add FileSystem builder-based openFile() API to match createFile(); S3A to implement S3 Select through this API. The new openFile() API is asynchronous, and implemented across FileSystem and FileContext. The MapReduce V2 inputs are moved to this API, and you can actually set must/may options to pass in. This is more useful for setting things like s3a seek policy than for S3 select, as the existing input format/record readers can't handle S3 select output where the stream is shorter than the file length, and splitting plain text is suboptimal. Future work is needed there. In the meantime, any/all filesystem connectors are now free to add their own filesystem-specific configuration parameters which can be set in jobs and used to set filesystem input stream options (seek policy, retry, encryption secrets, etc). Contributed by Steve Loughran	2019-02-05 11:51:02 +00:00
Steve Loughran	8110d6a0d5	HADOOP-13761. S3Guard: implement retries for DDB failures and throttling; translate exceptions. Contributed by Aaron Fabbri.	2018-03-05 14:06:20 +00:00
Aaron Fabbri	49467165a5	HADOOP-14738 Remove S3N and obsolete bits of S3A; rework docs. Contributed by Steve Loughran.	2017-09-14 14:10:48 -07:00
Steve Loughran	621b43e254	HADOOP-13345 S3Guard: Improved Consistency for S3A. Contributed by: Chris Nauroth, Aaron Fabbri, Mingliang Liu, Lei (Eddy) Xu, Sean Mackrory, Steve Loughran and others.	2017-09-01 14:13:41 +01:00
Ravi Prakash	4aefe119a0	HADOOP-3733. "s3x:" URLs break when Secret Key contains a slash, even if encoded. Contributed by Steve Loughran.	2016-06-16 11:13:35 -07:00
Steve Loughran	27c4e90efc	HADOOP-13028 add low level counter metrics for S3A; use in read performance tests. contributed by: stevel patch includes HADOOP-12844 Recover when S3A fails on IOException in read() HADOOP-13058 S3A FS fails during init against a read-only FS if multipart purge HADOOP-13047 S3a Forward seek in stream length to be configurable	2016-05-12 19:24:20 +01:00
Colin Patrick Mccabe	5ec7fcd9dd	HADOOP-11074. Move s3-related FS connector code to hadoop-aws. (David S. Wang via Colin Patrick McCabe)	2014-09-10 16:14:53 -07:00
Steve Loughran	59384dfb71	HADOOP-10373 create tools/hadoop-amazon for aws/EMR support	2014-09-02 20:11:13 +01:00

17 Commits
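
The HADOOP-19205 entry describes lazy references that evaluate their supplier on the first successful read, rethrow any exception, and retry the same operation on later reads. A minimal Java sketch of that behaviour follows. It is not the Hadoop implementation: the names LazyReferenceSketch, IOCallable, LazyRef, LazyCloseableRef, eval(), isSet() and getAndClear() are hypothetical stand-ins chosen for illustration, and the simple synchronized locking is an assumption rather than what the real classes necessarily do.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicReference;

/** Illustrative sketch only; all names here are hypothetical, not Hadoop classes. */
class LazyReferenceSketch {

  /** Stand-in for a callable which may raise an IOException. */
  @FunctionalInterface
  interface IOCallable<T> {
    T apply() throws IOException;
  }

  /** Evaluates its constructor callable on the first successful read. */
  static class LazyRef<T> {
    private final IOCallable<? extends T> constructor;
    private final AtomicReference<T> reference = new AtomicReference<>();

    LazyRef(IOCallable<? extends T> constructor) {
      this.constructor = constructor;
    }

    /**
     * Return the value, creating it if needed.
     * If the callable throws, the exception propagates and the reference
     * stays unset, so the next call retries the same operation.
     * A callable returning null is treated as "not yet set" in this sketch.
     */
    synchronized T eval() throws IOException {
      T value = reference.get();
      if (value == null) {
        value = constructor.apply();
        reference.set(value);
      }
      return value;
    }

    /** Has the value been created? */
    boolean isSet() {
      return reference.get() != null;
    }

    /** Remove and return the current value, or null if unset. */
    protected synchronized T getAndClear() {
      return reference.getAndSet(null);
    }
  }

  /** Lazy reference whose close() closes the inner value only if it was created. */
  static class LazyCloseableRef<T extends AutoCloseable> extends LazyRef<T>
      implements AutoCloseable {

    LazyCloseableRef(IOCallable<? extends T> constructor) {
      super(constructor);
    }

    @Override
    public void close() throws Exception {
      T value = getAndClear();
      if (value != null) {
        value.close();
      }
    }
  }
}
```

In this sketch, closing a LazyCloseableRef that was never read does nothing, which is the property the commit message relies on to avoid creating a client purely in order to shut it down.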