If a file opened for reading through the S3A connector is not
closed, then when garbage collection takes place:
* An error message is reported at WARN, including the file name.
* A stack trace of where the stream was created is reported
at INFO.
* A best-effort attempt is made to release any active HTTPS
connection.
* The filesystem IOStatistic stream_leaks is incremented.
The intent is to make it easier to identify where streams are
being opened but not closed, as these consume resources, often
including HTTPS connections from a connection pool of limited
size.
It MUST NOT be relied on as a way to clean up open
files/streams automatically; some of the normal actions of
the close() method are omitted.
Instead, view the warning messages and IOStatistics as a sign
of a problem, and the stack trace as a way of identifying which
application code/library needs to be investigated.
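A minimal sketch of how a leaked stream shows up in the
statistics; the IOStatistics retrieval calls are the standard
Hadoop statistics API, while the path, the forced GC and the
sleep are purely illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.statistics.IOStatistics;
    import static org.apache.hadoop.fs.statistics.IOStatisticsSupport.retrieveIOStatistics;

    public class StreamLeakDemo {
      public static void main(String[] args) throws Exception {
        Path path = new Path("s3a://example-bucket/data.csv");  // illustrative
        FileSystem fs = path.getFileSystem(new Configuration());

        // Open a stream and "forget" to close it.
        fs.open(path).read();

        // Once the stream is unreachable and a GC runs, the connector
        // logs the leak and increments the stream_leaks counter.
        System.gc();
        Thread.sleep(5000);

        IOStatistics stats = retrieveIOStatistics(fs);
        System.out.println("stream_leaks = "
            + stats.counters().get("stream_leaks"));
      }
    }
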
Contributed by Steve Loughran
Add support for S3 client side encryption (CSE).
CSE can be configured in two modes:
- CSE-KMS where keys are provided by AWS KMS
- CSE-CUSTOM where custom keys are provided by implementing
a custom keyring.
CSE requires an encryption library,
amazon-s3-encryption-client-java.jar, which is _not_ included
in the shaded bundle.jar and is released separately.
The version currently used is 3.1.1.
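A sketch of enabling CSE-KMS through the standard S3A
encryption options; the KMS key ARN below is a placeholder:

    import org.apache.hadoop.conf.Configuration;

    Configuration conf = new Configuration();
    // Select client-side encryption with AWS KMS-managed keys.
    conf.set("fs.s3a.encryption.algorithm", "CSE-KMS");
    // Placeholder ARN; supply the KMS key to encrypt with.
    conf.set("fs.s3a.encryption.key",
        "arn:aws:kms:us-west-2:123456789012:key/EXAMPLE-KEY-ID");
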
Contributed by Syed Shameerur Rahman.
* All field access is now via setter/getter methods
* To use Avro to marshal Serializable objects,
the packages they are in must be declared in the system property
"org.apache.avro.SERIALIZABLE_PACKAGES"
This is required to address
- CVE-2024-47561
- CVE-2023-39410
This change is not backwards compatible.
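A sketch of declaring the trusted packages, assuming the
property takes a comma-separated list of package names; the
packages shown are placeholders:

    // Set before any Avro reflect-based (de)serialization runs,
    // or pass it on the command line as
    // -Dorg.apache.avro.SERIALIZABLE_PACKAGES=com.example.model
    System.setProperty("org.apache.avro.SERIALIZABLE_PACKAGES",
        "com.example.model,com.example.records");
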
Contributed by Dominik Diedrich
* Windows doesn't want the macro _JNI_IMPORT_OR_EXPORT_ to be
defined in the function definition; it fails to compile with
the error "definition of dllimport function not allowed".
* However, Linux needs it, so the macro is now defined
conditionally based on the OS.
* Also, the `hdfs` target is now compiled as an object library
so that the `get_jni_test` target no longer needs to link
against the `jvm` library.
Reviewed-by: Steve Loughran <stevel@apache.org>
Reviewed-by: Attila Doroszlai <adoroszlai@apache.org>
Reviewed-by: Cheng Pan <chengpan@apache.org>
Reviewed-by: Min Yan <yaommen@gmail.com>
Signed-off-by: Shilun Fan <slfan1989@apache.org>
ChecksumFileSystem creates the chunked ranges based on the
checksum chunk size and then calls readVectored on the
underlying raw local filesystem, which may lead to overlapping
ranges in some cases.
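For reference, a sketch of the vectored read API involved; the
path and the ranges are illustrative:

    import java.nio.ByteBuffer;
    import java.util.Arrays;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileRange;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class VectoredReadDemo {
      public static void main(String[] args) throws Exception {
        Path path = new Path("file:///tmp/data.bin");  // illustrative
        FileSystem fs = path.getFileSystem(new Configuration());
        List<FileRange> ranges = Arrays.asList(
            FileRange.createFileRange(0, 4096),
            FileRange.createFileRange(8192, 4096));
        try (FSDataInputStream in = fs.open(path)) {
          // Ranges are read asynchronously; each range's data
          // arrives in its CompletableFuture.
          in.readVectored(ranges, ByteBuffer::allocate);
          for (FileRange r : ranges) {
            ByteBuffer data = r.getData().join();
            System.out.println("read " + data.remaining() + " bytes");
          }
        }
      }
    }
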
Contributed by: Mukund Thakur
This sets a different timeout for data upload PUT/POST calls
from that of all other requests, so that slow block uploads do
not trigger timeouts as rapidly as normal requests. This was
always the behavior in the V1 AWS SDK; for V2 it has to be set
explicitly on the operations which need extended timeouts.
Option: fs.s3a.connection.part.upload.timeout
Default: 15m
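A sketch of overriding the default, e.g. for very slow uplinks;
the 30m value is illustrative:

    import org.apache.hadoop.conf.Configuration;

    Configuration conf = new Configuration();
    // Extend the PUT/POST data-upload timeout beyond the 15m default.
    conf.set("fs.s3a.connection.part.upload.timeout", "30m");
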
Contributed by Steve Loughran
* HttpReferrerAuditHeader is thread safe, copying the lists/maps passed
in and using synchronized methods when necessary.
* All exceptions raised when building the referrer header are
caught and swallowed.
* The first such error is logged at WARN; all errors plus stack
traces are logged at DEBUG.
Contributed by Steve Loughran
Adds a new option, fs.s3a.cross.region.access.enabled, which is
true by default.
This enables cross-region access as a separate config and
enables/disables it irrespective of whether a region/endpoint
is set.
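A sketch of turning it off, e.g. when all buckets are known to
live in the configured region:

    import org.apache.hadoop.conf.Configuration;

    Configuration conf = new Configuration();
    // Disable cross-region access; requests stay within the
    // configured region/endpoint.
    conf.setBoolean("fs.s3a.cross.region.access.enabled", false);
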
Contributed by Syed Shameerur Rahman
This moves Hadoop to Apache commons-collections4.
Apache commons-collections has been removed and is completely banned from the source code.
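For downstream code that followed Hadoop's dependency, the
change is essentially a package rename; CollectionUtils.isEmpty
below is just an illustrative call:

    // Before (commons-collections 3.x):
    //   import org.apache.commons.collections.CollectionUtils;
    // After (commons-collections4):
    import org.apache.commons.collections4.CollectionUtils;
    import java.util.Collections;

    boolean empty = CollectionUtils.isEmpty(Collections.emptyList());
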
Contributed by Nihal Jain