HADOOP-16792: Make S3 client request timeout configurable.

Contributed by Mustafa Iman.

This adds a new configuration option, fs.s3a.connection.request.timeout,
which sets the timeout on HTTP requests to the AWS service;
0 means no timeout.
The value is measured in seconds unless a time suffix is given;
the usual time suffixes are all supported.

Important: this is the maximum duration of any AWS service call,
including upload and copy operations. If non-zero, it must be larger
than the time to upload multi-megabyte blocks to S3 from the client,
and to rename many-GB files. Use with care.
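
As a rough illustration of the sizing constraint above, the sketch below estimates a lower bound for the timeout from a block size and a sustained upload bandwidth. The class name, method name, safety factor and example figures are all hypothetical and chosen for illustration; none of them are part of this patch or of Hadoop. It covers block uploads only; the time to copy many-GB files during a rename has to be considered separately.

```java
/** Hypothetical helper, not part of Hadoop: estimates a floor for the request timeout. */
public final class RequestTimeoutEstimate {

  private RequestTimeoutEstimate() {
  }

  /**
   * Seconds needed to upload one block at the given sustained bandwidth,
   * multiplied by a safety factor and rounded up to a whole second.
   */
  public static long minTimeoutSeconds(long blockBytes,
      long bytesPerSecond, double safetyFactor) {
    if (blockBytes < 0 || bytesPerSecond <= 0 || safetyFactor < 1.0) {
      throw new IllegalArgumentException("invalid sizing parameters");
    }
    double seconds = (double) blockBytes / bytesPerSecond;
    return (long) Math.ceil(seconds * safetyFactor);
  }

  public static void main(String[] args) {
    long blockBytes = 64L * 1024 * 1024;     // assumed 64 MiB upload block
    long bytesPerSecond = 4L * 1024 * 1024;  // assumed 4 MiB/s sustained upload
    // 16 s of transfer time with a 3x margin: prints 48
    System.out.println(minTimeoutSeconds(blockBytes, bytesPerSecond, 3.0));
  }
}
```

Any non-zero value of fs.s3a.connection.request.timeout below such an estimate is likely to fail uploads on that link.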

Change-Id: I407745341068b702bf8f401fb96450a9f987c51c
Mustafa Iman 2020-01-24 13:37:07 +00:00 committed by Steve Loughran
parent 978c487672
commit 839054754b
6 changed files with 102 additions and 0 deletions

@ -1940,6 +1940,23 @@
</description>
</property>
<property>
<name>fs.s3a.connection.request.timeout</name>
<value>0</value>
<description>
Timeout on HTTP requests to the AWS service; 0 means no timeout.
Measured in seconds; the usual time suffixes are all supported.
Important: this is the maximum duration of any AWS service call,
including upload and copy operations. If non-zero, it must be larger
than the time to upload multi-megabyte blocks to S3 from the client,
and to rename many-GB files. Use with care.
Values that are larger than Integer.MAX_VALUE milliseconds are
capped at Integer.MAX_VALUE milliseconds.
</description>
</property>
<property>
<name>fs.s3a.etag.checksum.enabled</name>
<value>false</value>

@ -187,6 +187,11 @@ private Constants() {
public static final String SOCKET_TIMEOUT = "fs.s3a.connection.timeout";
public static final int DEFAULT_SOCKET_TIMEOUT = 200000;
// time until a request is timed out; in seconds unless a time suffix
// is given. 0 means no timeout
public static final String REQUEST_TIMEOUT =
"fs.s3a.connection.request.timeout";
public static final int DEFAULT_REQUEST_TIMEOUT = 0;
// socket send buffer to be used in Amazon client
public static final String SOCKET_SEND_BUFFER = "fs.s3a.socket.send.buffer";
public static final int DEFAULT_SOCKET_SEND_BUFFER = 8 * 1024;

@ -82,6 +82,7 @@
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import static org.apache.commons.lang3.StringUtils.isEmpty;
import static org.apache.hadoop.fs.s3a.Constants.*;
@ -1284,6 +1285,15 @@ public static void initConnectionSettings(Configuration conf,
DEFAULT_SOCKET_SEND_BUFFER, 2048);
int sockRecvBuffer = intOption(conf, SOCKET_RECV_BUFFER,
DEFAULT_SOCKET_RECV_BUFFER, 2048);
long requestTimeoutMillis = conf.getTimeDuration(REQUEST_TIMEOUT,
DEFAULT_REQUEST_TIMEOUT, TimeUnit.SECONDS, TimeUnit.MILLISECONDS);
if (requestTimeoutMillis > Integer.MAX_VALUE) {
LOG.debug("Request timeout is too high ({} ms). Setting to {} ms instead",
requestTimeoutMillis, Integer.MAX_VALUE);
requestTimeoutMillis = Integer.MAX_VALUE;
}
awsConf.setRequestTimeout((int) requestTimeoutMillis);
awsConf.setSocketBufferSizeHints(sockSendBuffer, sockRecvBuffer);
String signerOverride = conf.getTrimmed(SIGNING_ALGORITHM, "");
if (!signerOverride.isEmpty()) {
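
The conversion and capping performed in the hunk above can be reproduced without Hadoop on the classpath. The following is a minimal sketch using only `java.util.concurrent.TimeUnit`; the class and method names are hypothetical, and it assumes a plain number of seconds as input rather than the suffixed strings that `Configuration.getTimeDuration` accepts.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-alone illustration of the capping logic; not Hadoop code. */
public final class RequestTimeoutClamp {

  private RequestTimeoutClamp() {
  }

  /** Converts seconds to milliseconds, capped at Integer.MAX_VALUE. */
  public static int toSdkTimeoutMillis(long timeoutSeconds) {
    // TimeUnit.toMillis saturates at Long.MAX_VALUE instead of overflowing.
    long millis = TimeUnit.SECONDS.toMillis(timeoutSeconds);
    if (millis > Integer.MAX_VALUE) {
      millis = Integer.MAX_VALUE;
    }
    return (int) millis;
  }

  public static void main(String[] args) {
    System.out.println(toSdkTimeoutMillis(0));          // 0, meaning no timeout
    System.out.println(toSdkTimeoutMillis(120));        // 120000
    System.out.println(toSdkTimeoutMillis(3_000_000));  // 2147483647 (capped)
  }
}
```

The cap exists because the AWS SDK setter takes an `int` number of milliseconds; Integer.MAX_VALUE milliseconds is a little under 25 days.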

@ -983,6 +983,23 @@ options are covered in [Testing](./testing.md).
<description>Select which version of the S3 SDK's List Objects API to use.
Currently supports 2 (default) and 1 (older API).</description>
</property>
<property>
<name>fs.s3a.connection.request.timeout</name>
<value>0</value>
<description>
Timeout on HTTP requests to the AWS service; 0 means no timeout.
Measured in seconds; the usual time suffixes are all supported.
Important: this is the maximum duration of any AWS service call,
including upload and copy operations. If non-zero, it must be larger
than the time to upload multi-megabyte blocks to S3 from the client,
and to rename many-GB files. Use with care.
Values that are larger than Integer.MAX_VALUE milliseconds are
capped at Integer.MAX_VALUE milliseconds.
</description>
</property>
```
## <a name="retry_and_recovery"></a>Retry and Recovery

@ -1384,3 +1384,43 @@ For this reason, the number of retry events are limited.
</description>
</property>
```
### <a name="aws-timeouts"></a> Tuning AWS request timeouts
A global timeout for AWS service calls can be configured with the following property:
```xml
<property>
<name>fs.s3a.connection.request.timeout</name>
<value>0</value>
<description>
Timeout on HTTP requests to the AWS service; 0 means no timeout.
Measured in seconds; the usual time suffixes are all supported.
Important: this is the maximum duration of any AWS service call,
including upload and copy operations. If non-zero, it must be larger
than the time to upload multi-megabyte blocks to S3 from the client,
and to rename many-GB files. Use with care.
Values that are larger than Integer.MAX_VALUE milliseconds are
capped at Integer.MAX_VALUE milliseconds.
</description>
</property>
```
If this value is set too low, users may encounter `SdkClientException`s caused by
requests timing out:
```
com.amazonaws.SdkClientException: Unable to execute HTTP request:
Request did not complete before the request timeout configuration.:
Unable to execute HTTP request: Request did not complete before the request timeout configuration.
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:205)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:112)
at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$4(Invoker.java:315)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:407)
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:311)
```
When this happens, set `fs.s3a.connection.request.timeout` to a larger value, or disable the
timeout completely by setting it to `0`.

@ -390,6 +390,19 @@ public void testCustomUserAgent() throws Exception {
awsConf.getUserAgentPrefix());
}
@Test
public void testRequestTimeout() throws Exception {
conf = new Configuration();
conf.set(REQUEST_TIMEOUT, "120");
fs = S3ATestUtils.createTestFileSystem(conf);
AmazonS3 s3 = fs.getAmazonS3ClientForTesting("Request timeout (ms)");
ClientConfiguration awsConf = getField(s3, ClientConfiguration.class,
"clientConfiguration");
assertEquals("Configured " + REQUEST_TIMEOUT +
" is different than what AWS sdk configuration uses internally",
120000, awsConf.getRequestTimeout());
}
@Test
public void testCloseIdempotent() throws Throwable {
conf = new Configuration();