HADOOP-18320. Fixes typos in Delegation Tokens documentation. (#4499)
Contributed By: Ahmar Suhail
Parent: dd49077aed
Commit: 9c6eeb699e
@@ -20,7 +20,7 @@
 
 The S3A filesystem client supports `Hadoop Delegation Tokens`.
 This allows YARN application like MapReduce, Distcp, Apache Flink and Apache Spark to
-obtain credentials to access S3 buckets and pass them pass these credentials to
+obtain credentials to access S3 buckets and pass them to
 jobs/queries, so granting them access to the service with the same access
 permissions as the user.
 
@@ -37,9 +37,9 @@ the S3A client from the AWS STS service. They have a limited duration
 so restrict how long an application can access AWS on behalf of a user.
 Clients with this token have the full permissions of the user.
 
-*Role Delegation Tokens:* These contain an "STS Session Token" requested by by the
-STS "Assume Role" API, so grant the caller to interact with S3 as specific AWS
-role, *with permissions restricted to purely accessing that specific S3 bucket*.
+*Role Delegation Tokens:* These contain an "STS Session Token" requested by the
+STS "Assume Role" API, granting the caller permission to interact with S3 using a specific IAM
+role, *with permissions restricted to accessing a specific S3 bucket*.
 
 Role Delegation Tokens are the most powerful. By restricting the access rights
 of the granted STS token, no process receiving the token may perform
@@ -55,13 +55,13 @@ see [S3A Delegation Token Architecture](delegation_token_architecture.html).
 
 ## <a name="background"></a> Background: Hadoop Delegation Tokens.
 
-A Hadoop Delegation Token are is a byte array of data which is submitted to
-a Hadoop services as proof that the caller has the permissions to perform
+A Hadoop Delegation Token is a byte array of data which is submitted to
+Hadoop services as proof that the caller has the permissions to perform
 the operation which it is requesting —
-and which can be passed between applications to *delegate* those permission.
+and which can be passed between applications to *delegate* those permissions.
 
-Tokens are opaque to clients, clients who simply get a byte array
-of data which they must to provide to a service when required.
+Tokens are opaque to clients. Clients simply get a byte array
+of data which they must provide to a service when required.
 This normally contains encrypted data for use by the service.
 
 The service, which holds the password to encrypt/decrypt this data,
@@ -79,7 +79,7 @@ After use, tokens may be revoked: this relies on services holding tables of
 valid tokens, either in memory or, for any HA service, in Apache Zookeeper or
 similar. Revoking tokens is used to clean up after jobs complete.
 
-Delegation support is tightly integrated with YARN: requests to launch
+Delegation Token support is tightly integrated with YARN: requests to launch
 containers and applications can include a list of delegation tokens to
 pass along. These tokens are serialized with the request, saved to a file
 on the node launching the container, and then loaded in to the credentials
@@ -103,12 +103,12 @@ S3A now supports delegation tokens, so allowing a caller to acquire tokens
 from a local S3A Filesystem connector instance and pass them on to
 applications to grant them equivalent or restricted access.
 
-These S3A Delegation Tokens are special in that they do not contain
+These S3A Delegation Tokens are special in a way that they do not contain
 password-protected data opaque to clients; they contain the secrets needed
 to access the relevant S3 buckets and associated services.
 
 They are obtained by requesting a delegation token from the S3A filesystem client.
-Issued token mey be included in job submissions, passed to running applications,
+Issued tokens may be included in job submissions, passed to running applications,
 etc. This token is specific to an individual bucket; all buckets which a client
 wishes to work with must have a separate delegation token issued.
 
@@ -117,7 +117,7 @@ class, which then supports multiple "bindings" behind it, so supporting
 different variants of S3A Delegation Tokens.
 
 Because applications only collect Delegation Tokens in secure clusters,
-It does mean that to be able to submit delegation tokens in transient
+it does mean that to be able to submit delegation tokens in transient
 cloud-hosted Hadoop clusters, _these clusters must also have Kerberos enabled_.
 
 *Tip*: you should only be deploying Hadoop in public clouds with Kerberos enabled.
@@ -141,10 +141,10 @@ for specifics details on the (current) token lifespan.
 
 ### <a name="role-tokens"></a> S3A Role Delegation Tokens
 
-A Role Delegation Tokens is created by asking the AWS
+A Role Delegation Token is created by asking the AWS
 [Security Token Service](http://docs.aws.amazon.com/STS/latest/APIReference/Welcome.html)
-for set of "Assumed Role" credentials, with a AWS account specific role for a limited duration..
-This role is restricted to only grant access the S3 bucket and all KMS keys,
+for a set of "Assumed Role" session credentials with a limited lifetime, belonging to a given IAM Role.
+The resulting session credentials are restricted to grant access to all KMS keys, and to the specific S3 bucket.
 They are marshalled into the S3A Delegation Token.
 
 Other S3A connectors can extract these credentials and use them to
@@ -156,13 +156,13 @@ Issued tokens cannot be renewed or revoked.
 
 ### <a name="full-credentials"></a> S3A Full-Credential Delegation Tokens
 
-Full Credential Delegation Tokens tokens contain the full AWS login details
+Full Credential Delegation Tokens contain the full AWS login details
 (access key and secret key) needed to access a bucket.
 
 They never expire, so are the equivalent of storing the AWS account credentials
 in a Hadoop, Hive, Spark configuration or similar.
 
-They differences are:
+The differences are:
 
 1. They are automatically passed from the client/user to the application.
 A remote application can use them to access data on behalf of the user.
@@ -181,21 +181,20 @@ Hadoop security enabled —which inevitably means with Kerberos.
 Even though S3A delegation tokens do not use Kerberos, the code in
 applications which fetch DTs is normally only executed when the cluster is
 running in secure mode; somewhere where the `core-site.xml` configuration
-sets `hadoop.security.authentication` to to `kerberos` or another valid
+sets `hadoop.security.authentication` to `kerberos` or another valid
 authentication mechanism.
 
 *Without enabling security at this level, delegation tokens will not
 be collected.*
 
-Once Kerberos enabled, the process for acquiring tokens is as follows:
+Once Kerberos is enabled, the process for acquiring tokens is as follows:
 
 1. Enable Delegation token support by setting `fs.s3a.delegation.token.binding`
 to the classname of the token binding to use.
-to use.
 1. Add any other binding-specific settings (STS endpoint, IAM role, etc.)
 1. Make sure the settings are the same in the service as well as the client.
 1. In the client, switch to using a [Hadoop Credential Provider](hadoop-project-dist/hadoop-common/CredentialProviderAPI.html)
-for storing your local credentials, *with a local filesystem store
+for storing your local credentials, with a local filesystem store
 (`localjceks:` or `jcecks://file`), so as to keep the full secrets out of any
 job configurations.
 1. Execute the client from a Kerberos-authenticated account
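Reviewer note: the first and fourth steps listed in the hunk above could be sketched as the client-side configuration below. This is an illustration, not part of the commit. The binding class name `org.apache.hadoop.fs.s3a.auth.delegation.SessionTokenBinding`, the property `hadoop.security.credential.provider.path` and the keystore path are assumptions that do not appear in this diff; verify them against the Hadoop release in use.

```xml
<!-- Sketch only: the values below are illustrative assumptions, not taken from this commit. -->
<configuration>
  <!-- Delegation tokens are only collected when security is enabled (stated in the text above). -->
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <!-- Step 1: name the token binding class (class name is an assumption). -->
  <property>
    <name>fs.s3a.delegation.token.binding</name>
    <value>org.apache.hadoop.fs.s3a.auth.delegation.SessionTokenBinding</value>
  </property>
  <!-- Step 4: keep long-lived secrets in a local credential store (property and path are assumptions). -->
  <property>
    <name>hadoop.security.credential.provider.path</name>
    <value>localjceks://file/home/alice/aws.jceks</value>
  </property>
</configuration>
```

The intent of the local store is the one the text gives: the full secrets stay on the client and are not copied into job configurations.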
@@ -215,7 +214,7 @@ application configured with the login credentials for an AWS account able to iss
 Hadoop MapReduce jobs copy their client-side configurations with the job.
 If your AWS login secrets are set in an XML file then they are picked up
 and passed in with the job, _even if delegation tokens are used to propagate
-session or role secrets.
+session or role secrets_.
 
 Spark-submit will take any credentials in the `spark-defaults.conf`file
 and again, spread them across the cluster.
@@ -261,7 +260,7 @@ the same STS endpoint.
 * In experiments, a few hundred requests per second are needed to trigger throttling,
 so this is very unlikely to surface in production systems.
 * The S3A filesystem connector retries all throttled requests to AWS services, including STS.
-* Other S3 clients with use the AWS SDK will, if configured, also retry throttled requests.
+* Other S3 clients which use the AWS SDK will, if configured, also retry throttled requests.
 
 Overall, the risk of triggering STS throttling appears low, and most applications
 will recover from what is generally an intermittently used AWS service.
@@ -303,7 +302,7 @@ relevant bucket, then a new session token will be issued.
 a session delegation token, then the existing token will be forwarded.
 The life of the token will not be extended.
 1. If the application requesting a token does not have either of these,
-the the tokens cannot be issued: the operation will fail with an error.
+the token cannot be issued: the operation will fail with an error.
 
 
 The endpoint for STS requests are set by the same configuration
@@ -353,10 +352,10 @@ it is authenticated with; the role token binding will fail.
 
 When the AWS credentials supplied to the Session Delegation Token binding
 through `fs.s3a.aws.credentials.provider` are themselves a set of
-session credentials, generated delegation tokens with simply contain these
-existing session credentials, a new set of credentials obtained from STS.
+session credentials, generated delegation tokens will simply contain these
+existing session credentials, not a new set of credentials obtained from STS.
 This is because the STS service does not let
-callers authenticated with session/role credentials from requesting new sessions.
+callers authenticated with session/role credentials request new sessions.
 
 This feature is useful when generating tokens from an EC2 VM instance in one IAM
 role and forwarding them over to VMs which are running in a different IAM role.
@@ -384,7 +383,7 @@ There are some further configuration options:
 
 | **Key** | **Meaning** | **Default** |
 | --- | --- | --- |
-| `fs.s3a.assumed.role.session.duration"` | Duration of delegation tokens | `1h` |
+| `fs.s3a.assumed.role.session.duration` | Duration of delegation tokens | `1h` |
 | `fs.s3a.assumed.role.arn` | ARN for role to request | (undefined) |
 | `fs.s3a.assumed.role.sts.endpoint.region` | region for issued tokens | (undefined) |
 
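Reviewer note: the options in the table above might be combined as in the sketch below. It is illustrative only. The binding class name `org.apache.hadoop.fs.s3a.auth.delegation.RoleTokenBinding` and the role ARN are assumptions (the ARN is a made-up placeholder); the duration shown is the documented default.

```xml
<!-- Sketch only: class name and ARN are assumptions, not taken from this commit. -->
<configuration>
  <!-- Select the role token binding (class name is an assumption). -->
  <property>
    <name>fs.s3a.delegation.token.binding</name>
    <value>org.apache.hadoop.fs.s3a.auth.delegation.RoleTokenBinding</value>
  </property>
  <!-- ARN of the role to request; placeholder account ID and role name. -->
  <property>
    <name>fs.s3a.assumed.role.arn</name>
    <value>arn:aws:iam::123456789012:role/example-s3-role</value>
  </property>
  <!-- Duration of issued tokens; 1h is the default listed in the table. -->
  <property>
    <name>fs.s3a.assumed.role.session.duration</name>
    <value>1h</value>
  </property>
</configuration>
```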
@@ -413,7 +412,8 @@ The XML settings needed to enable session tokens are:
 ```
 
 A JSON role policy for the role/session will automatically be generated which will
-consist of
+consist of:
+
 1. Full access to the S3 bucket for all operations used by the S3A client
 (read, write, list, multipart operations, get bucket location, etc).
 1. Full user access to KMS keys. This is to be able to decrypt any data
@@ -449,7 +449,7 @@ relevant bucket, then a full credential token will be issued.
 a session delegation token, then the existing token will be forwarded.
 The life of the token will not be extended.
 1. If the application requesting a token does not have either of these,
-the the tokens cannot be issued: the operation will fail with an error.
+the tokens cannot be issued: the operation will fail with an error.
 
 ## <a name="managing_token_duration"></a> Managing the Delegation Tokens Duration
 
@@ -465,7 +465,7 @@ that of the role itself: 1h by default, though this can be changed to
 12h [In the IAM Console](https://console.aws.amazon.com/iam/home#/roles),
 or from the AWS CLI.
 
-*Without increasing the duration of role, one hour is the maximum value;
+Without increasing the duration of the role, one hour is the maximum value;
 the error message `The requested DurationSeconds exceeds the MaxSessionDuration set for this role`
 is returned if the requested duration of a Role Delegation Token is greater
 than that available for the role.
@@ -545,7 +545,7 @@ Consult [troubleshooting Assumed Roles](assumed_roles.html#troubleshooting)
 for details on AWS error messages related to AWS IAM roles.
 
 The [cloudstore](https://github.com/steveloughran/cloudstore) module's StoreDiag
-utility can also be used to explore delegation token support
+utility can also be used to explore delegation token support.
 
 
 ### Submitted job cannot authenticate
@@ -557,7 +557,7 @@ There are many causes for this; delegation tokens add some more.
 
 * This user is not `kinit`-ed in to Kerberos. Use `klist` and
 `hadoop kdiag` to see the Kerberos authentication state of the logged in user.
-* The filesystem instance on the client has not had a token binding set in
+* The filesystem instance on the client does not have a token binding set in
 `fs.s3a.delegation.token.binding`, so does not attempt to issue any.
 * The job submission is not aware that access to the specific S3 buckets
 are required. Review the application's submission mechanism to determine
@@ -717,7 +717,7 @@ In the initial results of these tests:
 
 * A few hundred requests a second can be made before STS block the caller.
 * The throttling does not last very long (seconds)
-* Tt does not appear to affect any other STS endpoints.
+* It does not appear to affect any other STS endpoints.
 
 If developers wish to experiment with these tests and provide more detailed
 analysis, we would welcome this. Do bear in mind that all users of the
@@ -749,7 +749,7 @@ Look at the other examples to see what to do; `SessionTokenIdentifier` does
 most of the work.
 
 Having a `toString()` method which is informative is ideal for the `hdfs creds`
-command as well as debugging: *but do not print secrets*
+command as well as debugging: *but do not print secrets*.
 
 *Important*: Add no references to any AWS SDK class, to
 ensure it can be safely deserialized whenever the relevant token
@@ -835,13 +835,13 @@ Tests the lifecycle of session tokens.
 #### Integration Test `ITestSessionDelegationInFileystem`.
 
 This collects DTs from one filesystem, and uses that to create a new FS instance and
-then perform filesystem operations. A miniKDC is instantiated
+then perform filesystem operations. A miniKDC is instantiated.
 
 * Take care to remove all login secrets from the environment, so as to make sure that
 the second instance is picking up the DT information.
 * `UserGroupInformation.reset()` can be used to reset user secrets after every test
 case (e.g. teardown), so that issued DTs from one test case do not contaminate the next.
-* its subclass, `ITestRoleDelegationInFileystem` adds a check that the current credentials
+* It's subclass, `ITestRoleDelegationInFileystem` adds a check that the current credentials
 in the DT cannot be used to access data on other buckets —that is, the active
 session really is restricted to the target bucket.
 
@@ -851,7 +851,7 @@ session really is restricted to the target bucket.
 It's not easy to bring up a YARN cluster with a secure HDFS and miniKDC controller in
 test cases —this test, the closest there is to an end-to-end test,
 uses mocking to mock the RPC calls to the YARN AM, and then verifies that the tokens
-have been collected in the job context,
+have been collected in the job context.
 
 #### Load Test `ILoadTestSessionCredentials`
 