HADOOP-18320. Fixes typos in Delegation Tokens documentation. (#4499)
Contributed By: Ahmar Suhail
parent dd49077aed
commit 9c6eeb699e
@@ -20,7 +20,7 @@

 The S3A filesystem client supports `Hadoop Delegation Tokens`.
 This allows YARN application like MapReduce, Distcp, Apache Flink and Apache Spark to
-obtain credentials to access S3 buckets and pass them pass these credentials to
+obtain credentials to access S3 buckets and pass them to
 jobs/queries, so granting them access to the service with the same access
 permissions as the user.

@@ -37,9 +37,9 @@ the S3A client from the AWS STS service. They have a limited duration
 so restrict how long an application can access AWS on behalf of a user.
 Clients with this token have the full permissions of the user.

-*Role Delegation Tokens:* These contain an "STS Session Token" requested by by the
-STS "Assume Role" API, so grant the caller to interact with S3 as specific AWS
-role, *with permissions restricted to purely accessing that specific S3 bucket*.
+*Role Delegation Tokens:* These contain an "STS Session Token" requested by the
+STS "Assume Role" API, granting the caller permission to interact with S3 using a specific IAM
+role, *with permissions restricted to accessing a specific S3 bucket*.

 Role Delegation Tokens are the most powerful. By restricting the access rights
 of the granted STS token, no process receiving the token may perform
@@ -55,13 +55,13 @@ see [S3A Delegation Token Architecture](delegation_token_architecture.html).

 ## <a name="background"></a> Background: Hadoop Delegation Tokens.

-A Hadoop Delegation Token are is a byte array of data which is submitted to
-a Hadoop services as proof that the caller has the permissions to perform
+A Hadoop Delegation Token is a byte array of data which is submitted to
+Hadoop services as proof that the caller has the permissions to perform
 the operation which it is requesting —
-and which can be passed between applications to *delegate* those permission.
+and which can be passed between applications to *delegate* those permissions.

-Tokens are opaque to clients, clients who simply get a byte array
-of data which they must to provide to a service when required.
+Tokens are opaque to clients. Clients simply get a byte array
+of data which they must provide to a service when required.
 This normally contains encrypted data for use by the service.

 The service, which holds the password to encrypt/decrypt this data,
@@ -79,7 +79,7 @@ After use, tokens may be revoked: this relies on services holding tables of
 valid tokens, either in memory or, for any HA service, in Apache Zookeeper or
 similar. Revoking tokens is used to clean up after jobs complete.

-Delegation support is tightly integrated with YARN: requests to launch
+Delegation Token support is tightly integrated with YARN: requests to launch
 containers and applications can include a list of delegation tokens to
 pass along. These tokens are serialized with the request, saved to a file
 on the node launching the container, and then loaded in to the credentials
@@ -103,12 +103,12 @@ S3A now supports delegation tokens, so allowing a caller to acquire tokens
 from a local S3A Filesystem connector instance and pass them on to
 applications to grant them equivalent or restricted access.

-These S3A Delegation Tokens are special in that they do not contain
+These S3A Delegation Tokens are special in a way that they do not contain
 password-protected data opaque to clients; they contain the secrets needed
 to access the relevant S3 buckets and associated services.

 They are obtained by requesting a delegation token from the S3A filesystem client.
-Issued token mey be included in job submissions, passed to running applications,
+Issued tokens may be included in job submissions, passed to running applications,
 etc. This token is specific to an individual bucket; all buckets which a client
 wishes to work with must have a separate delegation token issued.

@@ -117,7 +117,7 @@ class, which then supports multiple "bindings" behind it, so supporting
 different variants of S3A Delegation Tokens.

 Because applications only collect Delegation Tokens in secure clusters,
-It does mean that to be able to submit delegation tokens in transient
+it does mean that to be able to submit delegation tokens in transient
 cloud-hosted Hadoop clusters, _these clusters must also have Kerberos enabled_.

 *Tip*: you should only be deploying Hadoop in public clouds with Kerberos enabled.
@@ -141,10 +141,10 @@ for specifics details on the (current) token lifespan.

 ### <a name="role-tokens"></a> S3A Role Delegation Tokens

-A Role Delegation Tokens is created by asking the AWS
+A Role Delegation Token is created by asking the AWS
 [Security Token Service](http://docs.aws.amazon.com/STS/latest/APIReference/Welcome.html)
-for set of "Assumed Role" credentials, with a AWS account specific role for a limited duration..
-This role is restricted to only grant access the S3 bucket and all KMS keys,
+for a set of "Assumed Role" session credentials with a limited lifetime, belonging to a given IAM Role.
+The resulting session credentials are restricted to grant access to all KMS keys, and to the specific S3 bucket.
 They are marshalled into the S3A Delegation Token.

 Other S3A connectors can extract these credentials and use them to
@@ -156,13 +156,13 @@ Issued tokens cannot be renewed or revoked.

 ### <a name="full-credentials"></a> S3A Full-Credential Delegation Tokens

-Full Credential Delegation Tokens tokens contain the full AWS login details
+Full Credential Delegation Tokens contain the full AWS login details
 (access key and secret key) needed to access a bucket.

 They never expire, so are the equivalent of storing the AWS account credentials
 in a Hadoop, Hive, Spark configuration or similar.

-They differences are:
+The differences are:

 1. They are automatically passed from the client/user to the application.
 A remote application can use them to access data on behalf of the user.
@@ -181,21 +181,20 @@ Hadoop security enabled —which inevitably means with Kerberos.
 Even though S3A delegation tokens do not use Kerberos, the code in
 applications which fetch DTs is normally only executed when the cluster is
 running in secure mode; somewhere where the `core-site.xml` configuration
-sets `hadoop.security.authentication` to to `kerberos` or another valid
+sets `hadoop.security.authentication` to `kerberos` or another valid
 authentication mechanism.

 *Without enabling security at this level, delegation tokens will not
 be collected.*

-Once Kerberos enabled, the process for acquiring tokens is as follows:
+Once Kerberos is enabled, the process for acquiring tokens is as follows:

 1. Enable Delegation token support by setting `fs.s3a.delegation.token.binding`
 to the classname of the token binding to use.
-to use.
 1. Add any other binding-specific settings (STS endpoint, IAM role, etc.)
 1. Make sure the settings are the same in the service as well as the client.
 1. In the client, switch to using a [Hadoop Credential Provider](hadoop-project-dist/hadoop-common/CredentialProviderAPI.html)
-for storing your local credentials, *with a local filesystem store
+for storing your local credentials, with a local filesystem store
 (`localjceks:` or `jcecks://file`), so as to keep the full secrets out of any
 job configurations.
 1. Execute the client from a Kerberos-authenticated account
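Reviewer note: the acquisition steps touched by this hunk might translate into client-side settings along the following lines. This is only a sketch and is not part of the commit. The binding class name and the `hadoop.security.credential.provider.path` property are assumptions based on general S3A and Hadoop configuration conventions, and the keystore path is a hypothetical placeholder.

```xml
<!-- core-site.xml sketch: security must be enabled, or tokens are not collected -->
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>

<!-- Assumed binding class name; choose the binding for the token type required -->
<property>
  <name>fs.s3a.delegation.token.binding</name>
  <value>org.apache.hadoop.fs.s3a.auth.delegation.SessionTokenBinding</value>
</property>

<!-- Hypothetical local keystore path, keeping full secrets out of job configurations -->
<property>
  <name>hadoop.security.credential.provider.path</name>
  <value>localjceks://file/home/alice/aws.jceks</value>
</property>
```

The same binding setting would also need to be present on the service side, as the third step in the list requires.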
@@ -215,7 +214,7 @@ application configured with the login credentials for an AWS account able to iss
 Hadoop MapReduce jobs copy their client-side configurations with the job.
 If your AWS login secrets are set in an XML file then they are picked up
 and passed in with the job, _even if delegation tokens are used to propagate
-session or role secrets.
+session or role secrets_.

 Spark-submit will take any credentials in the `spark-defaults.conf`file
 and again, spread them across the cluster.
@@ -261,7 +260,7 @@ the same STS endpoint.
 * In experiments, a few hundred requests per second are needed to trigger throttling,
 so this is very unlikely to surface in production systems.
 * The S3A filesystem connector retries all throttled requests to AWS services, including STS.
-* Other S3 clients with use the AWS SDK will, if configured, also retry throttled requests.
+* Other S3 clients which use the AWS SDK will, if configured, also retry throttled requests.

 Overall, the risk of triggering STS throttling appears low, and most applications
 will recover from what is generally an intermittently used AWS service.
@@ -303,7 +302,7 @@ relevant bucket, then a new session token will be issued.
 a session delegation token, then the existing token will be forwarded.
 The life of the token will not be extended.
 1. If the application requesting a token does not have either of these,
-the the tokens cannot be issued: the operation will fail with an error.
+the token cannot be issued: the operation will fail with an error.


 The endpoint for STS requests are set by the same configuration
@@ -353,10 +352,10 @@ it is authenticated with; the role token binding will fail.

 When the AWS credentials supplied to the Session Delegation Token binding
 through `fs.s3a.aws.credentials.provider` are themselves a set of
-session credentials, generated delegation tokens with simply contain these
-existing session credentials, a new set of credentials obtained from STS.
+session credentials, generated delegation tokens will simply contain these
+existing session credentials, not a new set of credentials obtained from STS.
 This is because the STS service does not let
-callers authenticated with session/role credentials from requesting new sessions.
+callers authenticated with session/role credentials request new sessions.

 This feature is useful when generating tokens from an EC2 VM instance in one IAM
 role and forwarding them over to VMs which are running in a different IAM role.
@@ -384,7 +383,7 @@ There are some further configuration options:

 | **Key** | **Meaning** | **Default** |
 | --- | --- | --- |
-| `fs.s3a.assumed.role.session.duration"` | Duration of delegation tokens | `1h` |
+| `fs.s3a.assumed.role.session.duration` | Duration of delegation tokens | `1h` |
 | `fs.s3a.assumed.role.arn` | ARN for role to request | (undefined) |
 | `fs.s3a.assumed.role.sts.endpoint.region` | region for issued tokens | (undefined) |

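Reviewer note: with the stray quote removed from the key name, the options in this table could be set as in the sketch below. It is not part of the commit. The binding class name is an assumption, and the role ARN (account ID and role name) is a made-up placeholder. The region option is left at its undefined default here.

```xml
<!-- Assumed binding class name for role delegation tokens -->
<property>
  <name>fs.s3a.delegation.token.binding</name>
  <value>org.apache.hadoop.fs.s3a.auth.delegation.RoleTokenBinding</value>
</property>

<!-- Hypothetical ARN; substitute the role the tokens should assume -->
<property>
  <name>fs.s3a.assumed.role.arn</name>
  <value>arn:aws:iam::123456789012:role/example-s3a-role</value>
</property>

<!-- Matches the documented default of 1h -->
<property>
  <name>fs.s3a.assumed.role.session.duration</name>
  <value>1h</value>
</property>
```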
@@ -413,7 +412,8 @@ The XML settings needed to enable session tokens are:
 ```

 A JSON role policy for the role/session will automatically be generated which will
-consist of
+consist of:
+
 1. Full access to the S3 bucket for all operations used by the S3A client
 (read, write, list, multipart operations, get bucket location, etc).
 1. Full user access to KMS keys. This is to be able to decrypt any data
@@ -449,7 +449,7 @@ relevant bucket, then a full credential token will be issued.
 a session delegation token, then the existing token will be forwarded.
 The life of the token will not be extended.
 1. If the application requesting a token does not have either of these,
-the the tokens cannot be issued: the operation will fail with an error.
+the tokens cannot be issued: the operation will fail with an error.

 ## <a name="managing_token_duration"></a> Managing the Delegation Tokens Duration

@@ -465,7 +465,7 @@ that of the role itself: 1h by default, though this can be changed to
 12h [In the IAM Console](https://console.aws.amazon.com/iam/home#/roles),
 or from the AWS CLI.

-*Without increasing the duration of role, one hour is the maximum value;
+Without increasing the duration of the role, one hour is the maximum value;
 the error message `The requested DurationSeconds exceeds the MaxSessionDuration set for this role`
 is returned if the requested duration of a Role Delegation Token is greater
 than that available for the role.
@@ -545,7 +545,7 @@ Consult [troubleshooting Assumed Roles](assumed_roles.html#troubleshooting)
 for details on AWS error messages related to AWS IAM roles.

 The [cloudstore](https://github.com/steveloughran/cloudstore) module's StoreDiag
-utility can also be used to explore delegation token support
+utility can also be used to explore delegation token support.


 ### Submitted job cannot authenticate
@@ -557,7 +557,7 @@ There are many causes for this; delegation tokens add some more.

 * This user is not `kinit`-ed in to Kerberos. Use `klist` and
 `hadoop kdiag` to see the Kerberos authentication state of the logged in user.
-* The filesystem instance on the client has not had a token binding set in
+* The filesystem instance on the client does not have a token binding set in
 `fs.s3a.delegation.token.binding`, so does not attempt to issue any.
 * The job submission is not aware that access to the specific S3 buckets
 are required. Review the application's submission mechanism to determine
@@ -717,7 +717,7 @@ In the initial results of these tests:

 * A few hundred requests a second can be made before STS block the caller.
 * The throttling does not last very long (seconds)
-* Tt does not appear to affect any other STS endpoints.
+* It does not appear to affect any other STS endpoints.

 If developers wish to experiment with these tests and provide more detailed
 analysis, we would welcome this. Do bear in mind that all users of the
@@ -749,7 +749,7 @@ Look at the other examples to see what to do; `SessionTokenIdentifier` does
 most of the work.

 Having a `toString()` method which is informative is ideal for the `hdfs creds`
-command as well as debugging: *but do not print secrets*
+command as well as debugging: *but do not print secrets*.

 *Important*: Add no references to any AWS SDK class, to
 ensure it can be safely deserialized whenever the relevant token
@@ -835,13 +835,13 @@ Tests the lifecycle of session tokens.
 #### Integration Test `ITestSessionDelegationInFileystem`.

 This collects DTs from one filesystem, and uses that to create a new FS instance and
-then perform filesystem operations. A miniKDC is instantiated
+then perform filesystem operations. A miniKDC is instantiated.

 * Take care to remove all login secrets from the environment, so as to make sure that
 the second instance is picking up the DT information.
 * `UserGroupInformation.reset()` can be used to reset user secrets after every test
 case (e.g. teardown), so that issued DTs from one test case do not contaminate the next.
-* its subclass, `ITestRoleDelegationInFileystem` adds a check that the current credentials
+* It's subclass, `ITestRoleDelegationInFileystem` adds a check that the current credentials
 in the DT cannot be used to access data on other buckets —that is, the active
 session really is restricted to the target bucket.

@@ -851,7 +851,7 @@ session really is restricted to the target bucket.
 It's not easy to bring up a YARN cluster with a secure HDFS and miniKDC controller in
 test cases —this test, the closest there is to an end-to-end test,
 uses mocking to mock the RPC calls to the YARN AM, and then verifies that the tokens
-have been collected in the job context,
+have been collected in the job context.

 #### Load Test `ILoadTestSessionCredentials`
