HADOOP-13309. Document S3A known limitations in file ownership and permission model. Contributed by Chris Nauroth.
parent dbd205762e
commit 309a43925c
@@ -373,6 +373,21 @@ a time proportional to the quantity of data to upload, and inversely proportiona
 to the network bandwidth. It may also fail —a failure that is better
 escalated than ignored.

+1. **Authorization**. Hadoop uses the `FileStatus` class to
+represent core metadata of files and directories, including the owner, group and
+permissions. Object stores might not have a viable way to persist this
+metadata, so they might need to populate `FileStatus` with stub values. Even if
+the object store persists this metadata, it still might not be feasible for the
+object store to enforce file authorization in the same way as a traditional file
+system. If the object store cannot persist this metadata, then the recommended
+convention is:
+* File owner is reported as the current user.
+* File group also is reported as the current user.
+* Directory permissions are reported as 777.
+* File permissions are reported as 666.
+* File system APIs that set ownership and permissions execute successfully
+without error, but they are no-ops.
+
 Object stores with these characteristics, can not be used as a direct replacement
 for HDFS. In terms of this specification, their implementations of the
 specified operations do not match those required. They are considered supported
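The stub-metadata convention added in the hunk above can be illustrated with a minimal Java sketch. It is an illustration under stated assumptions, not Hadoop code: `StubStore`, `StubStatus`, and every method name below are hypothetical stand-ins and do not reproduce the real `FileStatus` or `FileSystem` APIs. The sketch only shows the five rules: owner and group are reported as the current user, directories as 777, files as 666, and the setters succeed without changing anything.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of the stub-metadata convention. All names are hypothetical;
 * nothing here uses Hadoop's FileStatus or FileSystem classes.
 */
class StubMetadataSketch {

    /** Hypothetical stand-in for the ownership fields of a file status. */
    static final class StubStatus {
        final String owner;
        final String group;
        final int permission; // octal literal, e.g. 0777

        StubStatus(String owner, String group, int permission) {
            this.owner = owner;
            this.group = group;
            this.permission = permission;
        }
    }

    /** Hypothetical store that cannot persist ownership or permissions. */
    static final class StubStore {
        private final String currentUser;
        // path -> true if the entry is a directory
        private final Map<String, Boolean> isDirectory = new HashMap<>();

        StubStore(String currentUser) {
            this.currentUser = currentUser;
        }

        void addFile(String path) {
            isDirectory.put(path, false);
        }

        void addDirectory(String path) {
            isDirectory.put(path, true);
        }

        /** Reports stub values: current user as owner and group, 777 or 666. */
        StubStatus getStatus(String path) {
            Boolean dir = isDirectory.get(path);
            if (dir == null) {
                throw new IllegalArgumentException("No such path: " + path);
            }
            return new StubStatus(currentUser, currentUser, dir ? 0777 : 0666);
        }

        /** Succeeds without error, but is a no-op. */
        void setOwner(String path, String owner, String group) {
        }

        /** Succeeds without error, but is a no-op. */
        void setPermission(String path, int permission) {
        }
    }

    public static void main(String[] args) {
        StubStore store = new StubStore("alice");
        store.addDirectory("/data");
        store.addFile("/data/part-0000");

        // These calls return normally but change nothing.
        store.setOwner("/data/part-0000", "bob", "analysts");
        store.setPermission("/data/part-0000", 0600);

        StubStatus file = store.getStatus("/data/part-0000");
        StubStatus dir = store.getStatus("/data");
        System.out.println(file.owner + " " + file.group + " "
            + Integer.toOctalString(file.permission)); // alice alice 666
        System.out.println(dir.owner + " " + dir.group + " "
            + Integer.toOctalString(dir.permission));  // alice alice 777
    }
}
```

A caller that sets restrictive permissions and then reads them back sees the stub values, which is why code relying on permission round-trips cannot treat such a store as a drop-in replacement for HDFS.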
@@ -39,7 +39,7 @@ higher performance.

 The specifics of using these filesystems are documented below.

-### Warning #1: Object Stores are not filesystems.
+### Warning #1: Object Stores are not filesystems

 Amazon S3 is an example of "an object store". In order to achieve scalability
 and especially high availability, S3 has —as many other cloud object stores have
@@ -56,14 +56,38 @@ recursive file-by-file operations. They take time at least proportional to
 the number of files, during which time partial updates may be visible. If
 the operations are interrupted, the filesystem is left in an intermediate state.

-### Warning #2: Because Object stores don't track modification times of directories,
-features of Hadoop relying on this can have unexpected behaviour. E.g. the
+### Warning #2: Object stores don't track modification times of directories
+
+Features of Hadoop relying on this can have unexpected behaviour. E.g. the
 AggregatedLogDeletionService of YARN will not remove the appropriate logfiles.

 For further discussion on these topics, please consult
 [The Hadoop FileSystem API Definition](../../../hadoop-project-dist/hadoop-common/filesystem/index.html).

-### Warning #3: your AWS credentials are valuable
+### Warning #3: Object stores have different authorization models
+
+The object authorization model of S3 is much different from the file
+authorization model of HDFS and traditional file systems. It is not feasible to
+persist file ownership and permissions in S3, so S3A reports stub information
+from APIs that would query this metadata:
+
+* File owner is reported as the current user.
+* File group also is reported as the current user. Prior to Apache Hadoop
+2.8.0, file group was reported as empty (no group associated), which is a
+potential incompatibility problem for scripts that perform positional parsing of
+shell output and other clients that expect to find a well-defined group.
+* Directory permissions are reported as 777.
+* File permissions are reported as 666.
+
+S3A does not really enforce any authorization checks on these stub permissions.
+Users authenticate to an S3 bucket using AWS credentials. It's possible that
+object ACLs have been defined to enforce authorization at the S3 side, but this
+happens entirely within the S3 service, not within the S3A implementation.
+
+For further discussion on these topics, please consult
+[The Hadoop FileSystem API Definition](../../../hadoop-project-dist/hadoop-common/filesystem/index.html).
+
+### Warning #4: Your AWS credentials are valuable

 Your AWS credentials not only pay for services, they offer read and write
 access to the data. Anyone with the credentials can not only read your datasets
@@ -78,7 +102,7 @@ Do not inadvertently share these credentials through means such as

 If you do any of these: change your credentials immediately!

-### Warning #4: the S3 client provided by Amazon EMR are not from the Apache
+### Warning #5: The S3 client provided by Amazon EMR are not from the Apache
 Software foundation, and are only supported by Amazon.

 Specifically: on Amazon EMR, s3a is not supported, and amazon recommend
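The new Warning #3 text notes that an empty file group can break scripts that parse shell output by position. The Java sketch below shows why. It rests on assumptions: the two sample lines are invented for illustration and only mimic a listing with the column order permissions, replication, owner, group, size, date, time, path. The class and method names are hypothetical, and real shell output may differ in spacing and detail.

```java
/**
 * Sketch of how an empty group column shifts positional fields.
 * The sample lines are illustrative assumptions, not captured output.
 */
class LsFieldShiftSketch {

    /** Returns the field at index 4, where a positional parser expects the size. */
    static String sizeField(String lsLine) {
        // Splitting on runs of whitespace silently drops an empty column.
        String[] fields = lsLine.trim().split("\\s+");
        return fields[4];
    }

    public static void main(String[] args) {
        // Group reported as the current user: eight fields.
        String withGroup =
            "-rw-rw-rw-   1 alice alice       1024 2016-06-29 12:00 s3a://bucket/data.csv";
        // Group reported as empty: only seven fields, so later columns shift left.
        String emptyGroup =
            "-rw-rw-rw-   1 alice             1024 2016-06-29 12:00 s3a://bucket/data.csv";

        System.out.println(sizeField(withGroup));  // 1024
        System.out.println(sizeField(emptyGroup)); // 2016-06-29
    }
}
```

With the group populated, index 4 holds the size. With the group empty, the same index holds the date, so a positional parser misreads every later column without reporting an error.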