HADOOP-18987. Various fixes to FileSystem API docs (#6292)
Contributed by Dieter De Paepe
Parent commit: 4f4b846986
Commit: be13e94843
@ -88,14 +88,13 @@ for example. output streams returned by the S3A FileSystem.
|
||||
The stream MUST implement `Abortable` and `StreamCapabilities`.
|
||||
|
||||
```python
|
||||
if unsupported:
|
||||
if unsupported:
|
||||
throw UnsupportedException
|
||||
|
||||
if not isOpen(stream):
|
||||
no-op
|
||||
|
||||
StreamCapabilities.hasCapability("fs.capability.outputstream.abortable") == True
|
||||
|
||||
```
|
||||
|
||||
|
||||
|
@ -64,13 +64,13 @@ a protected directory, result in such an exception being raised.
|
||||
|
||||
### `boolean isDirectory(Path p)`
|
||||
|
||||
def isDirectory(FS, p)= p in directories(FS)
|
||||
def isDir(FS, p) = p in directories(FS)
|
||||
|
||||
|
||||
### `boolean isFile(Path p)`
|
||||
|
||||
|
||||
def isFile(FS, p) = p in files(FS)
|
||||
def isFile(FS, p) = p in filenames(FS)
|
||||
|
||||
|
||||
### `FileStatus getFileStatus(Path p)`
|
||||
@ -250,7 +250,7 @@ process.
|
||||
changes are made to the filesystem, the result of `listStatus(parent(P))` SHOULD
|
||||
include the value of `getFileStatus(P)`.
|
||||
|
||||
* After an entry at path `P` is created, and before any other
|
||||
* After an entry at path `P` is deleted, and before any other
|
||||
changes are made to the filesystem, the result of `listStatus(parent(P))` SHOULD
|
||||
NOT include the value of `getFileStatus(P)`.
|
||||
|
||||
@ -305,7 +305,7 @@ that they must all be listed, and, at the time of listing, exist.
|
||||
All paths must exist. There is no requirement for uniqueness.
|
||||
|
||||
forall p in paths :
|
||||
exists(fs, p) else raise FileNotFoundException
|
||||
exists(FS, p) else raise FileNotFoundException
|
||||
|
||||
#### Postconditions
|
||||
|
||||
@ -381,7 +381,7 @@ being completely performed.
|
||||
|
||||
Path `path` must exist:
|
||||
|
||||
exists(FS, path) : raise FileNotFoundException
|
||||
if not exists(FS, path) : raise FileNotFoundException
|
||||
|
||||
#### Postconditions
|
||||
|
||||
@ -432,7 +432,7 @@ of data which must be collected in a single RPC call.
|
||||
|
||||
#### Preconditions
|
||||
|
||||
exists(FS, path) else raise FileNotFoundException
|
||||
if not exists(FS, path) : raise FileNotFoundException
|
||||
|
||||
### Postconditions
|
||||
|
||||
@ -463,7 +463,7 @@ and 1 for file count.
|
||||
|
||||
#### Preconditions
|
||||
|
||||
exists(FS, path) else raise FileNotFoundException
|
||||
if not exists(FS, path) : raise FileNotFoundException
|
||||
|
||||
#### Postconditions
|
||||
|
||||
@ -596,7 +596,7 @@ on the filesystem.
|
||||
|
||||
#### Postconditions
|
||||
|
||||
if len(FS, P) > 0: getFileStatus(P).getBlockSize() > 0
|
||||
if len(FS, P) > 0 : getFileStatus(P).getBlockSize() > 0
|
||||
result == getFileStatus(P).getBlockSize()
|
||||
|
||||
1. The outcome of this operation MUST be identical to the value of
|
||||
@ -654,12 +654,12 @@ No ancestor may be a file
|
||||
|
||||
forall d = ancestors(FS, p) :
|
||||
if exists(FS, d) and not isDir(FS, d) :
|
||||
raise [ParentNotDirectoryException, FileAlreadyExistsException, IOException]
|
||||
raise {ParentNotDirectoryException, FileAlreadyExistsException, IOException}
|
||||
|
||||
#### Postconditions
|
||||
|
||||
|
||||
FS' where FS'.Directories' = FS.Directories + [p] + ancestors(FS, p)
|
||||
FS' where FS'.Directories = FS.Directories + [p] + ancestors(FS, p)
|
||||
result = True
|
||||
|
||||
|
||||
@ -698,7 +698,7 @@ No ancestor may be a file
|
||||
|
||||
forall d = ancestors(FS, p) :
|
||||
if exists(FS, d) and not isDir(FS, d) :
|
||||
raise [ParentNotDirectoryException, FileAlreadyExistsException, IOException]
|
||||
raise {ParentNotDirectoryException, FileAlreadyExistsException, IOException}
|
||||
|
||||
FileSystems may reject the request for other
|
||||
reasons, such as the FS being read-only (HDFS),
|
||||
@ -712,8 +712,8 @@ For instance, HDFS may raise an `InvalidPathException`.
|
||||
#### Postconditions
|
||||
|
||||
FS' where :
|
||||
FS'.Files'[p] == []
|
||||
ancestors(p) is-subset-of FS'.Directories'
|
||||
FS'.Files[p] == []
|
||||
ancestors(p) subset-of FS'.Directories
|
||||
|
||||
result = FSDataOutputStream
|
||||
|
||||
@ -734,7 +734,7 @@ The behavior of the returned stream is covered in [Output](outputstream.html).
|
||||
clients creating files with `overwrite==true` to fail if the file is created
|
||||
by another client between the two tests.
|
||||
|
||||
* The S3A and potentially other Object Stores connectors not currently change the `FS` state
|
||||
* The S3A and potentially other Object Stores connectors currently don't change the `FS` state
|
||||
until the output stream `close()` operation is completed.
|
||||
This is a significant difference between the behavior of object stores
|
||||
and that of filesystems, as it allows >1 client to create a file with `overwrite=false`,
|
||||
@ -762,15 +762,15 @@ The behavior of the returned stream is covered in [Output](outputstream.html).
|
||||
#### Implementation Notes
|
||||
|
||||
`createFile(p)` returns a `FSDataOutputStreamBuilder` only and does not make
|
||||
change on filesystem immediately. When `build()` is invoked on the `FSDataOutputStreamBuilder`,
|
||||
changes on the filesystem immediately. When `build()` is invoked on the `FSDataOutputStreamBuilder`,
|
||||
the builder parameters are verified and [`create(Path p)`](#FileSystem.create)
|
||||
is invoked on the underlying filesystem. `build()` has the same preconditions
|
||||
and postconditions as [`create(Path p)`](#FileSystem.create).
|
||||
|
||||
* Similar to [`create(Path p)`](#FileSystem.create), files are overwritten
|
||||
by default, unless specify `builder.overwrite(false)`.
|
||||
by default, unless specified by `builder.overwrite(false)`.
|
||||
* Unlike [`create(Path p)`](#FileSystem.create), missing parent directories are
|
||||
not created by default, unless specify `builder.recursive()`.
|
||||
not created by default, unless specified by `builder.recursive()`.
|
||||
|
||||
### <a name='FileSystem.append'></a> `FSDataOutputStream append(Path p, int bufferSize, Progressable progress)`
|
||||
|
||||
@ -780,14 +780,14 @@ Implementations without a compliant call SHOULD throw `UnsupportedOperationExcep
|
||||
|
||||
if not exists(FS, p) : raise FileNotFoundException
|
||||
|
||||
if not isFile(FS, p) : raise [FileAlreadyExistsException, FileNotFoundException, IOException]
|
||||
if not isFile(FS, p) : raise {FileAlreadyExistsException, FileNotFoundException, IOException}
|
||||
|
||||
#### Postconditions
|
||||
|
||||
FS' = FS
|
||||
result = FSDataOutputStream
|
||||
|
||||
Return: `FSDataOutputStream`, which can update the entry `FS.Files[p]`
|
||||
Return: `FSDataOutputStream`, which can update the entry `FS'.Files[p]`
|
||||
by appending data to the existing list.
|
||||
|
||||
The behavior of the returned stream is covered in [Output](outputstream.html).
|
||||
@ -813,7 +813,7 @@ Implementations without a compliant call SHOULD throw `UnsupportedOperationExcep
|
||||
|
||||
#### Preconditions
|
||||
|
||||
if not isFile(FS, p)) : raise [FileNotFoundException, IOException]
|
||||
if not isFile(FS, p)) : raise {FileNotFoundException, IOException}
|
||||
|
||||
This is a critical precondition. Implementations of some FileSystems (e.g.
|
||||
Object stores) could shortcut one round trip by postponing their HTTP GET
|
||||
@ -842,7 +842,7 @@ The result MUST be the same for local and remote callers of the operation.
|
||||
symbolic links
|
||||
|
||||
1. HDFS throws `IOException("Cannot open filename " + src)` if the path
|
||||
exists in the metadata, but no copies of any its blocks can be located;
|
||||
exists in the metadata, but no copies of its blocks can be located;
|
||||
-`FileNotFoundException` would seem more accurate and useful.
|
||||
|
||||
### `FSDataInputStreamBuilder openFile(Path path)`
|
||||
@ -861,7 +861,7 @@ Implementations without a compliant call MUST throw `UnsupportedOperationExcepti
|
||||
|
||||
let stat = getFileStatus(Path p)
|
||||
let FS' where:
|
||||
(FS.Directories', FS.Files', FS.Symlinks')
|
||||
(FS'.Directories, FS.Files', FS'.Symlinks)
|
||||
p' in paths(FS') where:
|
||||
exists(FS, stat.path) implies exists(FS', p')
|
||||
|
||||
@ -931,16 +931,16 @@ metadata in the `PathHandle` to detect references from other namespaces.
|
||||
|
||||
### `FSDataInputStream open(PathHandle handle, int bufferSize)`
|
||||
|
||||
Implementaions without a compliant call MUST throw `UnsupportedOperationException`
|
||||
Implementations without a compliant call MUST throw `UnsupportedOperationException`
|
||||
|
||||
#### Preconditions
|
||||
|
||||
let fd = getPathHandle(FileStatus stat)
|
||||
if stat.isdir : raise IOException
|
||||
let FS' where:
|
||||
(FS.Directories', FS.Files', FS.Symlinks')
|
||||
p' in FS.Files' where:
|
||||
FS.Files'[p'] = fd
|
||||
(FS'.Directories, FS.Files', FS'.Symlinks)
|
||||
p' in FS'.Files where:
|
||||
FS'.Files[p'] = fd
|
||||
if not exists(FS', p') : raise InvalidPathHandleException
|
||||
|
||||
The implementation MUST resolve the referent of the `PathHandle` following
|
||||
@ -951,7 +951,7 @@ encoded in the `PathHandle`.
|
||||
|
||||
#### Postconditions
|
||||
|
||||
result = FSDataInputStream(0, FS.Files'[p'])
|
||||
result = FSDataInputStream(0, FS'.Files[p'])
|
||||
|
||||
The stream returned is subject to the constraints of a stream returned by
|
||||
`open(Path)`. Constraints checked on open MAY hold to hold for the stream, but
|
||||
@ -1006,7 +1006,7 @@ A directory with children and `recursive == False` cannot be deleted
|
||||
|
||||
If the file does not exist the filesystem state does not change
|
||||
|
||||
if not exists(FS, p):
|
||||
if not exists(FS, p) :
|
||||
FS' = FS
|
||||
result = False
|
||||
|
||||
@ -1089,7 +1089,7 @@ Some of the object store based filesystem implementations always return
|
||||
false when deleting the root, leaving the state of the store unchanged.
|
||||
|
||||
if isRoot(p) :
|
||||
FS ' = FS
|
||||
FS' = FS
|
||||
result = False
|
||||
|
||||
This is irrespective of the recursive flag status or the state of the directory.
|
||||
@ -1152,7 +1152,7 @@ has been calculated.
|
||||
|
||||
Source `src` must exist:
|
||||
|
||||
exists(FS, src) else raise FileNotFoundException
|
||||
if not exists(FS, src) : raise FileNotFoundException
|
||||
|
||||
`dest` cannot be a descendant of `src`:
|
||||
|
||||
@ -1162,7 +1162,7 @@ This implicitly covers the special case of `isRoot(FS, src)`.
|
||||
|
||||
`dest` must be root, or have a parent that exists:
|
||||
|
||||
isRoot(FS, dest) or exists(FS, parent(dest)) else raise IOException
|
||||
if not (isRoot(FS, dest) or exists(FS, parent(dest))) : raise IOException
|
||||
|
||||
The parent path of a destination must not be a file:
|
||||
|
||||
@ -1240,7 +1240,8 @@ There is no consistent behavior here.
|
||||
|
||||
The outcome is no change to FileSystem state, with a return value of false.
|
||||
|
||||
FS' = FS; result = False
|
||||
FS' = FS
|
||||
result = False
|
||||
|
||||
*Local Filesystem*
|
||||
|
||||
@ -1319,15 +1320,18 @@ Implementations without a compliant call SHOULD throw `UnsupportedOperationExcep
|
||||
|
||||
All sources MUST be in the same directory:
|
||||
|
||||
for s in sources: if parent(S) != parent(p) raise IllegalArgumentException
|
||||
for s in sources:
|
||||
if parent(s) != parent(p) : raise IllegalArgumentException
|
||||
|
||||
All block sizes must match that of the target:
|
||||
|
||||
for s in sources: getBlockSize(FS, S) == getBlockSize(FS, p)
|
||||
for s in sources:
|
||||
getBlockSize(FS, s) == getBlockSize(FS, p)
|
||||
|
||||
No duplicate paths:
|
||||
|
||||
not (exists p1, p2 in (sources + [p]) where p1 == p2)
|
||||
let input = sources + [p]
|
||||
not (exists i, j: i != j and input[i] == input[j])
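As an illustration only, the index-based "no duplicate paths" precondition can be checked in plain Python. The helper name `has_duplicates` and the use of strings for paths are assumptions made for this sketch; they are not part of the specification or of any Hadoop API.

```python
def has_duplicates(sources, p):
    """Return True if any two entries of sources + [p] are equal."""
    # let input = sources + [p]
    inputs = list(sources) + [p]
    # exists i, j: i != j and input[i] == input[j]
    # (checking only j > i is sufficient because equality is symmetric)
    return any(
        inputs[i] == inputs[j]
        for i in range(len(inputs))
        for j in range(i + 1, len(inputs))
    )


# The precondition holds when has_duplicates(...) is False.
ok = not has_duplicates(["/d/part-0", "/d/part-1"], "/d/target")
```

Comparing by index rather than by value is what makes the condition meaningful: a formulation that only asks for two equal elements is trivially satisfied by pairing any element with itself.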
|
||||
|
||||
HDFS: All source files except the final one MUST be a complete block:
|
||||
|
||||
@ -1339,8 +1343,8 @@ HDFS: All source files except the final one MUST be a complete block:
|
||||
|
||||
|
||||
FS' where:
|
||||
(data(FS', T) = data(FS, T) + data(FS, sources[0]) + ... + data(FS, srcs[length(srcs)-1]))
|
||||
and for s in srcs: not exists(FS', S)
|
||||
(data(FS', p) = data(FS, p) + data(FS, sources[0]) + ... + data(FS, sources[length(sources)-1]))
|
||||
for s in sources: not exists(FS', s)
|
||||
|
||||
|
||||
HDFS's restrictions may be an implementation detail of how it implements
|
||||
@ -1360,7 +1364,7 @@ Implementations without a compliant call SHOULD throw `UnsupportedOperationExcep
|
||||
|
||||
if not exists(FS, p) : raise FileNotFoundException
|
||||
|
||||
if isDir(FS, p) : raise [FileNotFoundException, IOException]
|
||||
if isDir(FS, p) : raise {FileNotFoundException, IOException}
|
||||
|
||||
if newLength < 0 || newLength > len(FS.Files[p]) : raise HadoopIllegalArgumentException
|
||||
|
||||
@ -1369,8 +1373,7 @@ Truncate cannot be performed on a file, which is open for writing or appending.
|
||||
|
||||
#### Postconditions
|
||||
|
||||
FS' where:
|
||||
len(FS.Files[p]) = newLength
|
||||
len(FS'.Files[p]) = newLength
|
||||
|
||||
Return: `true`, if truncation is finished and the file can be immediately
|
||||
opened for appending, or `false` otherwise.
|
||||
@ -1399,7 +1402,7 @@ Source and destination must be different
|
||||
if src = dest : raise FileExistsException
|
||||
```
|
||||
|
||||
Destination and source must not be descendants one another
|
||||
Destination and source must not be descendants of one another
|
||||
```python
|
||||
if isDescendant(src, dest) or isDescendant(dest, src) : raise IOException
|
||||
```
|
||||
@ -1429,7 +1432,7 @@ Given a base path on the source `base` and a child path `child` where `base` is
|
||||
|
||||
```python
|
||||
def final_name(base, child, dest):
|
||||
is base = child:
|
||||
if base == child:
|
||||
return dest
|
||||
else:
|
||||
return dest + childElements(base, child)
|
||||
@ -1557,7 +1560,7 @@ while (iterator.hasNext()) {
|
||||
|
||||
As raising exceptions is an expensive operation in JVMs, the `while(hasNext())`
|
||||
loop option is more efficient. (see also [Concurrency and the Remote Iterator](#RemoteIteratorConcurrency)
|
||||
for a dicussion on this topic).
|
||||
for a discussion on this topic).
|
||||
|
||||
Implementors of the interface MUST support both forms of iterations; authors
|
||||
of tests SHOULD verify that both iteration mechanisms work.
|
||||
|
@ -108,21 +108,21 @@ such as `rename`.
|
||||
## Defining the Filesystem
|
||||
|
||||
|
||||
A filesystem `FS` contains a set of directories, a dictionary of paths and a dictionary of symbolic links
|
||||
A filesystem `FS` contains directories (a set of paths), files (a mapping of a path to a list of bytes) and symlinks (a set of paths mapping to paths)
|
||||
|
||||
(Directories:Set[Path], Files:[Path:List[byte]], Symlinks:Set[Path])
|
||||
(Directories:Set[Path], Files:Map[Path:List[byte]], Symlinks:Map[Path:Path])
|
||||
|
||||
|
||||
Accessor functions return the specific element of a filesystem
|
||||
|
||||
def FS.Directories = FS.Directories
|
||||
def directories(FS) = FS.Directories
|
||||
def files(FS) = FS.Files
|
||||
def symlinks(FS) = FS.Symlinks
|
||||
def symlinks(FS) = keys(FS.Symlinks)
|
||||
def filenames(FS) = keys(FS.Files)
|
||||
|
||||
The entire set of a paths finite subset of all possible Paths, and functions to resolve a path to data, a directory predicate or a symbolic link:
|
||||
|
||||
def paths(FS) = FS.Directories + filenames(FS) + FS.Symlinks)
|
||||
def paths(FS) = FS.Directories + filenames(FS) + symlinks(FS)
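As a rough sketch, the tuple and accessor functions above can be mirrored in plain Python. The `FileSystemModel` class, the use of strings for paths, and the sample entries are assumptions made for this illustration; they are not part of the specification or of any Hadoop API.

```python
from dataclasses import dataclass, field


@dataclass
class FileSystemModel:
    """Hypothetical stand-in for the (Directories, Files, Symlinks) tuple."""
    Directories: set = field(default_factory=set)   # Set[Path]
    Files: dict = field(default_factory=dict)       # Map[Path:List[byte]]
    Symlinks: dict = field(default_factory=dict)    # Map[Path:Path]


def directories(FS):
    return FS.Directories


def files(FS):
    return FS.Files


def symlinks(FS):
    # Only the link paths (the keys), not the targets they point to.
    return set(FS.Symlinks.keys())


def filenames(FS):
    return set(FS.Files.keys())


def paths(FS):
    # The notation writes "+"; for Python sets the union operator is "|".
    return directories(FS) | filenames(FS) | symlinks(FS)


# Sample state, invented for the example.
FS = FileSystemModel(
    Directories={"/", "/data"},
    Files={"/data/a.bin": b"abc"},
    Symlinks={"/latest": "/data/a.bin"},
)
```

With `Symlinks` modelled as a map, `symlinks(FS)` has to return its keys so that `paths(FS)` remains a set of paths.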
|
||||
|
||||
A path is deemed to exist if it is in this aggregate set:
|
||||
|
||||
@ -169,10 +169,10 @@ in a set, hence no children with duplicate names.
|
||||
A path *D* is a descendant of a path *P* if it is the direct child of the
|
||||
path *P* or an ancestor is a direct child of path *P*:
|
||||
|
||||
def isDescendant(P, D) = parent(D) == P where isDescendant(P, parent(D))
|
||||
def isDescendant(P, D) = parent(D) == P or isDescendant(P, parent(D))
|
||||
|
||||
The descendants of a directory P are all paths in the filesystem whose
|
||||
path begins with the path P -that is their parent is P or an ancestor is P
|
||||
path begins with the path P, i.e. their parent is P or an ancestor is P
|
||||
|
||||
def descendants(FS, D) = {p for p in paths(FS) where isDescendant(D, p)}
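A minimal Python sketch of the recursive definition follows. It assumes absolute POSIX-style string paths and that the parent of the root is the root itself; the explicit stop at `/` is added here so that the recursion terminates, and the function names are hypothetical.

```python
import posixpath


def parent(p):
    # Assumption: the parent of "/" is "/" itself.
    return posixpath.dirname(p.rstrip("/")) or "/"


def is_descendant(P, D):
    # parent(D) == P or isDescendant(P, parent(D)),
    # stopping once the root has been reached.
    if D == "/":
        return False
    return parent(D) == P or is_descendant(P, parent(D))


def descendants(all_paths, D):
    # all_paths plays the role of paths(FS) in the notation.
    return {p for p in all_paths if is_descendant(D, p)}


sample_paths = {"/", "/data", "/data/a.bin", "/data/sub/b.bin", "/tmp"}
under_data = descendants(sample_paths, "/data")
```

Under this reading a path is never a descendant of itself, and the root is not a descendant of any path.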
|
||||
|
||||
@ -181,7 +181,7 @@ path begins with the path P -that is their parent is P or an ancestor is P
|
||||
|
||||
A path MAY refer to a file that has data in the filesystem; its path is a key in the data dictionary
|
||||
|
||||
def isFile(FS, p) = p in FS.Files
|
||||
def isFile(FS, p) = p in keys(FS.Files)
|
||||
|
||||
|
||||
### Symbolic references
|
||||
@ -193,6 +193,10 @@ A path MAY refer to a symbolic link:
|
||||
|
||||
### File Length
|
||||
|
||||
Files store data:
|
||||
|
||||
def data(FS, p) = files(FS)[p]
|
||||
|
||||
The length of a path p in a filesystem FS is the length of the data stored, or 0 if it is a directory:
|
||||
|
||||
def length(FS, p) = if isFile(p) : return length(data(FS, p)) else return 0
|
||||
@ -215,9 +219,9 @@ This may differ from the local user account name.
|
||||
A path cannot refer to more than one of a file, a directory or a symbolic link
|
||||
|
||||
|
||||
FS.Directories ^ keys(data(FS)) == {}
|
||||
FS.Directories ^ symlinks(FS) == {}
|
||||
keys(data(FS))(FS) ^ symlinks(FS) == {}
|
||||
directories(FS) ^ filenames(FS) == {}
|
||||
directories(FS) ^ symlinks(FS) == {}
|
||||
filenames(FS) ^ symlinks(FS) == {}
|
||||
|
||||
|
||||
This implies that only files may have data.
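For readers who want to experiment, the three exclusivity invariants can be checked in Python as sketched below, reading `^` in the notation as set intersection. In Python itself `^` on sets is the symmetric difference, so the sketch uses `isdisjoint` instead. The function name and sample values are assumptions made for this example.

```python
def is_exclusive(directories, filenames, symlinks):
    """True if no path is in more than one of the three sets."""
    return (
        directories.isdisjoint(filenames)
        and directories.isdisjoint(symlinks)
        and filenames.isdisjoint(symlinks)
    )


valid = is_exclusive({"/", "/data"}, {"/data/a.bin"}, {"/latest"})
# "/data" appears as both a directory and a file, so the invariant fails.
invalid = is_exclusive({"/", "/data"}, {"/data"}, {"/latest"})
```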
|
||||
@ -248,7 +252,7 @@ For all files in an encrypted zone, the data is encrypted, but the encryption
|
||||
type and specification are not defined.
|
||||
|
||||
forall f in files(FS) where inEncyptionZone(FS, f):
|
||||
isEncrypted(data(f))
|
||||
isEncrypted(data(FS, f))
|
||||
|
||||
|
||||
## Notes
|
||||
|
@ -80,15 +80,15 @@ are used as the basis for this syntax as it is both plain ASCII and well-known.
|
||||
|
||||
##### Lists
|
||||
|
||||
* A list *L* is an ordered sequence of elements `[e1, e2, ... en]`
|
||||
* A list *L* is an ordered sequence of elements `[e1, e2, ... e(n)]`
|
||||
* The size of a list `len(L)` is the number of elements in a list.
|
||||
* Items can be addressed by a 0-based index `e1 == L[0]`
|
||||
* Python slicing operators can address subsets of a list `L[0:3] == [e1,e2]`, `L[:-1] == en`
|
||||
* Python slicing operators can address subsets of a list `L[0:3] == [e1,e2,e3]`, `L[:-1] == [e1, ... e(n-1)]`
|
||||
* Lists can be concatenated `L' = L + [ e3 ]`
|
||||
* Lists can have entries removed `L' = L - [ e2, e1 ]`. This is different from Python's
|
||||
`del` operation, which operates on the list in place.
|
||||
* The membership predicate `in` returns true iff an element is a member of a List: `e2 in L`
|
||||
* List comprehensions can create new lists: `L' = [ x for x in l where x < 5]`
|
||||
* List comprehensions can create new lists: `L' = [ x for x in L where x < 5]`
|
||||
* for a list `L`, `len(L)` returns the number of elements.
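The list operations above map onto Python as sketched below. The element names are placeholders, and `remove_all` is a hypothetical helper: Python lists do not support the `-` operator, and the notation's `where` corresponds to `if` in a Python comprehension.

```python
L = ["e1", "e2", "e3", "e4", "e5"]

first = L[0]               # 0-based index: "e1"
first_three = L[0:3]       # ["e1", "e2", "e3"]
all_but_last = L[:-1]      # ["e1", "e2", "e3", "e4"]
appended = L + ["e6"]      # concatenation builds a new list; L is unchanged


def remove_all(lst, unwanted):
    # Stand-in for the notation's L - [e2, e1]; returns a new list.
    return [x for x in lst if x not in unwanted]


removed = remove_all(L, ["e2", "e1"])        # ["e3", "e4", "e5"]
small = [x for x in [1, 7, 3, 9] if x < 5]   # [1, 3]
```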
|
||||
|
||||
|
||||
@ -130,7 +130,7 @@ Strings are lists of characters represented in double quotes. e.g. `"abc"`
|
||||
|
||||
All system state declarations are immutable.
|
||||
|
||||
The suffix "'" (single quote) is used as the convention to indicate the state of the system after an operation:
|
||||
The suffix "'" (single quote) is used as the convention to indicate the state of the system after a mutating operation:
|
||||
|
||||
L' = L + ['d','e']
|
||||
|
||||
|