HADOOP-18987. Various fixes to FileSystem API docs (#6292)

Contributed by Dieter De Paepe
This commit is contained in:
DieterDP 2024-02-02 12:49:31 +01:00 committed by GitHub
parent 4f4b846986
commit be13e94843
No known key found for this signature in database
GPG Key ID: B5690EEEBB952194
5 changed files with 75 additions and 69 deletions

View File

@ -88,14 +88,13 @@ for example. output streams returned by the S3A FileSystem.
The stream MUST implement `Abortable` and `StreamCapabilities`.
```python
if unsupported:
if unsupported:
throw UnsupportedException
if not isOpen(stream):
no-op
StreamCapabilities.hasCapability("fs.capability.outputstream.abortable") == True
```

View File

@ -64,13 +64,13 @@ a protected directory, result in such an exception being raised.
### `boolean isDirectory(Path p)`
def isDirectory(FS, p)= p in directories(FS)
def isDir(FS, p) = p in directories(FS)
### `boolean isFile(Path p)`
def isFile(FS, p) = p in files(FS)
def isFile(FS, p) = p in filenames(FS)
### `FileStatus getFileStatus(Path p)`
@ -250,7 +250,7 @@ process.
changes are made to the filesystem, the result of `listStatus(parent(P))` SHOULD
include the value of `getFileStatus(P)`.
* After an entry at path `P` is created, and before any other
* After an entry at path `P` is deleted, and before any other
changes are made to the filesystem, the result of `listStatus(parent(P))` SHOULD
NOT include the value of `getFileStatus(P)`.
@ -305,7 +305,7 @@ that they must all be listed, and, at the time of listing, exist.
All paths must exist. There is no requirement for uniqueness.
forall p in paths :
exists(fs, p) else raise FileNotFoundException
exists(FS, p) else raise FileNotFoundException
#### Postconditions
@ -381,7 +381,7 @@ being completely performed.
Path `path` must exist:
exists(FS, path) : raise FileNotFoundException
if not exists(FS, path) : raise FileNotFoundException
#### Postconditions
@ -432,7 +432,7 @@ of data which must be collected in a single RPC call.
#### Preconditions
exists(FS, path) else raise FileNotFoundException
if not exists(FS, path) : raise FileNotFoundException
### Postconditions
@ -463,7 +463,7 @@ and 1 for file count.
#### Preconditions
exists(FS, path) else raise FileNotFoundException
if not exists(FS, path) : raise FileNotFoundException
#### Postconditions
@ -596,7 +596,7 @@ on the filesystem.
#### Postconditions
if len(FS, P) > 0: getFileStatus(P).getBlockSize() > 0
if len(FS, P) > 0 : getFileStatus(P).getBlockSize() > 0
result == getFileStatus(P).getBlockSize()
1. The outcome of this operation MUST be identical to the value of
@ -654,12 +654,12 @@ No ancestor may be a file
forall d = ancestors(FS, p) :
if exists(FS, d) and not isDir(FS, d) :
raise [ParentNotDirectoryException, FileAlreadyExistsException, IOException]
raise {ParentNotDirectoryException, FileAlreadyExistsException, IOException}
#### Postconditions
FS' where FS'.Directories' = FS.Directories + [p] + ancestors(FS, p)
FS' where FS'.Directories = FS.Directories + [p] + ancestors(FS, p)
result = True
@ -698,7 +698,7 @@ No ancestor may be a file
forall d = ancestors(FS, p) :
if exists(FS, d) and not isDir(FS, d) :
raise [ParentNotDirectoryException, FileAlreadyExistsException, IOException]
raise {ParentNotDirectoryException, FileAlreadyExistsException, IOException}
FileSystems may reject the request for other
reasons, such as the FS being read-only (HDFS),
@ -712,8 +712,8 @@ For instance, HDFS may raise an `InvalidPathException`.
#### Postconditions
FS' where :
FS'.Files'[p] == []
ancestors(p) is-subset-of FS'.Directories'
FS'.Files[p] == []
ancestors(p) subset-of FS'.Directories
result = FSDataOutputStream
@ -734,7 +734,7 @@ The behavior of the returned stream is covered in [Output](outputstream.html).
clients creating files with `overwrite==true` to fail if the file is created
by another client between the two tests.
* The S3A and potentially other Object Stores connectors not currently change the `FS` state
* The S3A and potentially other Object Stores connectors currently don't change the `FS` state
until the output stream `close()` operation is completed.
This is a significant difference between the behavior of object stores
and that of filesystems, as it allows >1 client to create a file with `overwrite=false`,
@ -762,15 +762,15 @@ The behavior of the returned stream is covered in [Output](outputstream.html).
#### Implementation Notes
`createFile(p)` returns a `FSDataOutputStreamBuilder` only and does not make
change on filesystem immediately. When `build()` is invoked on the `FSDataOutputStreamBuilder`,
changes on the filesystem immediately. When `build()` is invoked on the `FSDataOutputStreamBuilder`,
the builder parameters are verified and [`create(Path p)`](#FileSystem.create)
is invoked on the underlying filesystem. `build()` has the same preconditions
and postconditions as [`create(Path p)`](#FileSystem.create).
* Similar to [`create(Path p)`](#FileSystem.create), files are overwritten
by default, unless specify `builder.overwrite(false)`.
by default, unless specified by `builder.overwrite(false)`.
* Unlike [`create(Path p)`](#FileSystem.create), missing parent directories are
not created by default, unless specify `builder.recursive()`.
not created by default, unless specified by `builder.recursive()`.
### <a name='FileSystem.append'></a> `FSDataOutputStream append(Path p, int bufferSize, Progressable progress)`
@ -780,14 +780,14 @@ Implementations without a compliant call SHOULD throw `UnsupportedOperationExcep
if not exists(FS, p) : raise FileNotFoundException
if not isFile(FS, p) : raise [FileAlreadyExistsException, FileNotFoundException, IOException]
if not isFile(FS, p) : raise {FileAlreadyExistsException, FileNotFoundException, IOException}
#### Postconditions
FS' = FS
result = FSDataOutputStream
Return: `FSDataOutputStream`, which can update the entry `FS.Files[p]`
Return: `FSDataOutputStream`, which can update the entry `FS'.Files[p]`
by appending data to the existing list.
The behavior of the returned stream is covered in [Output](outputstream.html).
@ -813,7 +813,7 @@ Implementations without a compliant call SHOULD throw `UnsupportedOperationExcep
#### Preconditions
if not isFile(FS, p)) : raise [FileNotFoundException, IOException]
if not isFile(FS, p)) : raise {FileNotFoundException, IOException}
This is a critical precondition. Implementations of some FileSystems (e.g.
Object stores) could shortcut one round trip by postponing their HTTP GET
@ -842,7 +842,7 @@ The result MUST be the same for local and remote callers of the operation.
symbolic links
1. HDFS throws `IOException("Cannot open filename " + src)` if the path
exists in the metadata, but no copies of any its blocks can be located;
exists in the metadata, but no copies of its blocks can be located;
-`FileNotFoundException` would seem more accurate and useful.
### `FSDataInputStreamBuilder openFile(Path path)`
@ -861,7 +861,7 @@ Implementations without a compliant call MUST throw `UnsupportedOperationExcepti
let stat = getFileStatus(Path p)
let FS' where:
(FS.Directories', FS.Files', FS.Symlinks')
(FS'.Directories, FS.Files', FS'.Symlinks)
p' in paths(FS') where:
exists(FS, stat.path) implies exists(FS', p')
@ -931,16 +931,16 @@ metadata in the `PathHandle` to detect references from other namespaces.
### `FSDataInputStream open(PathHandle handle, int bufferSize)`
Implementaions without a compliant call MUST throw `UnsupportedOperationException`
Implementations without a compliant call MUST throw `UnsupportedOperationException`
#### Preconditions
let fd = getPathHandle(FileStatus stat)
if stat.isdir : raise IOException
let FS' where:
(FS.Directories', FS.Files', FS.Symlinks')
p' in FS.Files' where:
FS.Files'[p'] = fd
(FS'.Directories, FS.Files', FS'.Symlinks)
p' in FS'.Files where:
FS'.Files[p'] = fd
if not exists(FS', p') : raise InvalidPathHandleException
The implementation MUST resolve the referent of the `PathHandle` following
@ -951,7 +951,7 @@ encoded in the `PathHandle`.
#### Postconditions
result = FSDataInputStream(0, FS.Files'[p'])
result = FSDataInputStream(0, FS'.Files[p'])
The stream returned is subject to the constraints of a stream returned by
`open(Path)`. Constraints checked on open MAY hold to hold for the stream, but
@ -1006,7 +1006,7 @@ A directory with children and `recursive == False` cannot be deleted
If the file does not exist the filesystem state does not change
if not exists(FS, p):
if not exists(FS, p) :
FS' = FS
result = False
@ -1089,7 +1089,7 @@ Some of the object store based filesystem implementations always return
false when deleting the root, leaving the state of the store unchanged.
if isRoot(p) :
FS ' = FS
FS' = FS
result = False
This is irrespective of the recursive flag status or the state of the directory.
@ -1152,7 +1152,7 @@ has been calculated.
Source `src` must exist:
exists(FS, src) else raise FileNotFoundException
if not exists(FS, src) : raise FileNotFoundException
`dest` cannot be a descendant of `src`:
@ -1162,7 +1162,7 @@ This implicitly covers the special case of `isRoot(FS, src)`.
`dest` must be root, or have a parent that exists:
isRoot(FS, dest) or exists(FS, parent(dest)) else raise IOException
if not (isRoot(FS, dest) or exists(FS, parent(dest))) : raise IOException
The parent path of a destination must not be a file:
@ -1240,7 +1240,8 @@ There is no consistent behavior here.
The outcome is no change to FileSystem state, with a return value of false.
FS' = FS; result = False
FS' = FS
result = False
*Local Filesystem*
@ -1319,15 +1320,18 @@ Implementations without a compliant call SHOULD throw `UnsupportedOperationExcep
All sources MUST be in the same directory:
for s in sources: if parent(S) != parent(p) raise IllegalArgumentException
for s in sources:
if parent(s) != parent(p) : raise IllegalArgumentException
All block sizes must match that of the target:
for s in sources: getBlockSize(FS, S) == getBlockSize(FS, p)
for s in sources:
getBlockSize(FS, s) == getBlockSize(FS, p)
No duplicate paths:
not (exists p1, p2 in (sources + [p]) where p1 == p2)
let input = sources + [p]
not (exists i, j: i != j and input[i] == input[j])
HDFS: All source files except the final one MUST be a complete block:
@ -1339,8 +1343,8 @@ HDFS: All source files except the final one MUST be a complete block:
FS' where:
(data(FS', T) = data(FS, T) + data(FS, sources[0]) + ... + data(FS, srcs[length(srcs)-1]))
and for s in srcs: not exists(FS', S)
(data(FS', p) = data(FS, p) + data(FS, sources[0]) + ... + data(FS, sources[length(sources)-1]))
for s in sources: not exists(FS', s)
HDFS's restrictions may be an implementation detail of how it implements
@ -1360,7 +1364,7 @@ Implementations without a compliant call SHOULD throw `UnsupportedOperationExcep
if not exists(FS, p) : raise FileNotFoundException
if isDir(FS, p) : raise [FileNotFoundException, IOException]
if isDir(FS, p) : raise {FileNotFoundException, IOException}
if newLength < 0 || newLength > len(FS.Files[p]) : raise HadoopIllegalArgumentException
@ -1369,8 +1373,7 @@ Truncate cannot be performed on a file, which is open for writing or appending.
#### Postconditions
FS' where:
len(FS.Files[p]) = newLength
len(FS'.Files[p]) = newLength
Return: `true`, if truncation is finished and the file can be immediately
opened for appending, or `false` otherwise.
@ -1399,7 +1402,7 @@ Source and destination must be different
if src = dest : raise FileExistsException
```
Destination and source must not be descendants one another
Destination and source must not be descendants of one another
```python
if isDescendant(src, dest) or isDescendant(dest, src) : raise IOException
```
@ -1429,7 +1432,7 @@ Given a base path on the source `base` and a child path `child` where `base` is
```python
def final_name(base, child, dest):
is base = child:
if base == child:
return dest
else:
return dest + childElements(base, child)
@ -1557,7 +1560,7 @@ while (iterator.hasNext()) {
As raising exceptions is an expensive operation in JVMs, the `while(hasNext())`
loop option is more efficient. (see also [Concurrency and the Remote Iterator](#RemoteIteratorConcurrency)
for a dicussion on this topic).
for a discussion on this topic).
Implementors of the interface MUST support both forms of iterations; authors
of tests SHOULD verify that both iteration mechanisms work.

View File

@ -108,21 +108,21 @@ such as `rename`.
## Defining the Filesystem
A filesystem `FS` contains a set of directories, a dictionary of paths and a dictionary of symbolic links
A filesystem `FS` contains directories (a set of paths), files (a mapping of a path to a list of bytes) and symlinks (a set of paths mapping to paths)
(Directories:Set[Path], Files:[Path:List[byte]], Symlinks:Set[Path])
(Directories:Set[Path], Files:Map[Path:List[byte]], Symlinks:Map[Path:Path])
Accessor functions return the specific element of a filesystem
def FS.Directories = FS.Directories
def directories(FS) = FS.Directories
def files(FS) = FS.Files
def symlinks(FS) = FS.Symlinks
def symlinks(FS) = keys(FS.Symlinks)
def filenames(FS) = keys(FS.Files)
The entire set of a paths finite subset of all possible Paths, and functions to resolve a path to data, a directory predicate or a symbolic link:
def paths(FS) = FS.Directories + filenames(FS) + FS.Symlinks)
def paths(FS) = FS.Directories + filenames(FS) + symlinks(FS)
A path is deemed to exist if it is in this aggregate set:
@ -169,10 +169,10 @@ in a set, hence no children with duplicate names.
A path *D* is a descendant of a path *P* if it is the direct child of the
path *P* or an ancestor is a direct child of path *P*:
def isDescendant(P, D) = parent(D) == P where isDescendant(P, parent(D))
def isDescendant(P, D) = parent(D) == P or isDescendant(P, parent(D))
The descendants of a directory P are all paths in the filesystem whose
path begins with the path P -that is their parent is P or an ancestor is P
path begins with the path P, i.e. their parent is P or an ancestor is P
def descendants(FS, D) = {p for p in paths(FS) where isDescendant(D, p)}
@ -181,7 +181,7 @@ path begins with the path P -that is their parent is P or an ancestor is P
A path MAY refer to a file that has data in the filesystem; its path is a key in the data dictionary
def isFile(FS, p) = p in FS.Files
def isFile(FS, p) = p in keys(FS.Files)
### Symbolic references
@ -193,6 +193,10 @@ A path MAY refer to a symbolic link:
### File Length
Files store data:
def data(FS, p) = files(FS)[p]
The length of a path p in a filesystem FS is the length of the data stored, or 0 if it is a directory:
def length(FS, p) = if isFile(p) : return length(data(FS, p)) else return 0
@ -215,9 +219,9 @@ This may differ from the local user account name.
A path cannot refer to more than one of a file, a directory or a symbolic link
FS.Directories ^ keys(data(FS)) == {}
FS.Directories ^ symlinks(FS) == {}
keys(data(FS))(FS) ^ symlinks(FS) == {}
directories(FS) ^ filenames(FS) == {}
directories(FS) ^ symlinks(FS) == {}
filenames(FS) ^ symlinks(FS) == {}
This implies that only files may have data.
@ -248,7 +252,7 @@ For all files in an encrypted zone, the data is encrypted, but the encryption
type and specification are not defined.
forall f in files(FS) where inEncyptionZone(FS, f):
isEncrypted(data(f))
isEncrypted(data(FS, f))
## Notes

View File

@ -80,15 +80,15 @@ are used as the basis for this syntax as it is both plain ASCII and well-known.
##### Lists
* A list *L* is an ordered sequence of elements `[e1, e2, ... en]`
* A list *L* is an ordered sequence of elements `[e1, e2, ... e(n)]`
* The size of a list `len(L)` is the number of elements in a list.
* Items can be addressed by a 0-based index `e1 == L[0]`
* Python slicing operators can address subsets of a list `L[0:3] == [e1,e2]`, `L[:-1] == en`
* Python slicing operators can address subsets of a list `L[0:3] == [e1,e2,e3]`, `L[:-1] == [e1, ... e(n-1)]`
* Lists can be concatenated `L' = L + [ e3 ]`
* Lists can have entries removed `L' = L - [ e2, e1 ]`. This is different from Python's
`del` operation, which operates on the list in place.
* The membership predicate `in` returns true iff an element is a member of a List: `e2 in L`
* List comprehensions can create new lists: `L' = [ x for x in l where x < 5]`
* List comprehensions can create new lists: `L' = [ x for x in L where x < 5]`
* for a list `L`, `len(L)` returns the number of elements.
@ -130,7 +130,7 @@ Strings are lists of characters represented in double quotes. e.g. `"abc"`
All system state declarations are immutable.
The suffix "'" (single quote) is used as the convention to indicate the state of the system after an operation:
The suffix "'" (single quote) is used as the convention to indicate the state of the system after a mutating operation:
L' = L + ['d','e']