HADOOP-18407. Improve readVectored() api spec (#4760)

part of HADOOP-18103.

Contributed By: Mukund Thakur
Mukund Thakur 2022-08-22 23:19:29 +05:30 committed by GitHub
parent a9e5fb3313
commit 231e095802
2 changed files with 17 additions and 0 deletions

@@ -114,6 +114,16 @@ default int maxReadSizeForVectorReads() {
* As a result of the call, each range will have FileRange.setData(CompletableFuture)
* called with a future that, when complete, will contain a ByteBuffer with the
* data from the file's range.
* <p>
* The position returned by getPos() after readVectored() is undefined.
* </p>
* <p>
* If a file is changed while the readVectored() operation is in progress, the output is
* undefined. Some ranges may have old data, some may have new, and some may have both.
* </p>
* <p>
* While a readVectored() operation is in progress, normal read API calls may block.
* </p>
* @param ranges the byte ranges to read
* @param allocate the function to allocate ByteBuffer
* @throws IOException any IOE.
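
To illustrate the contract above, here is a minimal usage sketch (not part of the commit). The path, offsets, and lengths are invented for illustration; the calls shown (`FileRange.createFileRange()`, `readVectored()`, `FileRange.getData()`) are the vectored-read API this javadoc documents.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileRange;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class VectoredReadSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path path = new Path("/tmp/data.bin");   // hypothetical input file

    // Two disjoint (offset, length) ranges; values are illustrative only.
    List<FileRange> ranges = Arrays.asList(
        FileRange.createFileRange(0, 100),
        FileRange.createFileRange(4096, 200));

    try (FSDataInputStream in = fs.open(path)) {
      // Initiate the asynchronous reads; each range's future is set via
      // FileRange.setData(CompletableFuture), as the javadoc above states.
      in.readVectored(ranges, ByteBuffer::allocate);

      for (FileRange range : ranges) {
        // Block until this range's ByteBuffer is ready.
        ByteBuffer data = range.getData().join();
        System.out.println("Read " + data.remaining()
            + " bytes at offset " + range.getOffset());
      }
      // Per the spec, getPos() is now undefined: seek() explicitly before
      // issuing further position-dependent reads on this stream.
    }
  }
}
```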

@@ -454,6 +454,13 @@ Also, clients are encouraged to use `WeakReferencedElasticByteBufferPool` for
allocating buffers such that even direct buffers are garbage collected when
they are no longer referenced.

The position returned by `getPos()` after `readVectored()` is undefined.

If a file is changed while the `readVectored()` operation is in progress, the output is
undefined. Some ranges may have old data, some may have new, and some may have both.

While a `readVectored()` operation is in progress, normal read API calls may block.

Note: Don't use direct buffers for reading from `ChecksumFileSystem`, as that may
lead to the memory fragmentation described in HADOOP-18296.
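
As a short sketch of the pooled-allocation guidance above (again, not from the commit): `readWithPool()` is a hypothetical helper; it assumes the stream and ranges are set up as in the earlier example, and it requests heap (non-direct) buffers in line with the `ChecksumFileSystem` note.

```java
import java.nio.ByteBuffer;
import java.util.List;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileRange;
import org.apache.hadoop.io.WeakReferencedElasticByteBufferPool;

public final class PooledVectoredRead {
  // Pool whose idle buffers are only weakly referenced, so the GC can
  // reclaim them (even direct ones) once nothing else points at them.
  private static final WeakReferencedElasticByteBufferPool POOL =
      new WeakReferencedElasticByteBufferPool();

  static void readWithPool(FSDataInputStream in, List<FileRange> ranges)
      throws Exception {
    // Allocate heap buffers (direct = false), avoiding direct buffers
    // when reading through ChecksumFileSystem as the note advises.
    in.readVectored(ranges, len -> POOL.getBuffer(false, len));

    for (FileRange range : ranges) {
      ByteBuffer buffer = range.getData().join();
      // ... consume the buffer contents here ...
      POOL.putBuffer(buffer);  // return the buffer to the pool for reuse
    }
  }
}
```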