From 0a11ce254647312a08557abbbc0fecca026d829c Mon Sep 17 00:00:00 2001
From: Mukund Thakur
Date: Mon, 22 Aug 2022 23:19:29 +0530
Subject: [PATCH] HADOOP-18407. Improve readVectored() api spec (#4760)

part of HADOOP-18103.

Contributed By: Mukund Thakur
---
 .../java/org/apache/hadoop/fs/PositionedReadable.java | 10 ++++++++++
 .../src/site/markdown/filesystem/fsdatainputstream.md |  7 +++++++
 2 files changed, 17 insertions(+)

diff --git a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/PositionedReadable.java b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/PositionedReadable.java
index de76090512..7380402eb6 100644
--- a/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/PositionedReadable.java
+++ b/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/PositionedReadable.java
@@ -114,6 +114,16 @@ default int maxReadSizeForVectorReads() {
    * As a result of the call, each range will have FileRange.setData(CompletableFuture)
    * called with a future that when complete will have a ByteBuffer with the
    * data from the file's range.
+   * <p>
+   * The position returned by getPos() after readVectored() is undefined.
+   * </p>
+   * <p>
+   * If a file is changed while the readVectored() operation is in progress, the output is
+   * undefined. Some ranges may have old data, some may have new and some may have both.
+   * </p>
+   * <p>
+   * While a readVectored() operation is in progress, normal read api calls may block.
+   * </p>
    * @param ranges the byte ranges to read
    * @param allocate the function to allocate ByteBuffer
    * @throws IOException any IOE.
diff --git a/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md b/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md
index 197b999c81..f64a2bd03b 100644
--- a/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md
+++ b/hadoop-common-project/hadoop-common/src/site/markdown/filesystem/fsdatainputstream.md
@@ -454,6 +454,13 @@ Also, clients are encouraged to use `WeakReferencedElasticByteBufferPool` for
 allocating buffers such that even direct buffers are garbage collected when
 they are no longer referenced.

+The position returned by `getPos()` after `readVectored()` is undefined.
+
+If a file is changed while the `readVectored()` operation is in progress, the output is
+undefined. Some ranges may have old data, some may have new, and some may have both.
+
+While a `readVectored()` operation is in progress, normal read api calls may block.
+
 Note: Don't use direct buffers for reading from ChecksumFileSystem as that may
 lead to memory fragmentation explained in HADOOP-18296.
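
The contract described in the Javadoc above (each range receives a future that completes with a `ByteBuffer` holding that range's data) can be illustrated with a small stand-in built only on JDK classes. This is a sketch under stated assumptions, not the Hadoop implementation: `VectoredReadSketch`, its `Range` type, and the static `readVectored(Path, List, IntFunction)` helper are hypothetical names invented for illustration, and the asynchronous reads are simulated with `CompletableFuture.supplyAsync` over positional `FileChannel` reads.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.IntFunction;

/** JDK-only stand-in for a vectored read; hypothetical, not Hadoop code. */
final class VectoredReadSketch {

  /** Hypothetical range type: an offset, a length, and a future for the data. */
  static final class Range {
    final long offset;
    final int length;
    volatile CompletableFuture<ByteBuffer> data;

    Range(long offset, int length) {
      this.offset = offset;
      this.length = length;
    }
  }

  /** Starts one asynchronous read per range and attaches its future to the range. */
  static void readVectored(Path file, List<Range> ranges,
                           IntFunction<ByteBuffer> allocate) {
    for (Range range : ranges) {
      range.data = CompletableFuture.supplyAsync(
          () -> readFully(file, range, allocate));
    }
  }

  /** Reads exactly range.length bytes starting at range.offset. */
  private static ByteBuffer readFully(Path file, Range range,
                                      IntFunction<ByteBuffer> allocate) {
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
      ByteBuffer buffer = allocate.apply(range.length);
      buffer.limit(range.length);
      long position = range.offset;
      while (buffer.hasRemaining()) {
        int n = channel.read(buffer, position);
        if (n < 0) {
          throw new EOFException("range extends past end of file at " + position);
        }
        position += n;
      }
      buffer.flip(); // make the bytes readable from index 0
      return buffer;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A caller would typically wait on every range's future (for example with `join()`) before issuing other reads. Against the real Hadoop stream, the text added by this patch suggests two further precautions: call `seek()` explicitly before the next sequential read rather than relying on `getPos()`, and expect ordinary read calls to block while a vectored read is still in flight.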