HADOOP-19178: [WASB Deprecation] Updating Documentation on Upcoming Plans for Hadoop-Azure (#6862)
Contributed by Anuj Modi

Parent: 2ee0bf9534
Commit: bbb17e76a7

@@ -18,6 +18,7 @@
See also:

* [WASB](./wasb.html)
* [ABFS](./abfs.html)
* [Testing](./testing_azure.html)

hadoop-tools/hadoop-azure/src/site/markdown/wasb.md
@@ -0,0 +1,97 @@
<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

# Hadoop Azure Support: WASB Driver

## Introduction

The WASB driver is a legacy Hadoop file system driver that was developed to
support [FNS (Flat Namespace) Azure Storage accounts](https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction),
which do not honor file-folder semantics.
As a result, the WASB driver has to mimic HDFS folder operations on the client
side, and folder operations such as rename and delete can generate a large
number of I/O operations (IOPs), because the driver must enumerate the folder's
contents and rename or delete them blob by blob.
Other APIs suffer too: checking whether a path is a file or a folder requires
multiple metadata calls. Together, these lead to degraded performance.
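
To make that cost concrete, here is a minimal sketch of client-side folder emulation over a flat namespace. It models a container as an in-memory sorted map from blob name to content; the `FlatBlobStore` class and its methods are hypothetical (they are not the WASB driver or the Azure SDK API), and every HEAD, list, copy, or delete is counted as one remote call. It is a simplification: the real WASB driver also deals with folder marker blobs and other bookkeeping that is omitted here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/**
 * Hypothetical model of a flat-namespace (FNS) container, used only to show
 * why client-side folder emulation is expensive. Each method that would be a
 * remote call against a real store increments remoteCalls.
 */
public class FlatBlobStore {
    // Blob name -> content. "Folders" exist only as shared name prefixes.
    private final TreeMap<String, String> blobs = new TreeMap<>();
    private int remoteCalls = 0;

    public void put(String name, String content) {
        blobs.put(name, content);
    }

    public boolean exists(String name) {
        return blobs.containsKey(name);
    }

    public int remoteCalls() {
        return remoteCalls;
    }

    /** One call: does a blob exist under exactly this name? */
    private boolean headBlob(String name) {
        remoteCalls++;
        return blobs.containsKey(name);
    }

    /** One call: list every blob whose name starts with the prefix. */
    private List<String> listByPrefix(String prefix) {
        remoteCalls++;
        List<String> names = new ArrayList<>();
        for (String name : blobs.tailMap(prefix, true).keySet()) {
            if (!name.startsWith(prefix)) {
                break; // keys are sorted, so all matches are contiguous
            }
            names.add(name);
        }
        return names;
    }

    /** Up to two calls: HEAD for a file, then a prefix listing for a folder. */
    public String getPathType(String path) {
        if (headBlob(path)) {
            return "FILE";
        }
        return listByPrefix(path + "/").isEmpty() ? "NOT_FOUND" : "FOLDER";
    }

    /** One listing, then a copy and a delete for every blob in the folder. */
    public void renameFolder(String src, String dst) {
        for (String name : listByPrefix(src + "/")) {
            String target = dst + "/" + name.substring(src.length() + 1);
            remoteCalls++; // copy the blob to its new name
            blobs.put(target, blobs.get(name));
            remoteCalls++; // delete the original blob
            blobs.remove(name);
        }
    }

    public static void main(String[] args) {
        FlatBlobStore store = new FlatBlobStore();
        store.put("data/a.csv", "1");
        store.put("data/b.csv", "2");
        store.put("data/sub/c.csv", "3");

        System.out.println(store.getPathType("data")); // FOLDER, after 2 calls
        store.renameFolder("data", "archive");
        System.out.println(store.remoteCalls());        // 2 + 1 + 3 * 2 = 9
    }
}
```

The call count grows linearly with the number of blobs under the folder, whereas on an HNS-enabled account a directory rename is a single atomic operation.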

To provide better service to analytics users, Microsoft released
[ADLS Gen2](https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-introduction)
accounts, which are HNS (Hierarchical Namespace) enabled, i.e. file-folder
aware storage accounts.
The ABFS driver was designed to overcome the inherent deficiencies of WASB, and
users were advised to migrate to the ABFS driver.

### Challenges and limitations of WASB Driver

Users of the legacy WASB driver face a number of challenges and limitations:

1. They cannot leverage the optimizations and benefits of the latest ABFS driver.
2. They need to deal with compatibility issues if files and folders are
modified by the legacy WASB driver and the ABFS driver concurrently during a
phased transition.
3. The features supported by the ABFS driver differ between FNS and HNS accounts.
4. In certain cases, they must perform a significant amount of rework on their
workloads to migrate to the ABFS driver, which is fully tested and supported
only on HNS-enabled accounts.

## Deprecation plans for WASB Driver

We are introducing a new feature that will enable the ABFS driver to support
FNS accounts (over the Blob endpoint that the WASB driver uses) using the ABFS
scheme. This feature will enable the ABFS driver to interact with data stored
in GPv2 (General Purpose v2) storage accounts.

With this feature, users who still use the legacy WASB driver will be able to
migrate to the ABFS driver without much rework on their workloads. They will,
however, need to change their URIs from the WASB scheme to the ABFS scheme.
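
As an illustration of that URI change, the sketch below rewrites a WASB URI into the corresponding ABFS form. WASB URIs take the form `wasb[s]://<container>@<account>.blob.core.windows.net/<path>`, and ABFS URIs conventionally take the form `abfs[s]://<container>@<account>.dfs.core.windows.net/<path>`. This page does not state which host an FNS account should use once it is accessed through the ABFS driver, so the switch to the `dfs` host is an assumption, flagged in the code. The `WasbToAbfsUri` helper is hypothetical and not part of Hadoop.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper: rewrite a wasb[s]:// URI into the abfs[s]:// form. */
public final class WasbToAbfsUri {
    private static final String BLOB_SUFFIX = ".blob.core.windows.net";
    // ASSUMPTION: the target URI uses the dfs host, as standard ABFS URIs do.
    private static final String DFS_SUFFIX = ".dfs.core.windows.net";

    private WasbToAbfsUri() {
    }

    public static URI convert(URI wasbUri) {
        String scheme = wasbUri.getScheme();
        String newScheme;
        if ("wasb".equalsIgnoreCase(scheme)) {
            newScheme = "abfs";
        } else if ("wasbs".equalsIgnoreCase(scheme)) {
            newScheme = "abfss"; // TLS scheme maps to the TLS scheme
        } else {
            throw new IllegalArgumentException("Not a WASB URI: " + wasbUri);
        }

        String host = wasbUri.getHost();
        if (host == null || !host.endsWith(BLOB_SUFFIX)) {
            throw new IllegalArgumentException("Unexpected WASB host: " + host);
        }
        String account = host.substring(0, host.length() - BLOB_SUFFIX.length());

        try {
            // The container stays in the user-info part: container@account-host
            return new URI(newScheme, wasbUri.getUserInfo(), account + DFS_SUFFIX,
                    wasbUri.getPort(), wasbUri.getPath(), wasbUri.getQuery(),
                    wasbUri.getFragment());
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build ABFS URI", e);
        }
    }

    public static void main(String[] args) {
        URI in = URI.create("wasbs://logs@myaccount.blob.core.windows.net/2024/app.log");
        System.out.println(convert(in));
        // abfss://logs@myaccount.dfs.core.windows.net/2024/app.log
    }
}
```

Only the scheme and host change; the container, path, and any query string are carried over as-is, so existing directory layouts keep working under the new URI.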

Once the ABFS driver has built the FNS support needed to migrate WASB users,
the WASB driver will be marked for removal in the next major release. This will
remove any ambiguity for newly onboarding users, as there will be only one
Microsoft driver for Azure Storage, and migrating users will get SLA-bound
support for both the driver and the service, which was not guaranteed with WASB.

We anticipate that this feature will serve as a stepping stone for users to
move to HNS-enabled accounts with the ABFS driver, which is our recommended
stack for big data analytics on ADLS Gen2.

### Impact for existing ABFS users using ADLS Gen2 (HNS enabled account)

This feature does not impact existing users who use ADLS Gen2 accounts
(HNS-enabled accounts) with the ABFS driver.

They do not need to make any changes to their workloads or configurations. They
will continue to enjoy the benefits of HNS, such as atomic operations,
fine-grained access control, scalability, and performance.

### Official recommendation

Microsoft continues to recommend that all big data and analytics users use
Azure Data Lake Storage Gen2 (ADLS Gen2) with the ABFS driver, and will continue
to optimize this scenario in the future. We believe that this new option will
help all those users transition to a supported scenario immediately, while they
plan to ultimately move to ADLS Gen2 (HNS-enabled accounts).

### New Authentication Options for a migrating user

The following authentication types that WASB provides will continue to work
with the new FNS support in the ABFS driver, through configuration that accepts
these types (similar to WASB):

1. SharedKey
2. Account SAS
3. Service/Container SAS
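
For SharedKey, a minimal sketch of the migration is to reuse the account key a WASB deployment already has. WASB reads it from `fs.azure.account.key.<account>.blob.core.windows.net`; ABFS reads `fs.azure.account.auth.type.<host>` and `fs.azure.account.key.<host>`. The `SharedKeyMigration` class is hypothetical, the `dfs` host used as the ABFS key suffix is an assumption (it must match the host in the ABFS URI you actually use), and the SAS variants are not shown because this page does not name their FNS configuration properties.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical migration helper: derive ABFS SharedKey settings from an
 * existing WASB account key. In real use, each entry would be passed to
 * org.apache.hadoop.conf.Configuration#set or written to core-site.xml.
 */
public final class SharedKeyMigration {
    private SharedKeyMigration() {
    }

    public static Map<String, String> abfsSharedKeyConf(
            Map<String, String> wasbConf, String account) {
        String wasbHost = account + ".blob.core.windows.net";
        // ASSUMPTION: the ABFS URI for this account uses the dfs host.
        String abfsHost = account + ".dfs.core.windows.net";

        String key = wasbConf.get("fs.azure.account.key." + wasbHost);
        if (key == null) {
            throw new IllegalArgumentException("No WASB account key for " + account);
        }

        // Account-qualified keys apply only to this storage account.
        Map<String, String> abfsConf = new LinkedHashMap<>();
        abfsConf.put("fs.azure.account.auth.type." + abfsHost, "SharedKey");
        abfsConf.put("fs.azure.account.key." + abfsHost, key);
        return abfsConf;
    }

    public static void main(String[] args) {
        Map<String, String> wasbConf = Map.of(
                "fs.azure.account.key.myaccount.blob.core.windows.net", "<base64-key>");
        abfsSharedKeyConf(wasbConf, "myaccount")
                .forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```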

The following authentication types, which were not supported by the WASB driver
but are supported by the ABFS driver, will also be available with the new FNS
support in the ABFS driver:

1. OAuth 2.0 Client Credentials
2. OAuth 2.0 Refresh Token
3. Azure Managed Identity
4. Custom OAuth 2.0 Token Provider

Refer to [ABFS Authentication](./abfs.html#authentication) for more details.

### ABFS Features Not Available for migrating Users

Certain features of the ABFS driver will be available only to users of HNS
accounts with the ABFS driver:

1. The ABFS driver's SAS Token Provider plugin for User Delegation SAS and Fixed SAS.
2. Client Provided Encryption Key (CPK) support for data ingress and egress.