WebHDFS REST API
================

* [WebHDFS REST API](#WebHDFS_REST_API)
    * [Document Conventions](#Document_Conventions)
    * [Introduction](#Introduction)
        * [Operations](#Operations)
        * [FileSystem URIs vs HTTP URLs](#FileSystem_URIs_vs_HTTP_URLs)
        * [HDFS Configuration Options](#HDFS_Configuration_Options)
    * [Authentication](#Authentication)
        * [Proxy Users](#Proxy_Users)
    * [File and Directory Operations](#File_and_Directory_Operations)
        * [Create and Write to a File](#Create_and_Write_to_a_File)
        * [Append to a File](#Append_to_a_File)
        * [Concat File(s)](#Concat_Files)
        * [Open and Read a File](#Open_and_Read_a_File)
        * [Make a Directory](#Make_a_Directory)
        * [Create a Symbolic Link](#Create_a_Symbolic_Link)
        * [Rename a File/Directory](#Rename_a_FileDirectory)
        * [Delete a File/Directory](#Delete_a_FileDirectory)
        * [Truncate a File](#Truncate_a_File)
        * [Status of a File/Directory](#Status_of_a_FileDirectory)
        * [List a Directory](#List_a_Directory)
    * [Other File System Operations](#Other_File_System_Operations)
        * [Get Content Summary of a Directory](#Get_Content_Summary_of_a_Directory)
        * [Get File Checksum](#Get_File_Checksum)
        * [Get Home Directory](#Get_Home_Directory)
        * [Set Permission](#Set_Permission)
        * [Set Owner](#Set_Owner)
        * [Set Replication Factor](#Set_Replication_Factor)
        * [Set Access or Modification Time](#Set_Access_or_Modification_Time)
        * [Modify ACL Entries](#Modify_ACL_Entries)
        * [Remove ACL Entries](#Remove_ACL_Entries)
        * [Remove Default ACL](#Remove_Default_ACL)
        * [Remove ACL](#Remove_ACL)
        * [Set ACL](#Set_ACL)
        * [Get ACL Status](#Get_ACL_Status)
        * [Check access](#Check_access)
    * [Extended Attributes(XAttrs) Operations](#Extended_AttributesXAttrs_Operations)
        * [Set XAttr](#Set_XAttr)
        * [Remove XAttr](#Remove_XAttr)
        * [Get an XAttr](#Get_an_XAttr)
        * [Get multiple XAttrs](#Get_multiple_XAttrs)
        * [Get all XAttrs](#Get_all_XAttrs)
        * [List all XAttrs](#List_all_XAttrs)
    * [Snapshot Operations](#Snapshot_Operations)
        * [Create Snapshot](#Create_Snapshot)
        * [Delete Snapshot](#Delete_Snapshot)
        * [Rename Snapshot](#Rename_Snapshot)
    * [Delegation Token Operations](#Delegation_Token_Operations)
        * [Get Delegation Token](#Get_Delegation_Token)
        * [Get Delegation Tokens](#Get_Delegation_Tokens)
        * [Renew Delegation Token](#Renew_Delegation_Token)
        * [Cancel Delegation Token](#Cancel_Delegation_Token)
    * [Error Responses](#Error_Responses)
        * [HTTP Response Codes](#HTTP_Response_Codes)
            * [Illegal Argument Exception](#Illegal_Argument_Exception)
            * [Security Exception](#Security_Exception)
            * [Access Control Exception](#Access_Control_Exception)
            * [File Not Found Exception](#File_Not_Found_Exception)
    * [JSON Schemas](#JSON_Schemas)
        * [ACL Status JSON Schema](#ACL_Status_JSON_Schema)
        * [XAttrs JSON Schema](#XAttrs_JSON_Schema)
        * [XAttrNames JSON Schema](#XAttrNames_JSON_Schema)
        * [Boolean JSON Schema](#Boolean_JSON_Schema)
        * [ContentSummary JSON Schema](#ContentSummary_JSON_Schema)
        * [FileChecksum JSON Schema](#FileChecksum_JSON_Schema)
        * [FileStatus JSON Schema](#FileStatus_JSON_Schema)
            * [FileStatus Properties](#FileStatus_Properties)
        * [FileStatuses JSON Schema](#FileStatuses_JSON_Schema)
        * [Long JSON Schema](#Long_JSON_Schema)
        * [Path JSON Schema](#Path_JSON_Schema)
        * [RemoteException JSON Schema](#RemoteException_JSON_Schema)
        * [Token JSON Schema](#Token_JSON_Schema)
            * [Token Properties](#Token_Properties)
        * [Tokens JSON Schema](#Tokens_JSON_Schema)
    * [HTTP Query Parameter Dictionary](#HTTP_Query_Parameter_Dictionary)
        * [ACL Spec](#ACL_Spec)
        * [XAttr Name](#XAttr_Name)
        * [XAttr Value](#XAttr_Value)
        * [XAttr set flag](#XAttr_set_flag)
        * [XAttr value encoding](#XAttr_value_encoding)
        * [Access Time](#Access_Time)
        * [Block Size](#Block_Size)
        * [Buffer Size](#Buffer_Size)
        * [Create Parent](#Create_Parent)
        * [Delegation](#Delegation)
        * [Destination](#Destination)
        * [Do As](#Do_As)
        * [Fs Action](#Fs_Action)
        * [Group](#Group)
        * [Length](#Length)
        * [Modification Time](#Modification_Time)
        * [Offset](#Offset)
        * [Old Snapshot Name](#Old_Snapshot_Name)
        * [Op](#Op)
        * [Overwrite](#Overwrite)
        * [Owner](#Owner)
        * [Permission](#Permission)
        * [Recursive](#Recursive)
        * [Renewer](#Renewer)
        * [Replication](#Replication)
        * [Snapshot Name](#Snapshot_Name)
        * [Sources](#Sources)
        * [Token](#Token)
        * [Token Kind](#Token_Kind)
        * [Token Service](#Token_Service)
        * [Username](#Username)

Document Conventions
--------------------

| `Monospaced` | Used for commands, HTTP request and responses and code blocks. |
|:---- |:---- |
| `<Monospaced>` | User entered values. |
| `[Monospaced]` | Optional values. When the value is not specified, the default value is used. |
| *Italics* | Important phrases and words. |

Introduction
------------

The HTTP REST API supports the complete [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html)/[FileContext](../../api/org/apache/hadoop/fs/FileContext.html) interface for HDFS. The operations and the corresponding FileSystem/FileContext methods are shown in the next section. The [HTTP Query Parameter Dictionary](#HTTP_Query_Parameter_Dictionary) section specifies the parameter details, such as the defaults and the valid values.
### Operations

* HTTP GET
    * [`OPEN`](#Open_and_Read_a_File) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).open)
    * [`GETFILESTATUS`](#Status_of_a_FileDirectory) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getFileStatus)
    * [`LISTSTATUS`](#List_a_Directory) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).listStatus)
    * [`GETCONTENTSUMMARY`](#Get_Content_Summary_of_a_Directory) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getContentSummary)
    * [`GETFILECHECKSUM`](#Get_File_Checksum) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getFileChecksum)
    * [`GETHOMEDIRECTORY`](#Get_Home_Directory) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getHomeDirectory)
    * [`GETDELEGATIONTOKEN`](#Get_Delegation_Token) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getDelegationToken)
    * [`GETDELEGATIONTOKENS`](#Get_Delegation_Tokens) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getDelegationTokens)
    * [`GETXATTRS`](#Get_an_XAttr) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getXAttr)
    * [`GETXATTRS`](#Get_multiple_XAttrs) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getXAttrs)
    * [`GETXATTRS`](#Get_all_XAttrs) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getXAttrs)
    * [`LISTXATTRS`](#List_all_XAttrs) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).listXAttrs)
    * [`CHECKACCESS`](#Check_access) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).access)
* HTTP PUT
    * [`CREATE`](#Create_and_Write_to_a_File) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).create)
    * [`MKDIRS`](#Make_a_Directory) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).mkdirs)
    * [`CREATESYMLINK`](#Create_a_Symbolic_Link) (see [FileContext](../../api/org/apache/hadoop/fs/FileContext.html).createSymlink)
    * [`RENAME`](#Rename_a_FileDirectory) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).rename)
    * [`SETREPLICATION`](#Set_Replication_Factor) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).setReplication)
    * [`SETOWNER`](#Set_Owner) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).setOwner)
    * [`SETPERMISSION`](#Set_Permission) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).setPermission)
    * [`SETTIMES`](#Set_Access_or_Modification_Time) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).setTimes)
    * [`RENEWDELEGATIONTOKEN`](#Renew_Delegation_Token) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).renewDelegationToken)
    * [`CANCELDELEGATIONTOKEN`](#Cancel_Delegation_Token) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).cancelDelegationToken)
    * [`CREATESNAPSHOT`](#Create_Snapshot) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).createSnapshot)
    * [`RENAMESNAPSHOT`](#Rename_Snapshot) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).renameSnapshot)
    * [`SETXATTR`](#Set_XAttr) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).setXAttr)
    * [`REMOVEXATTR`](#Remove_XAttr) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).removeXAttr)
* HTTP POST
    * [`APPEND`](#Append_to_a_File) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).append)
    * [`CONCAT`](#Concat_Files) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).concat)
    * [`TRUNCATE`](#Truncate_a_File) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).truncate)
* HTTP DELETE
    * [`DELETE`](#Delete_a_FileDirectory) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).delete)
    * [`DELETESNAPSHOT`](#Delete_Snapshot) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).deleteSnapshot)

### FileSystem URIs vs HTTP URLs

The FileSystem scheme of WebHDFS is "`webhdfs://`". A WebHDFS FileSystem URI has the following format.
      webhdfs://<HOST>:<HTTP_PORT>/<PATH>

The above WebHDFS URI corresponds to the below HDFS URI.

      hdfs://<HOST>:<RPC_PORT>/<PATH>

In the REST API, the prefix "`/webhdfs/v1`" is inserted in the path and a query is appended at the end. Therefore, the corresponding HTTP URL has the following format.

      http://<HOST>:<HTTP_PORT>/webhdfs/v1/<PATH>?op=...

### HDFS Configuration Options

Below are the HDFS configuration options for WebHDFS.

| Property Name | Description |
|:---- |:---- |
| `dfs.webhdfs.enabled` | Enable/disable WebHDFS in Namenodes and Datanodes |
| `dfs.web.authentication.kerberos.principal` | The HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint. The HTTP Kerberos principal MUST start with 'HTTP/' per the Kerberos HTTP SPNEGO specification. A value of "\*" will use all HTTP principals found in the keytab. |
| `dfs.web.authentication.kerberos.keytab` | The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint. |

Authentication
--------------

When security is *off*, the authenticated user is the username specified in the `user.name` query parameter. If the `user.name` parameter is not set, the server may either set the authenticated user to a default web user, if there is any, or return an error response.

When security is *on*, authentication is performed by either Hadoop delegation token or Kerberos SPNEGO. If a token is set in the `delegation` query parameter, the authenticated user is the user encoded in the token. If the `delegation` parameter is not set, the user is authenticated by Kerberos SPNEGO.

Below are examples using the `curl` command tool.

1. Authentication when security is off:

        curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?[user.name=<USER>&]op=..."

2. Authentication using Kerberos SPNEGO when security is on:

        curl -i --negotiate -u : "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=..."

3. Authentication using Hadoop delegation token when security is on:

        curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?delegation=<TOKEN>&op=..."
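As an illustration only, the URL mapping and query-parameter authentication described above can be sketched in Python. The helper function, host name, and port below are hypothetical examples chosen for this sketch; they are not part of the WebHDFS API itself.

```python
from urllib.parse import urlencode, quote

def webhdfs_url(host, port, path, op, **params):
    """Build a WebHDFS HTTP URL: insert the /webhdfs/v1 prefix before the
    absolute HDFS path and append op= plus any extra query parameters."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute")
    # Extra parameters first, then op=, mirroring the curl examples above.
    query = urlencode(dict(params, op=op))
    return "http://%s:%d/webhdfs/v1%s?%s" % (host, port, quote(path), query)

# Example: an authenticated request when security is off, using the
# user.name query parameter (host and port are made up for illustration).
url = webhdfs_url("namenode.example.com", 50070, "/user/alice", "LISTSTATUS",
                  **{"user.name": "alice"})
```

A request with a delegation token would pass `delegation=<TOKEN>` the same way; the server ignores `user.name` when a token is present.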
See also: [Authentication for Hadoop HTTP web-consoles](../hadoop-common/HttpAuthentication.html)

Proxy Users
-----------

When the proxy user feature is enabled, a proxy user *P* may submit a request on behalf of another user *U*. The username of *U* must be specified in the `doas` query parameter unless a delegation token is presented in authentication. In that case, the information of both users *P* and *U* must be encoded in the delegation token.

1. A proxy request when security is off:

        curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?[user.name=<USER>&]doas=<USER>&op=..."

2. A proxy request using Kerberos SPNEGO when security is on:

        curl -i --negotiate -u : "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?doas=<USER>&op=..."

3. A proxy request using Hadoop delegation token when security is on:

        curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?delegation=<TOKEN>&op=..."

File and Directory Operations
-----------------------------

### Create and Write to a File

* Step 1: Submit a HTTP PUT request without automatically following redirects and without sending the file data.

        curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATE
                            [&overwrite=<true|false>][&blocksize=<LONG>][&replication=<SHORT>]
                            [&permission=<OCTAL>][&buffersize=<INT>]"

    The request is redirected to a datanode where the file data is to be written:

        HTTP/1.1 307 TEMPORARY_REDIRECT
        Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATE...
        Content-Length: 0

* Step 2: Submit another HTTP PUT request using the URL in the `Location` header with the file data to be written.

        curl -i -X PUT -T <LOCAL_FILE> "http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=CREATE..."

    The client receives a `201 Created` response with zero content length and the WebHDFS URI of the file in the `Location` header:

        HTTP/1.1 201 Created
        Location: webhdfs://<HOST>:<PORT>/<PATH>
        Content-Length: 0

**Note** that the reason for the two-step create/append is to prevent clients from sending out data before the redirect. This issue is addressed by the "`Expect: 100-continue`" header in HTTP/1.1; see [RFC 2616, Section 8.2.3](http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8.2.3). Unfortunately, there are software library bugs (e.g.
Jetty 6 HTTP server and Java 6 HTTP client), which do not correctly implement "`Expect: 100-continue`". The two-step create/append is a temporary workaround for the software library bugs.

See also: [`overwrite`](#Overwrite), [`blocksize`](#Block_Size), [`replication`](#Replication), [`permission`](#Permission), [`buffersize`](#Buffer_Size), [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).create

### Append to a File

* Step 1: Submit a HTTP POST request without automatically following redirects and without sending the file data.

        curl -i -X POST "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=APPEND[&buffersize=<INT>]"

    The request is redirected to a datanode where the file data is to be appended:

        HTTP/1.1 307 TEMPORARY_REDIRECT
        Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=APPEND...
        Content-Length: 0

* Step 2: Submit another HTTP POST request using the URL in the `Location` header with the file data to be appended.

        curl -i -X POST -T <LOCAL_FILE> "http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=APPEND..."

    The client receives a response with zero content length:

        HTTP/1.1 200 OK
        Content-Length: 0

See the note in the previous section for the description of why this operation requires two steps.

See also: [`buffersize`](#Buffer_Size), [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).append

### Concat File(s)

* Submit a HTTP POST request.

        curl -i -X POST "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CONCAT&sources=<PATHS>"

    The client receives a response with zero content length:

        HTTP/1.1 200 OK
        Content-Length: 0

See also: [`sources`](#Sources), [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).concat

### Open and Read a File

* Submit a HTTP GET request with automatically following redirects.

        curl -i -L "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=OPEN
                            [&offset=<LONG>][&length=<LONG>][&buffersize=<INT>]"

    The request is redirected to a datanode where the file data can be read:

        HTTP/1.1 307 TEMPORARY_REDIRECT
        Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=OPEN...
        Content-Length: 0

    The client follows the redirect to the datanode and receives the file data:

        HTTP/1.1 200 OK
        Content-Type: application/octet-stream
        Content-Length: 22

        Hello, webhdfs user!

See also: [`offset`](#Offset), [`length`](#Length), [`buffersize`](#Buffer_Size), [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).open

### Make a Directory

* Submit a HTTP PUT request.

        curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=MKDIRS[&permission=<OCTAL>]"

    The client receives a response with a [`boolean` JSON object](#Boolean_JSON_Schema):

        HTTP/1.1 200 OK
        Content-Type: application/json
        Transfer-Encoding: chunked

        {"boolean": true}

See also: [`permission`](#Permission), [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).mkdirs

### Create a Symbolic Link

* Submit a HTTP PUT request.

        curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=CREATESYMLINK
                                      &destination=<PATH>[&createParent=<true|false>]"

    The client receives a response with zero content length:

        HTTP/1.1 200 OK
        Content-Length: 0

See also: [`destination`](#Destination), [`createParent`](#Create_Parent), [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).createSymlink

### Rename a File/Directory

* Submit a HTTP PUT request.

        curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=RENAME&destination=<PATH>"

    The client receives a response with a [`boolean` JSON object](#Boolean_JSON_Schema):

        HTTP/1.1 200 OK
        Content-Type: application/json
        Transfer-Encoding: chunked

        {"boolean": true}

See also: [`destination`](#Destination), [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).rename

### Delete a File/Directory

* Submit a HTTP DELETE request.

        curl -i -X DELETE "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=DELETE
                                      [&recursive=<true|false>]"

    The client receives a response with a [`boolean` JSON object](#Boolean_JSON_Schema):

        HTTP/1.1 200 OK
        Content-Type: application/json
        Transfer-Encoding: chunked

        {"boolean": true}

See also: [`recursive`](#Recursive), [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).delete

### Truncate a File

* Submit a HTTP POST request.
        curl -i -X POST "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=TRUNCATE&newlength=<LONG>"

    The client receives a response with a [`boolean` JSON object](#Boolean_JSON_Schema):

        HTTP/1.1 200 OK
        Content-Type: application/json
        Transfer-Encoding: chunked

        {"boolean": true}

See also: [`newlength`](#New_Length), [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).truncate

### Status of a File/Directory

* Submit a HTTP GET request.

        curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETFILESTATUS"

    The client receives a response with a [`FileStatus` JSON object](#FileStatus_JSON_Schema):

        HTTP/1.1 200 OK
        Content-Type: application/json
        Transfer-Encoding: chunked

        {
          "FileStatus":
          {
            "accessTime"      : 0,
            "blockSize"       : 0,
            "group"           : "supergroup",
            "length"          : 0,             //in bytes, zero for directories
            "modificationTime": 1320173277227,
            "owner"           : "webuser",
            "pathSuffix"      : "",
            "permission"      : "777",
            "replication"     : 0,
            "type"            : "DIRECTORY"    //enum {FILE, DIRECTORY, SYMLINK}
          }
        }

See also: [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getFileStatus

### List a Directory

* Submit a HTTP GET request.

        curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=LISTSTATUS"

    The client receives a response with a [`FileStatuses` JSON object](#FileStatuses_JSON_Schema):

        HTTP/1.1 200 OK
        Content-Type: application/json
        Content-Length: 427

        {
          "FileStatuses":
          {
            "FileStatus":
            [
              {
                "accessTime"      : 1320171722771,
                "blockSize"       : 33554432,
                "group"           : "supergroup",
                "length"          : 24930,
                "modificationTime": 1320171722771,
                "owner"           : "webuser",
                "pathSuffix"      : "a.patch",
                "permission"      : "644",
                "replication"     : 1,
                "type"            : "FILE"
              },
              {
                "accessTime"      : 0,
                "blockSize"       : 0,
                "group"           : "supergroup",
                "length"          : 0,
                "modificationTime": 1320895981256,
                "owner"           : "szetszwo",
                "pathSuffix"      : "bar",
                "permission"      : "711",
                "replication"     : 0,
                "type"            : "DIRECTORY"
              },
              ...
            ]
          }
        }

See also: [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).listStatus

Other File System Operations
----------------------------

### Get Content Summary of a Directory

* Submit a HTTP GET request.
        curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETCONTENTSUMMARY"

    The client receives a response with a [`ContentSummary` JSON object](#ContentSummary_JSON_Schema):

        HTTP/1.1 200 OK
        Content-Type: application/json
        Transfer-Encoding: chunked

        {
          "ContentSummary":
          {
            "directoryCount": 2,
            "fileCount"     : 1,
            "length"        : 24930,
            "quota"         : -1,
            "spaceConsumed" : 24930,
            "spaceQuota"    : -1
          }
        }

See also: [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getContentSummary

### Get File Checksum

* Submit a HTTP GET request.

        curl -i "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=GETFILECHECKSUM"

    The request is redirected to a datanode:

        HTTP/1.1 307 TEMPORARY_REDIRECT
        Location: http://<DATANODE>:<PORT>/webhdfs/v1/<PATH>?op=GETFILECHECKSUM...
        Content-Length: 0

    The client follows the redirect to the datanode and receives a [`FileChecksum` JSON object](#FileChecksum_JSON_Schema):

        HTTP/1.1 200 OK
        Content-Type: application/json
        Transfer-Encoding: chunked

        {
          "FileChecksum":
          {
            "algorithm": "MD5-of-1MD5-of-512CRC32",
            "bytes"    : "eadb10de24aa315748930df6e185c0d ...",
            "length"   : 28
          }
        }

See also: [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getFileChecksum

### Get Home Directory

* Submit a HTTP GET request.

        curl -i "http://<HOST>:<PORT>/webhdfs/v1/?op=GETHOMEDIRECTORY"

    The client receives a response with a [`Path` JSON object](#Path_JSON_Schema):

        HTTP/1.1 200 OK
        Content-Type: application/json
        Transfer-Encoding: chunked

        {"Path": "/user/szetszwo"}

See also: [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getHomeDirectory

### Set Permission

* Submit a HTTP PUT request.

        curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=SETPERMISSION
                                      [&permission=<OCTAL>]"

    The client receives a response with zero content length:

        HTTP/1.1 200 OK
        Content-Length: 0

See also: [`permission`](#Permission), [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).setPermission

### Set Owner

* Submit a HTTP PUT request.
        curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=SETOWNER
                                      [&owner=<USER>][&group=<GROUP>]"

    The client receives a response with zero content length:

        HTTP/1.1 200 OK
        Content-Length: 0

See also: [`owner`](#Owner), [`group`](#Group), [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).setOwner

### Set Replication Factor

* Submit a HTTP PUT request.

        curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=SETREPLICATION
                                      [&replication=<SHORT>]"

    The client receives a response with a [`boolean` JSON object](#Boolean_JSON_Schema):

        HTTP/1.1 200 OK
        Content-Type: application/json
        Transfer-Encoding: chunked

        {"boolean": true}

See also: [`replication`](#Replication), [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).setReplication

### Set Access or Modification Time

* Submit a HTTP PUT request.

        curl -i -X PUT "http://<HOST>:<PORT>/webhdfs/v1/<PATH>?op=SETTIMES
                                      [&modificationtime=