WebHDFS REST API
================

Document Conventions
--------------------

| `Monospaced` | Used for commands, HTTP requests and responses, and code blocks. |
|:---- |:---- |
| `` | User entered values. |
| `[Monospaced]` | Optional values. When the value is not specified, the default value is used. |
| *Italics* | Important phrases and words. |

Introduction
------------

The HTTP REST API supports the complete [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html)/[FileContext](../../api/org/apache/hadoop/fs/FileContext.html) interface for HDFS. The operations and the corresponding FileSystem/FileContext methods are shown in the next section. The section [HTTP Query Parameter Dictionary](#HTTP_Query_Parameter_Dictionary) specifies parameter details such as the defaults and the valid values.

### Operations

* HTTP GET
    * [`OPEN`](#Open_and_Read_a_File) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).open)
    * [`GETFILESTATUS`](#Status_of_a_FileDirectory) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getFileStatus)
    * [`LISTSTATUS`](#List_a_Directory) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).listStatus)
    * [`LISTSTATUS_BATCH`](#Iteratively_List_a_Directory) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).listStatusIterator)
    * [`GETCONTENTSUMMARY`](#Get_Content_Summary_of_a_Directory) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getContentSummary)
    * [`GETFILECHECKSUM`](#Get_File_Checksum) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getFileChecksum)
    * [`GETHOMEDIRECTORY`](#Get_Home_Directory) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getHomeDirectory)
    * [`GETDELEGATIONTOKEN`](#Get_Delegation_Token) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getDelegationToken)
    * [`GETTRASHROOT`](#Get_Trash_Root) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getTrashRoot)
    * [`GETXATTRS`](#Get_an_XAttr) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getXAttr)
    * [`GETXATTRS`](#Get_multiple_XAttrs) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getXAttrs)
    * [`GETXATTRS`](#Get_all_XAttrs) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getXAttrs)
    * [`LISTXATTRS`](#List_all_XAttrs) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).listXAttrs)
    * [`CHECKACCESS`](#Check_access) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).access)
    * [`GETALLSTORAGEPOLICY`](#Get_all_Storage_Policies) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getAllStoragePolicies)
    * [`GETSTORAGEPOLICY`](#Get_Storage_Policy) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getStoragePolicy)
    * [`GETFILEBLOCKLOCATIONS`](#Get_File_Block_Locations) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getFileBlockLocations)
* HTTP PUT
    * [`CREATE`](#Create_and_Write_to_a_File) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).create)
    * [`MKDIRS`](#Make_a_Directory) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).mkdirs)
    * [`CREATESYMLINK`](#Create_a_Symbolic_Link) (see [FileContext](../../api/org/apache/hadoop/fs/FileContext.html).createSymlink)
    * [`RENAME`](#Rename_a_FileDirectory) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).rename)
    * [`SETREPLICATION`](#Set_Replication_Factor) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).setReplication)
    * [`SETOWNER`](#Set_Owner) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).setOwner)
    * [`SETPERMISSION`](#Set_Permission) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).setPermission)
    * [`SETTIMES`](#Set_Access_or_Modification_Time) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).setTimes)
    * [`RENEWDELEGATIONTOKEN`](#Renew_Delegation_Token) (see [DelegationTokenAuthenticator](../../api/org/apache/hadoop/security/token/delegation/web/DelegationTokenAuthenticator.html).renewDelegationToken)
    * [`CANCELDELEGATIONTOKEN`](#Cancel_Delegation_Token) (see [DelegationTokenAuthenticator](../../api/org/apache/hadoop/security/token/delegation/web/DelegationTokenAuthenticator.html).cancelDelegationToken)
    * [`CREATESNAPSHOT`](#Create_Snapshot) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).createSnapshot)
    * [`RENAMESNAPSHOT`](#Rename_Snapshot) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).renameSnapshot)
    * [`SETXATTR`](#Set_XAttr) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).setXAttr)
    * [`REMOVEXATTR`](#Remove_XAttr) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).removeXAttr)
    * [`SETSTORAGEPOLICY`](#Set_Storage_Policy) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).setStoragePolicy)
* HTTP POST
    * [`APPEND`](#Append_to_a_File) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).append)
    * [`CONCAT`](#Concat_Files) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).concat)
    * [`TRUNCATE`](#Truncate_a_File) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).truncate)
    * [`UNSETSTORAGEPOLICY`](#Unset_Storage_Policy) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).unsetStoragePolicy)
* HTTP DELETE
    * [`DELETE`](#Delete_a_FileDirectory) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).delete)
    * [`DELETESNAPSHOT`](#Delete_Snapshot) (see [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).deleteSnapshot)

### FileSystem URIs vs HTTP URLs

The FileSystem scheme of WebHDFS is "`webhdfs://`". A WebHDFS FileSystem URI has the following format.

    webhdfs://:/

The above WebHDFS URI corresponds to the below HDFS URI.

    hdfs://:/

In the REST API, the prefix "`/webhdfs/v1`" is inserted in the path and a query is appended at the end.
Therefore, the corresponding HTTP URL has the following format.

    http://:/webhdfs/v1/?op=...

**Note** that if WebHDFS is secured with SSL, then the scheme should be "`swebhdfs://`".

    swebhdfs://:/

### HDFS Configuration Options

Below are the HDFS configuration options for WebHDFS.

| Property Name | Description |
|:---- |:---- |
| `dfs.web.authentication.kerberos.principal` | The HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint. The HTTP Kerberos principal MUST start with 'HTTP/' per the Kerberos HTTP SPNEGO specification. A value of "\*" will use all HTTP principals found in the keytab. |
| `dfs.web.authentication.kerberos.keytab` | The Kerberos keytab file with the credentials for the HTTP Kerberos principal used by Hadoop-Auth in the HTTP endpoint. |
| `dfs.webhdfs.socket.connect-timeout` | How long to wait for a connection to be established before failing. Specified as a time duration, i.e. a numerical value followed by a unit symbol, e.g. `2m` for two minutes. Defaults to 60s. |
| `dfs.webhdfs.socket.read-timeout` | How long to wait for data to arrive before failing. Defaults to 60s. |

Authentication
--------------

When security is *off*, the authenticated user is the username specified in the `user.name` query parameter. If the `user.name` parameter is not set, the server may either set the authenticated user to a default web user, if there is any, or return an error response.

When security is *on*, authentication is performed by either Hadoop delegation token or Kerberos SPNEGO. If a token is set in the `delegation` query parameter, the authenticated user is the user encoded in the token. If the `delegation` parameter is not set, the user is authenticated by Kerberos SPNEGO.

Below are examples using the `curl` command tool.

1. Authentication when security is off:

        curl -i "http://:/webhdfs/v1/?[user.name=&]op=..."

2. Authentication using Kerberos SPNEGO when security is on:

        curl -i --negotiate -u : "http://:/webhdfs/v1/?op=..."

3. Authentication using Hadoop delegation token when security is on:

        curl -i "http://:/webhdfs/v1/?delegation=&op=..."

See also: [Authentication for Hadoop HTTP web-consoles](../hadoop-common/HttpAuthentication.html)

Additionally, WebHDFS supports OAuth2 on the client side. The NameNode and DataNodes do not currently support clients using OAuth2, but other backends that implement the WebHDFS REST interface may.

By default, WebHDFS supports two types of OAuth2 code grants (a user-provided refresh and access token, or a user-provided credential) and provides a pluggable mechanism for implementing other OAuth2 authentications per the [OAuth2 RFC](https://tools.ietf.org/html/rfc6749), or custom authentications. When using either of the provided code grant mechanisms, the WebHDFS client will refresh the access token as necessary.

OAuth2 should only be enabled for clients not running with Kerberos SPNEGO.

| OAuth2 code grant mechanism | Description | Value of `dfs.webhdfs.oauth2.access.token.provider` that implements code grant |
|:---- |:---- |:----|
| Authorization Code Grant | The user provides an initial access token and refresh token, which are then used to authenticate WebHDFS requests and obtain replacement access tokens, respectively. | org.apache.hadoop.hdfs.web.oauth2.ConfRefreshTokenBasedAccessTokenProvider |
| Client Credentials Grant | The user provides a credential which is used to obtain access tokens, which are then used to authenticate WebHDFS requests. | org.apache.hadoop.hdfs.web.oauth2.ConfCredentialBasedAccessTokenProvider |

The following properties control OAuth2 authentication.

| OAuth2 related property | Description |
|:---- |:---- |
| `dfs.webhdfs.oauth2.enabled` | Boolean to enable/disable OAuth2 authentication |
| `dfs.webhdfs.oauth2.access.token.provider` | Class name of an implementation of `org.apache.hadoop.hdfs.web.oauth2.AccessTokenProvider`. Two are provided with the code, as described above, or the user may specify a user-provided implementation. The default value for this configuration key is the `ConfCredentialBasedAccessTokenProvider` implementation. |
| `dfs.webhdfs.oauth2.client.id` | Client id used to obtain access token with either credential or refresh token |
| `dfs.webhdfs.oauth2.refresh.url` | URL against which to post for obtaining bearer token with either credential or refresh token |
| `dfs.webhdfs.oauth2.access.token` | (required if using ConfRefreshTokenBasedAccessTokenProvider) Initial access token with which to authenticate |
| `dfs.webhdfs.oauth2.refresh.token` | (required if using ConfRefreshTokenBasedAccessTokenProvider) Initial refresh token to use to obtain new access tokens |
| `dfs.webhdfs.oauth2.refresh.token.expires.ms.since.epoch` | (required if using ConfRefreshTokenBasedAccessTokenProvider) Access token expiration measured in milliseconds since Jan 1, 1970. *Note this is a different value than provided by OAuth providers and has been munged as described in interface to be suitable for a client application* |
| `dfs.webhdfs.oauth2.credential` | (required if using ConfCredentialBasedAccessTokenProvider) Credential used to obtain initial and subsequent access tokens. |

Proxy Users
-----------

When the proxy user feature is enabled, a proxy user *P* may submit a request on behalf of another user *U*. The username of *U* must be specified in the `doas` query parameter unless a delegation token is presented in authentication. In that case, the information of both users *P* and *U* must be encoded in the delegation token.

1. A proxy request when security is off:

        curl -i "http://:/webhdfs/v1/?[user.name=&]doas=&op=..."

2. A proxy request using Kerberos SPNEGO when security is on:

        curl -i --negotiate -u : "http://:/webhdfs/v1/?doas=&op=..."

3. A proxy request using Hadoop delegation token when security is on:

        curl -i "http://:/webhdfs/v1/?delegation=&op=..."
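The URI-to-URL mapping and the authentication and proxy query parameters described above can be sketched in Python. This is a minimal, illustrative helper, not part of Hadoop: the function name `webhdfs_http_url` is hypothetical, and it assumes the FileSystem URI carries an explicit host and port. It only builds the request URL. Sending the request, and performing Kerberos SPNEGO (which the Python standard library does not provide), is left to the caller.

```python
from urllib.parse import quote, urlencode, urlsplit


def webhdfs_http_url(fs_uri, op, **params):
    """Hypothetical helper: map a webhdfs:// or swebhdfs:// URI to a REST URL.

    Keyword arguments become query parameters (e.g. doas, delegation);
    parameters whose value is None are skipped. 'user.name' is not a valid
    Python identifier, so pass it via a dict: **{"user.name": "alice"}.
    """
    parts = urlsplit(fs_uri)
    # webhdfs:// is served over HTTP, swebhdfs:// over HTTPS (SSL).
    schemes = {"webhdfs": "http", "swebhdfs": "https"}
    if parts.scheme not in schemes:
        raise ValueError("expected a webhdfs:// or swebhdfs:// URI")
    # Insert the /webhdfs/v1 prefix in front of the (percent-encoded) path.
    path = "/webhdfs/v1" + quote(parts.path or "/")
    # op goes first, followed by the optional parameters that were supplied.
    query = [("op", op)] + [(k, v) for k, v in params.items() if v is not None]
    return f"{schemes[parts.scheme]}://{parts.netloc}{path}?{urlencode(query)}"
```

For example, `webhdfs_http_url("webhdfs://nn.example.com:9870/user/alice/a.txt", "OPEN", **{"user.name": "alice"})` yields `http://nn.example.com:9870/webhdfs/v1/user/alice/a.txt?op=OPEN&user.name=alice`; the host name and port here are placeholders. A proxy request adds `doas`, and a token-authenticated request passes `delegation` instead of `user.name`.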
Cross-Site Request Forgery Prevention
-------------------------------------

WebHDFS supports an optional, configurable mechanism for cross-site request forgery (CSRF) prevention. When enabled, WebHDFS HTTP requests to the NameNode or DataNode must include a custom HTTP header. Configuration properties allow adjusting which specific HTTP methods are protected and the name of the HTTP header. The value sent in the header is not relevant; only the presence of a header by that name is required.

Enabling CSRF prevention also sets up the `WebHdfsFileSystem` class to send the required header. This ensures that CLI commands like [`hdfs dfs`](./HDFSCommands.html#dfs) and [`hadoop distcp`](../../hadoop-distcp/DistCp.html) continue to work correctly when used with `webhdfs:` URIs.

Enabling CSRF prevention also sets up the NameNode web UI to send the required header. After enabling CSRF prevention and restarting the NameNode, existing users of the NameNode web UI need to refresh the browser to reload the page and find the new configuration.

The following properties control CSRF prevention.

| Property | Description | Default Value |
|:---- |:---- |:---- |
| `dfs.webhdfs.rest-csrf.enabled` | If true, then enables WebHDFS protection against cross-site request forgery (CSRF). The WebHDFS client also uses this property to determine whether or not it needs to send the custom CSRF prevention header in its HTTP requests. | `false` |
| `dfs.webhdfs.rest-csrf.custom-header` | The name of a custom header that HTTP requests must send when protection against cross-site request forgery (CSRF) is enabled for WebHDFS by setting dfs.webhdfs.rest-csrf.enabled to true. The WebHDFS client also uses this property to determine whether or not it needs to send the custom CSRF prevention header in its HTTP requests. | `X-XSRF-HEADER` |
| `dfs.webhdfs.rest-csrf.methods-to-ignore` | A comma-separated list of HTTP methods that do not require HTTP requests to include a custom header when protection against cross-site request forgery (CSRF) is enabled for WebHDFS by setting dfs.webhdfs.rest-csrf.enabled to true. The WebHDFS client also uses this property to determine whether or not it needs to send the custom CSRF prevention header in its HTTP requests. | `GET,OPTIONS,HEAD,TRACE` |
| `dfs.webhdfs.rest-csrf.browser-useragents-regex` | A comma-separated list of regular expressions used to match against an HTTP request's User-Agent header when protection against cross-site request forgery (CSRF) is enabled for WebHDFS by setting dfs.webhdfs.rest-csrf.enabled to true. If the incoming User-Agent matches any of these regular expressions, then the request is considered to be sent by a browser, and therefore CSRF prevention is enforced. If the request's User-Agent does not match any of these regular expressions, then the request is considered to be sent by something other than a browser, such as scripted automation. In this case, CSRF is not a potential attack vector, so the prevention is not enforced. This helps achieve backwards-compatibility with existing automation that has not been updated to send the CSRF prevention header. | `^Mozilla.*,^Opera.*` |

The following is an example `curl` call that uses the `-H` option to include the custom header in the request.

    curl -i -L -X PUT -H 'X-XSRF-HEADER: ""' 'http://:/webhdfs/v1/?op=CREATE'

WebHDFS Retry Policy
-------------------------------------

WebHDFS supports an optional, configurable retry policy for resilient copying of large files that could time out, or for copying files between HA clusters that could fail over during the copy.

The following properties control WebHDFS retry and failover policy.
| Property | Description | Default Value |
|:---- |:---- |:---- |
| `dfs.http.client.retry.policy.enabled` | If "true", enable the retry policy of WebHDFS client. If "false", retry policy is turned off. | `false` |
| `dfs.http.client.retry.policy.spec` | Specify a policy of multiple linear random retry for WebHDFS client, e.g. given pairs of number of retries and sleep time (n0, t0), (n1, t1), ..., the first n0 retries sleep t0 milliseconds on average, the following n1 retries sleep t1 milliseconds on average, and so on. Each pair is written with the sleep time first (`t0,n0,t1,n1,...`), so the default means 6 retries sleeping about 10 seconds each, followed by 10 retries sleeping about 60 seconds each. | `10000,6,60000,10` |
| `dfs.http.client.failover.max.attempts` | Specify the max number of failover attempts for WebHDFS client in case of network exception. | `15` |
| `dfs.http.client.retry.max.attempts` | Specify the max number of retry attempts for WebHDFS client. If the difference between retried attempts and failover attempts is larger than the max number of retry attempts, there will be no more retries. | `10` |
| `dfs.http.client.failover.sleep.base.millis` | Specify the base amount of time in milliseconds upon which the exponentially increased sleep time between retries or failovers is calculated for WebHDFS client. | `500` |
| `dfs.http.client.failover.sleep.max.millis` | Specify the upper bound of sleep time in milliseconds between retries or failovers for WebHDFS client. | `15000` |

File and Directory Operations
-----------------------------

### Create and Write to a File

* Step 1: Submit an HTTP PUT request without automatically following redirects and without sending the file data.

        curl -i -X PUT "http://:/webhdfs/v1/?op=CREATE
                            [&overwrite=][&blocksize=][&replication=]
                            [&permission=][&buffersize=][&noredirect=]"

    Usually the request is redirected to a datanode where the file data is to be written.

        HTTP/1.1 307 TEMPORARY_REDIRECT
        Location: http://:/webhdfs/v1/?op=CREATE...
        Content-Length: 0

    However, if you do not want to be automatically redirected, you can set the noredirect flag.

        HTTP/1.1 200 OK
        Content-Type: application/json
        {"Location":"http://:/webhdfs/v1/?op=CREATE..."}

* Step 2: Submit another HTTP PUT request using the URL in the `Location` header (or the returned response in case you specified noredirect) with the file data to be written.

        curl -i -X PUT -T "http://:/webhdfs/v1/?op=CREATE..."

    The client receives a `201 Created` response with zero content length and the WebHDFS URI of the file in the `Location` header:

        HTTP/1.1 201 Created
        Location: webhdfs://:/
        Content-Length: 0

If no permissions are specified, the newly created file will be assigned the default permission 644. No umask mode will be applied on the server side (so the "fs.permissions.umask-mode" configuration value set on the NameNode side will have no effect).

**Note** that the two-step create/append exists to prevent clients from sending out data before the redirect. This issue is addressed by the "`Expect: 100-continue`" header in HTTP/1.1; see [RFC 2616, Section 8.2.3](http://www.w3.org/Protocols/rfc2616/rfc2616-sec8.html#sec8.2.3). Unfortunately, there are software library bugs (e.g. Jetty 6 HTTP server and Java 6 HTTP client), which do not correctly implement "`Expect: 100-continue`". The two-step create/append is a temporary workaround for these software library bugs.

See also: [`overwrite`](#Overwrite), [`blocksize`](#Block_Size), [`replication`](#Replication), [`permission`](#Permission), [`buffersize`](#Buffer_Size), [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).create

### Append to a File

* Step 1: Submit an HTTP POST request without automatically following redirects and without sending the file data.

        curl -i -X POST "http://:/webhdfs/v1/?op=APPEND[&buffersize=][&noredirect=]"

    Usually the request is redirected to a datanode where the file data is to be appended:

        HTTP/1.1 307 TEMPORARY_REDIRECT
        Location: http://:/webhdfs/v1/?op=APPEND...
        Content-Length: 0

    However, if you do not want to be automatically redirected, you can set the noredirect flag.

        HTTP/1.1 200 OK
        Content-Type: application/json
        {"Location":"http://:/webhdfs/v1/?op=APPEND..."}

* Step 2: Submit another HTTP POST request using the URL in the `Location` header (or the returned response in case you specified noredirect) with the file data to be appended.

        curl -i -X POST -T "http://:/webhdfs/v1/?op=APPEND..."

    The client receives a response with zero content length:

        HTTP/1.1 200 OK
        Content-Length: 0

See the note in the previous section for why this operation requires two steps.

See also: [`buffersize`](#Buffer_Size), [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).append

### Concat File(s)

* Submit an HTTP POST request.

        curl -i -X POST "http://:/webhdfs/v1/?op=CONCAT&sources="

    The client receives a response with zero content length:

        HTTP/1.1 200 OK
        Content-Length: 0

See also: [`sources`](#Sources), [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).concat

### Open and Read a File

* Submit an HTTP GET request that automatically follows redirects.

        curl -i -L "http://:/webhdfs/v1/?op=OPEN
                            [&offset=][&length=][&buffersize=][&noredirect=]"

    Usually the request is redirected to a datanode where the file data can be read:

        HTTP/1.1 307 TEMPORARY_REDIRECT
        Location: http://:/webhdfs/v1/?op=OPEN...
        Content-Length: 0

    However, if you do not want to be automatically redirected, you can set the noredirect flag.

        HTTP/1.1 200 OK
        Content-Type: application/json
        {"Location":"http://:/webhdfs/v1/?op=OPEN..."}

    The client follows the redirect to the datanode and receives the file data:

        HTTP/1.1 200 OK
        Content-Type: application/octet-stream
        Content-Length: 22

        Hello, webhdfs user!

See also: [`offset`](#Offset), [`length`](#Length), [`buffersize`](#Buffer_Size), [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).open

### Make a Directory

* Submit an HTTP PUT request.

        curl -i -X PUT "http://:/webhdfs/v1/?op=MKDIRS[&permission=]"

    The client receives a response with a [`boolean` JSON object](#Boolean_JSON_Schema):

        HTTP/1.1 200 OK
        Content-Type: application/json
        Transfer-Encoding: chunked

        {"boolean": true}

If no permissions are specified, the newly created directory will have the default permission 755. No umask mode will be applied on the server side (so the "fs.permissions.umask-mode" configuration value set on the NameNode side will have no effect).

See also: [`permission`](#Permission), [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).mkdirs

### Create a Symbolic Link

* Submit an HTTP PUT request.

        curl -i -X PUT "http://:/webhdfs/v1/?op=CREATESYMLINK
                                      &destination=[&createParent=]"

    The client receives a response with zero content length:

        HTTP/1.1 200 OK
        Content-Length: 0

See also: [`destination`](#Destination), [`createParent`](#Create_Parent), [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).createSymlink

### Rename a File/Directory

* Submit an HTTP PUT request.

        curl -i -X PUT "http://:/webhdfs/v1/?op=RENAME&destination="

    The client receives a response with a [`boolean` JSON object](#Boolean_JSON_Schema):

        HTTP/1.1 200 OK
        Content-Type: application/json
        Transfer-Encoding: chunked

        {"boolean": true}

See also: [`destination`](#Destination), [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).rename

### Delete a File/Directory

* Submit an HTTP DELETE request.

        curl -i -X DELETE "http://:/webhdfs/v1/?op=DELETE
                                      [&recursive=]"

    The client receives a response with a [`boolean` JSON object](#Boolean_JSON_Schema):

        HTTP/1.1 200 OK
        Content-Type: application/json
        Transfer-Encoding: chunked

        {"boolean": true}

See also: [`recursive`](#Recursive), [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).delete

### Truncate a File

* Submit an HTTP POST request.

        curl -i -X POST "http://:/webhdfs/v1/?op=TRUNCATE&newlength="

    The client receives a response with a [`boolean` JSON object](#Boolean_JSON_Schema):

        HTTP/1.1 200 OK
        Content-Type: application/json
        Transfer-Encoding: chunked

        {"boolean": true}

See also: [`newlength`](#New_Length), [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).truncate

### Status of a File/Directory

* Submit an HTTP GET request.

        curl -i "http://:/webhdfs/v1/?op=GETFILESTATUS"

    The client receives a response with a [`FileStatus` JSON object](#FileStatus_JSON_Schema):

        HTTP/1.1 200 OK
        Content-Type: application/json
        Transfer-Encoding: chunked

        {
          "FileStatus":
          {
            "accessTime"      : 0,
            "blockSize"       : 0,
            "group"           : "supergroup",
            "length"          : 0,             //in bytes, zero for directories
            "modificationTime": 1320173277227,
            "owner"           : "webuser",
            "pathSuffix"      : "",
            "permission"      : "777",
            "replication"     : 0,
            "type"            : "DIRECTORY"    //enum {FILE, DIRECTORY, SYMLINK}
          }
        }

See also: [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getFileStatus

### List a Directory

* Submit an HTTP GET request.

        curl -i "http://:/webhdfs/v1/?op=LISTSTATUS"

    The client receives a response with a [`FileStatuses` JSON object](#FileStatuses_JSON_Schema):

        HTTP/1.1 200 OK
        Content-Type: application/json
        Content-Length: 427

        {
          "FileStatuses":
          {
            "FileStatus":
            [
              {
                "accessTime"      : 1320171722771,
                "blockSize"       : 33554432,
                "group"           : "supergroup",
                "length"          : 24930,
                "modificationTime": 1320171722771,
                "owner"           : "webuser",
                "pathSuffix"      : "a.patch",
                "permission"      : "644",
                "replication"     : 1,
                "type"            : "FILE"
              },
              {
                "accessTime"      : 0,
                "blockSize"       : 0,
                "group"           : "supergroup",
                "length"          : 0,
                "modificationTime": 1320895981256,
                "owner"           : "username",
                "pathSuffix"      : "bar",
                "permission"      : "711",
                "replication"     : 0,
                "type"            : "DIRECTORY"
              },
              ...
            ]
          }
        }

See also: [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).listStatus

### Iteratively List a Directory

* Submit an HTTP GET request.

        curl -i "http://:/webhdfs/v1/?op=LISTSTATUS_BATCH&startAfter="

    The client receives a response with a [`DirectoryListing` JSON object](#DirectoryListing_JSON_Schema), which contains a [`FileStatuses` JSON object](#FileStatuses_JSON_Schema), as well as iteration information:

        HTTP/1.1 200 OK
        Cache-Control: no-cache
        Expires: Thu, 08 Sep 2016 03:40:38 GMT
        Date: Thu, 08 Sep 2016 03:40:38 GMT
        Pragma: no-cache
        Expires: Thu, 08 Sep 2016 03:40:38 GMT
        Date: Thu, 08 Sep 2016 03:40:38 GMT
        Pragma: no-cache
        Content-Type: application/json
        X-FRAME-OPTIONS: SAMEORIGIN
        Transfer-Encoding: chunked
        Server: Jetty(6.1.26)

        {
            "DirectoryListing": {
                "partialListing": {
                    "FileStatuses": {
                        "FileStatus": [
                            {
                                "accessTime": 0,
                                "blockSize": 0,
                                "childrenNum": 0,
                                "fileId": 16387,
                                "group": "supergroup",
                                "length": 0,
                                "modificationTime": 1473305882563,
                                "owner": "andrew",
                                "pathSuffix": "bardir",
                                "permission": "755",
                                "replication": 0,
                                "storagePolicy": 0,
                                "type": "DIRECTORY"
                            },
                            {
                                "accessTime": 1473305896945,
                                "blockSize": 1024,
                                "childrenNum": 0,
                                "fileId": 16388,
                                "group": "supergroup",
                                "length": 0,
                                "modificationTime": 1473305896965,
                                "owner": "andrew",
                                "pathSuffix": "bazfile",
                                "permission": "644",
                                "replication": 3,
                                "storagePolicy": 0,
                                "type": "FILE"
                            }
                        ]
                    }
                },
                "remainingEntries": 2
            }
        }

If `remainingEntries` is non-zero, there are additional entries in the directory. To query the next batch, set the `startAfter` parameter to the `pathSuffix` of the last item returned in the current batch.
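The pagination rule above can be sketched as a Python generator. This is an illustrative sketch, not Hadoop client code: `iter_directory` and the `fetch` callable it takes are hypothetical. `fetch(start_after)` is assumed to issue the `LISTSTATUS_BATCH` request (adding `startAfter` only when the argument is not `None`) and return the decoded JSON body. Keeping the HTTP call behind `fetch` lets the paging logic be exercised without a cluster.

```python
from typing import Callable, Iterator, Optional


def iter_directory(fetch: Callable[[Optional[str]], dict]) -> Iterator[dict]:
    """Yield every FileStatus entry by following LISTSTATUS_BATCH pages.

    fetch(None) is assumed to request the first batch; fetch(name) the
    batch that starts after the entry whose pathSuffix is `name`.
    """
    start_after: Optional[str] = None
    while True:
        listing = fetch(start_after)["DirectoryListing"]
        statuses = listing["partialListing"]["FileStatuses"]["FileStatus"]
        yield from statuses
        # Stop when the server reports nothing left, or defensively when a
        # batch is empty (otherwise startAfter could not advance).
        if listing["remainingEntries"] == 0 or not statuses:
            return
        # Resume after the last entry of the current batch.
        start_after = statuses[-1]["pathSuffix"]
```

Against the two example batches in this section, the generator would call `fetch(None)` and then `fetch("bazfile")`, yielding `bardir`, `bazfile`, `foodir` and `quxfile`. The empty-batch guard is a defensive assumption, not documented server behavior.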
For example, to request the batch that follows `bazfile`:

    curl -i "http://:/webhdfs/v1/?op=LISTSTATUS_BATCH&startAfter=bazfile"

This returns the next batch of directory entries:

    HTTP/1.1 200 OK
    Cache-Control: no-cache
    Expires: Thu, 08 Sep 2016 03:43:20 GMT
    Date: Thu, 08 Sep 2016 03:43:20 GMT
    Pragma: no-cache
    Expires: Thu, 08 Sep 2016 03:43:20 GMT
    Date: Thu, 08 Sep 2016 03:43:20 GMT
    Pragma: no-cache
    Content-Type: application/json
    X-FRAME-OPTIONS: SAMEORIGIN
    Transfer-Encoding: chunked
    Server: Jetty(6.1.26)

    {
        "DirectoryListing": {
            "partialListing": {
                "FileStatuses": {
                    "FileStatus": [
                        {
                            "accessTime": 0,
                            "blockSize": 0,
                            "childrenNum": 0,
                            "fileId": 16386,
                            "group": "supergroup",
                            "length": 0,
                            "modificationTime": 1473305878951,
                            "owner": "andrew",
                            "pathSuffix": "foodir",
                            "permission": "755",
                            "replication": 0,
                            "storagePolicy": 0,
                            "type": "DIRECTORY"
                        },
                        {
                            "accessTime": 1473305902864,
                            "blockSize": 1024,
                            "childrenNum": 0,
                            "fileId": 16389,
                            "group": "supergroup",
                            "length": 0,
                            "modificationTime": 1473305902878,
                            "owner": "andrew",
                            "pathSuffix": "quxfile",
                            "permission": "644",
                            "replication": 3,
                            "storagePolicy": 0,
                            "type": "FILE"
                        }
                    ]
                }
            },
            "remainingEntries": 0
        }
    }

Batch size is controlled by the `dfs.ls.limit` option on the NameNode.

See also: [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).listStatusIterator

Other File System Operations
----------------------------

### Get Content Summary of a Directory

* Submit an HTTP GET request.

        curl -i "http://:/webhdfs/v1/?op=GETCONTENTSUMMARY"

    The client receives a response with a [`ContentSummary` JSON object](#ContentSummary_JSON_Schema):

        HTTP/1.1 200 OK
        Content-Type: application/json
        Transfer-Encoding: chunked

        {
          "ContentSummary":
          {
            "directoryCount": 2,
            "fileCount"     : 1,
            "length"        : 24930,
            "quota"         : -1,
            "spaceConsumed" : 24930,
            "spaceQuota"    : -1,
            "typeQuota":
            {
              "ARCHIVE":
              {
                "consumed": 500,
                "quota": 10000
              },
              "DISK":
              {
                "consumed": 500,
                "quota": 10000
              },
              "SSD":
              {
                "consumed": 500,
                "quota": 10000
              }
            }
          }
        }

See also: [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getContentSummary

### Get File Checksum

* Submit an HTTP GET request.

        curl -i "http://:/webhdfs/v1/?op=GETFILECHECKSUM"

    Usually the request is redirected to a datanode:

        HTTP/1.1 307 TEMPORARY_REDIRECT
        Location: http://:/webhdfs/v1/?op=GETFILECHECKSUM...
        Content-Length: 0

    However, if you do not want to be automatically redirected, you can set the noredirect flag.

        HTTP/1.1 200 OK
        Content-Type: application/json
        {"Location":"http://:/webhdfs/v1/?op=GETFILECHECKSUM..."}

    The client follows the redirect to the datanode and receives a [`FileChecksum` JSON object](#FileChecksum_JSON_Schema):

        HTTP/1.1 200 OK
        Content-Type: application/json
        Transfer-Encoding: chunked

        {
          "FileChecksum":
          {
            "algorithm": "MD5-of-1MD5-of-512CRC32",
            "bytes"    : "eadb10de24aa315748930df6e185c0d ...",
            "length"   : 28
          }
        }

See also: [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getFileChecksum

### Get Home Directory

* Submit an HTTP GET request.

        curl -i "http://:/webhdfs/v1/?op=GETHOMEDIRECTORY"

    The client receives a response with a [`Path` JSON object](#Path_JSON_Schema):

        HTTP/1.1 200 OK
        Content-Type: application/json
        Transfer-Encoding: chunked

        {"Path": "/user/username"}

See also: [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getHomeDirectory

### Get Trash Root

* Submit an HTTP GET request.

        curl -i "http://:/webhdfs/v1/?op=GETTRASHROOT"

    The client receives a response with a [`Path` JSON object](#Path_JSON_Schema):

        HTTP/1.1 200 OK
        Content-Type: application/json
        Transfer-Encoding: chunked

        {"Path": "/user/username/.Trash"}

    If the path is in an encryption zone and the user has permission on the path, the client receives a response like this:

        HTTP/1.1 200 OK
        Content-Type: application/json
        Transfer-Encoding: chunked

        {"Path": "/PATH/.Trash/username"}

See also: [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).getTrashRoot

For more details about the trash root in an encryption zone, please refer to the [Transparent Encryption Guide](./TransparentEncryption.html#Rename_and_Trash_considerations).

### Set Permission

* Submit an HTTP PUT request.

        curl -i -X PUT "http://:/webhdfs/v1/?op=SETPERMISSION
                                      [&permission=]"

    The client receives a response with zero content length:

        HTTP/1.1 200 OK
        Content-Length: 0

See also: [`permission`](#Permission), [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).setPermission

### Set Owner

* Submit an HTTP PUT request.

        curl -i -X PUT "http://:/webhdfs/v1/?op=SETOWNER
                                      [&owner=][&group=]"

    The client receives a response with zero content length:

        HTTP/1.1 200 OK
        Content-Length: 0

See also: [`owner`](#Owner), [`group`](#Group), [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).setOwner

### Set Replication Factor

* Submit an HTTP PUT request.

        curl -i -X PUT "http://:/webhdfs/v1/?op=SETREPLICATION
                                      [&replication=]"

    The client receives a response with a [`boolean` JSON object](#Boolean_JSON_Schema):

        HTTP/1.1 200 OK
        Content-Type: application/json
        Transfer-Encoding: chunked

        {"boolean": true}

See also: [`replication`](#Replication), [FileSystem](../../api/org/apache/hadoop/fs/FileSystem.html).setReplication

### Set Access or Modification Time

* Submit an HTTP PUT request.

        curl -i -X PUT "http://:/webhdfs/v1/?op=SETTIMES [&modificationtime=