From 7618fa9194b40454405f11a25bec4e2d79506912 Mon Sep 17 00:00:00 2001
From: Daniel Templeton
Date: Sat, 16 Sep 2017 09:20:33 +0200
Subject: [PATCH] HADOOP-13714. Tighten up our compatibility guidelines for Hadoop 3

---
 .../src/site/markdown/Compatibility.md       | 629 +++++++++++++++---
 .../site/markdown/InterfaceClassification.md | 217 +++---
 2 files changed, 662 insertions(+), 184 deletions(-)

diff --git a/hadoop-common-project/hadoop-common/src/site/markdown/Compatibility.md b/hadoop-common-project/hadoop-common/src/site/markdown/Compatibility.md
index 05b18b5929..4fa8c02799 100644
--- a/hadoop-common-project/hadoop-common/src/site/markdown/Compatibility.md
+++ b/hadoop-common-project/hadoop-common/src/site/markdown/Compatibility.md
@@ -20,109 +20,276 @@ Apache Hadoop Compatibility
 Purpose
 -------
 
-This document captures the compatibility goals of the Apache Hadoop project. The different types of compatibility between Hadoop releases that affects Hadoop developers, downstream projects, and end-users are enumerated. For each type of compatibility we:
+This document captures the compatibility goals of the Apache Hadoop project.
+The different types of compatibility between Hadoop releases that affect
+Hadoop developers, downstream projects, and end-users are enumerated. For each
+type of compatibility this document will:
 
 * describe the impact on downstream projects or end-users
 * where applicable, call out the policy adopted by the Hadoop developers when incompatible changes are permitted.
 
+All Hadoop interfaces are classified according to the intended audience and
+stability in order to maintain compatibility with previous releases. See the
+[Hadoop Interface Taxonomy](./InterfaceClassification.html) for details
+about the classifications.
+
+### Target Audience
+
+This document is intended for consumption by the Hadoop developer community.
+This document describes the lens through which changes to the Hadoop project
+should be viewed. In order for end users and third party developers to have
+confidence about cross-release compatibility, the developer community must
+ensure that development efforts adhere to these policies. It is the
+responsibility of the project committers to validate that all changes either
+maintain compatibility or are explicitly marked as incompatible.
+
+Within a component, Hadoop developers are free to use Private and Limited Private
+APIs, but when using components from a different module Hadoop developers
+should follow the same guidelines as third-party developers: do not
+use Private or Limited Private (unless explicitly allowed) interfaces and
+prefer instead Stable interfaces to Evolving or Unstable interfaces where
+possible. Where not possible, the preferred solution is to expand the audience
+of the API rather than introducing or perpetuating an exception to these
+compatibility guidelines. When working within a Maven module Hadoop developers
+should observe where possible the same level of restraint with regard to
+using components located in other Maven modules.
+
+Above all, Hadoop developers must be mindful of the impact of their changes.
+Stable interfaces must not change except in a major release. Evolving
+interfaces must not change except in a major or minor release. New classes and
+components must be labeled appropriately for audience and stability. See the
+[Hadoop Interface Taxonomy](./InterfaceClassification.html) for details about
+when the various labels are appropriate. As a general rule, all new interfaces
+and APIs should have the most limited labels (e.g. Private Unstable) that will
+not inhibit the intent of the interface or API.
+
+### Notational Conventions
+
+The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD",
+"SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" are to be interpreted as
+described in [RFC 2119](http://tools.ietf.org/html/rfc2119).
+
+Deprecation
+-----------
+
+The Java API provides a @Deprecated annotation to mark an API element as
+flagged for removal. The standard meaning of the annotation is that the
+API element should not be used and may be removed in a later version.
+
+In all cases, removing an element from an API is an incompatible
+change. In the case of [Stable](./InterfaceClassification.html#Stable) APIs,
+the change cannot be made between minor releases within the same major
+version. In addition, to allow consumers of the API time to adapt to the change,
+the API element to be removed should be marked as deprecated for a full major
+release before it is removed. For example, if a method is marked as deprecated
+in Hadoop 2.8, it cannot be removed until Hadoop 4.0.
+
+### Policy
+
+[Stable](./InterfaceClassification.html#Stable) API elements MUST NOT be removed
+until they have been marked as deprecated (through the @Deprecated annotation or
+other appropriate documentation) for a full major release. In the case that an
+API element was introduced as deprecated (to indicate that it is a temporary
+measure that is intended to be removed), the API element MAY be removed in the
+following major release. When modifying a
+[Stable](./InterfaceClassification.html#Stable) API, developers SHOULD prefer
+introducing a new method or endpoint and deprecating the existing one to making
+incompatible changes to the method or endpoint.
+
 Compatibility types
 -------------------
 
 ### Java API
 
-Hadoop interfaces and classes are annotated to describe the intended audience and stability in order to maintain compatibility with previous releases. See [Hadoop Interface Classification](./InterfaceClassification.html) for details.
+Developers SHOULD annotate all Hadoop interfaces and classes with the
+@InterfaceAudience and @InterfaceStability annotations to describe the
+intended audience and stability. Annotations may be at the package, class, or
+member variable or method level. Member variable and method annotations SHALL
+override class annotations, and class annotations SHALL override package
+annotations. A package, class, or member variable or method that is not
+annotated SHALL be interpreted as implicitly
+[Private](./InterfaceClassification.html#Private) and
+[Unstable](./InterfaceClassification.html#Unstable).
 
-* InterfaceAudience: captures the intended audience, possible values are Public (for end users and external projects), LimitedPrivate (for other Hadoop components, and closely related projects like YARN, MapReduce, HBase etc.), and Private (for intra component use).
-* InterfaceStability: describes what types of interface changes are permitted. Possible values are Stable, Evolving, Unstable, and Deprecated.
+* @InterfaceAudience captures the intended audience. Possible values are
+[Public](./InterfaceClassification.html#Public) (for end users and external
+projects), Limited [Private](./InterfaceClassification.html#Private) (for other
+Hadoop components, and closely related projects like YARN, MapReduce, HBase
+etc.), and [Private](./InterfaceClassification.html#Private)
+(for intra-component use).
+* @InterfaceStability describes what types of interface changes are permitted. Possible values are [Stable](./InterfaceClassification.html#Stable), [Evolving](./InterfaceClassification.html#Evolving), and [Unstable](./InterfaceClassification.html#Unstable).
+* @Deprecated notes that the package, class, or member variable or method could potentially be removed in the future and should not be used.
 
 #### Use Cases
 
-* Public-Stable API compatibility is required to ensure end-user programs and downstream projects continue to work without modification.
-* LimitedPrivate-Stable API compatibility is required to allow upgrade of individual components across minor releases.
-* Private-Stable API compatibility is required for rolling upgrades.
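The precedence rule in the Java API section above can be expressed as a small lookup. This is a hypothetical sketch rather than Hadoop code: the `Audience` and `Stability` enums stand in for the values of the real `@InterfaceAudience` and `@InterfaceStability` annotations, and it assumes that audience and stability are resolved independently, with the most specific level that declares a label taking precedence and Private/Unstable applying when no level declares one.

```java
import java.util.Optional;

// Hypothetical stand-ins for the values of @InterfaceAudience and @InterfaceStability.
enum Audience { PUBLIC, LIMITED_PRIVATE, PRIVATE }

enum Stability { STABLE, EVOLVING, UNSTABLE }

/** Labels declared at one level (package, class, or member); null means "not annotated". */
final class Labels {
    static final Labels NONE = new Labels(null, null);

    final Optional<Audience> audience;
    final Optional<Stability> stability;

    Labels(Audience audience, Stability stability) {
        this.audience = Optional.ofNullable(audience);
        this.stability = Optional.ofNullable(stability);
    }
}

final class EffectiveLabels {
    /** Member overrides class, class overrides package; unannotated means Private. */
    static Audience audience(Labels pkg, Labels cls, Labels member) {
        return member.audience
                .or(() -> cls.audience)
                .or(() -> pkg.audience)
                .orElse(Audience.PRIVATE);
    }

    /** Same precedence for stability; unannotated means Unstable. */
    static Stability stability(Labels pkg, Labels cls, Labels member) {
        return member.stability
                .or(() -> cls.stability)
                .or(() -> pkg.stability)
                .orElse(Stability.UNSTABLE);
    }

    public static void main(String[] args) {
        Labels pkg = new Labels(Audience.PUBLIC, Stability.STABLE);
        Labels cls = new Labels(null, Stability.EVOLVING);
        // Audience comes from the package, stability from the class.
        System.out.println(audience(pkg, cls, Labels.NONE) + " " + stability(pkg, cls, Labels.NONE));
        // Nothing annotated at any level.
        System.out.println(audience(Labels.NONE, Labels.NONE, Labels.NONE) + " "
                + stability(Labels.NONE, Labels.NONE, Labels.NONE));
    }
}
```

Running `main` prints `PUBLIC EVOLVING` and then `PRIVATE UNSTABLE`.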
+* [Public](./InterfaceClassification.html#Public)-[Stable](./InterfaceClassification.html#Stable) API compatibility is required to ensure end-user programs and downstream projects continue to work without modification.
+* [Public](./InterfaceClassification.html#Public)-[Evolving](./InterfaceClassification.html#Evolving) API compatibility is useful to make functionality available for consumption before it is fully baked.
+* Limited Private-[Stable](./InterfaceClassification.html#Stable) API compatibility is required to allow upgrade of individual components across minor releases.
+* [Private](./InterfaceClassification.html#Private)-[Stable](./InterfaceClassification.html#Stable) API compatibility is required for rolling upgrades.
+* [Private](./InterfaceClassification.html#Private)-[Unstable](./InterfaceClassification.html#Unstable) API compatibility allows internal components to evolve rapidly without concern for downstream consumers, and is how most interfaces should be labeled.
 
 #### Policy
 
-* Public-Stable APIs must be deprecated for at least one major release prior to their removal in a major release.
-* LimitedPrivate-Stable APIs can change across major releases, but not within a major release.
-* Private-Stable APIs can change across major releases, but not within a major release.
-* Classes not annotated are implicitly "Private". Class members not annotated inherit the annotations of the enclosing class.
-* Note: APIs generated from the proto files need to be compatible for rolling-upgrades. See the section on wire-compatibility for more details. The compatibility policies for APIs and wire-communication need to go hand-in-hand to address this.
+The compatibility policy SHALL be determined by the relevant package, class, or
+member variable or method annotations.
 
-### Semantic compatibility
+Note: APIs generated from the proto files MUST be compatible for rolling
+upgrades. See the section on wire protocol compatibility for more details. The
+compatibility policies for APIs and wire protocols must therefore go hand
+in hand.
 
-Apache Hadoop strives to ensure that the behavior of APIs remains consistent over versions, though changes for correctness may result in changes in behavior. Tests and javadocs specify the API's behavior. The community is in the process of specifying some APIs more rigorously, and enhancing test suites to verify compliance with the specification, effectively creating a formal specification for the subset of behaviors that can be easily tested.
+#### Semantic compatibility
+
+Apache Hadoop strives to ensure that the behavior of APIs remains consistent
+over versions, though changes for correctness may result in changes in
+behavior. API behavior SHALL be specified by the JavaDoc API documentation
+where present and complete. When JavaDoc API documentation is not available,
+behavior SHALL be specified by the behavior expected by the related unit tests.
+In cases with no JavaDoc API documentation or unit test coverage, the expected
+behavior is presumed to be obvious and SHOULD be assumed to be the minimum
+functionality implied by the interface naming. The community is in the process
+of specifying some APIs more rigorously and enhancing test suites to verify
+compliance with the specification, effectively creating a formal specification
+for the subset of behaviors that can be easily tested.
+
+The behavior of any API MAY be changed to fix incorrect behavior according to
+the stability of the API, with such a change to be accompanied by updating
+existing documentation and tests and/or adding new documentation or tests.
+
+#### Java Binary compatibility for end-user applications i.e. Apache Hadoop ABI
+
+Apache Hadoop revisions SHOULD retain binary compatibility such that end-user
+applications continue to work without any modifications. Minor Apache Hadoop
+revisions within the same major revision MUST retain compatibility such that
+existing MapReduce applications (e.g. end-user applications and projects such
+as Apache Pig, Apache Hive, et al), existing YARN applications (e.g.
+end-user applications and projects such as Apache Spark, Apache Tez et al),
+and applications that access HDFS directly (e.g. end-user applications and
+projects such as Apache HBase, Apache Flume, et al) work unmodified and without
+recompilation when used with any Apache Hadoop cluster within the same major
+release as the original build target.
+
+For MapReduce applications in particular, i.e. applications using the
+org.apache.hadoop.mapred and/or org.apache.hadoop.mapreduce APIs, the developer
+community SHALL support binary compatibility across major releases. The
+MapReduce APIs SHALL be supported compatibly across major releases. See
+[Compatibility for MapReduce applications between hadoop-1.x and hadoop-2.x](../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html) for more details.
+
+Some applications may be affected by changes to disk layouts or other internal
+changes. See the sections that follow for policies on how incompatible
+changes to non-API interfaces are handled.
+
+### Native Dependencies
+
+Hadoop includes several native components, including compression, the
+container executor binary, and various native integrations. These native
+components introduce a set of native dependencies for Hadoop, both at compile
+time and at runtime, such as cmake, gcc, zlib, etc. This set of native
+dependencies is part of the Hadoop ABI.
 
 #### Policy
 
-The behavior of API may be changed to fix incorrect behavior, such a change to be accompanied by updating existing buggy tests or adding tests in cases there were none prior to the change.
+The minimum required versions of the native components on which Hadoop depends
+at compile time and/or runtime SHALL be considered
+[Stable](./InterfaceClassification.html#Stable). The minimum required
+versions MUST NOT increase between minor releases within a major
+version.
 
-### Wire compatibility
+### Wire Protocols
 
-Wire compatibility concerns data being transmitted over the wire between Hadoop processes. Hadoop uses Protocol Buffers for most RPC communication. Preserving compatibility requires prohibiting modification as described below. Non-RPC communication should be considered as well, for example using HTTP to transfer an HDFS image as part of snapshotting or transferring MapTask output. The potential communications can be categorized as follows:
+Wire compatibility concerns data being transmitted "over the wire" between
+Hadoop processes. Hadoop uses
+[Protocol Buffers](https://developers.google.com/protocol-buffers/) for most
+RPC communication. Preserving compatibility requires prohibiting modification
+as described below. Non-RPC communication should be considered as well, for
+example using HTTP to transfer an HDFS image as part of snapshotting or
+transferring MapReduce map task output. The communications can be categorized as
+follows:
 
 * Client-Server: communication between Hadoop clients and servers (e.g., the HDFS client to NameNode protocol, or the YARN client to ResourceManager protocol).
-* Client-Server (Admin): It is worth distinguishing a subset of the Client-Server protocols used solely by administrative commands (e.g., the HAAdmin protocol) as these protocols only impact administrators who can tolerate changes that end users (which use general Client-Server protocols) can not.
+* Client-Server (Admin): It is worth distinguishing a subset of the Client-Server protocols used solely by administrative commands (e.g., the HAAdmin protocol) as these protocols only impact administrators who can tolerate changes that end users (which use general Client-Server protocols) cannot.
 * Server-Server: communication between servers (e.g., the protocol between the DataNode and NameNode, or NodeManager and ResourceManager)
 
-#### Use Cases
+#### Protocol Dependencies
 
-* Client-Server compatibility is required to allow users to continue using the old clients even after upgrading the server (cluster) to a later version (or vice versa). For example, a Hadoop 2.1.0 client talking to a Hadoop 2.3.0 cluster.
-* Client-Server compatibility is also required to allow users to upgrade the client before upgrading the server (cluster). For example, a Hadoop 2.4.0 client talking to a Hadoop 2.3.0 cluster. This allows deployment of client-side bug fixes ahead of full cluster upgrades. Note that new cluster features invoked by new client APIs or shell commands will not be usable. YARN applications that attempt to use new APIs (including new fields in data structures) that have not yet been deployed to the cluster can expect link exceptions.
-* Client-Server compatibility is also required to allow upgrading individual components without upgrading others. For example, upgrade HDFS from version 2.1.0 to 2.2.0 without upgrading MapReduce.
-* Server-Server compatibility is required to allow mixed versions within an active cluster so the cluster may be upgraded without downtime in a rolling fashion.
+The components of Apache Hadoop may have dependencies that include their own
+protocols, such as ZooKeeper, S3, Kerberos, etc. These protocol dependencies
+SHALL be treated as internal protocols and governed by the same policy.
+
+#### Transports
+
+In addition to compatibility of the protocols themselves, maintaining
+cross-version communications requires that the transports supported also be
+stable. The most likely source of transport changes stems from secure
+transports, such as SSL. Upgrading a service from SSLv2 to SSLv3 may break
+existing SSLv2 clients. The minimum supported major version of any transport
+MUST NOT increase across minor releases within a major version.
+
+Service ports are considered as part of the transport mechanism. Fixed
+service port numbers MUST be kept consistent to prevent breaking clients.
 
 #### Policy
 
-* Both Client-Server and Server-Server compatibility is preserved within a major release. (Different policies for different categories are yet to be considered.)
-* Compatibility can be broken only at a major release, though breaking compatibility even at major releases has grave consequences and should be discussed in the Hadoop community.
-* Hadoop protocols are defined in .proto (ProtocolBuffers) files. Client-Server protocols and Server-Server protocol .proto files are marked as stable. When a .proto file is marked as stable it means that changes should be made in a compatible fashion as described below:
-    * The following changes are compatible and are allowed at any time:
-        * Add an optional field, with the expectation that the code deals with the field missing due to communication with an older version of the code.
-        * Add a new rpc/method to the service
-        * Add a new optional request to a Message
-        * Rename a field
-        * Rename a .proto file
-        * Change .proto annotations that effect code generation (e.g. name of java package)
-    * The following changes are incompatible but can be considered only at a major release
-        * Change the rpc/method name
-        * Change the rpc/method parameter type or return type
-        * Remove an rpc/method
-        * Change the service name
-        * Change the name of a Message
-        * Modify a field type in an incompatible way (as defined recursively)
-        * Change an optional field to required
-        * Add or delete a required field
-        * Delete an optional field as long as the optional field has reasonable defaults to allow deletions
-    * The following changes are incompatible and hence never allowed
-        * Change a field id
-        * Reuse an old field that was previously deleted.
-        * Field numbers are cheap and changing and reusing is not a good idea.
+Hadoop wire protocols are defined in .proto (ProtocolBuffers) files.
+Client-Server and Server-Server protocols SHALL be classified according to the
+audience and stability classifications noted in their .proto files. In cases
+where no classifications are present, the protocols SHOULD be assumed to be
+[Private](./InterfaceClassification.html#Private) and
+[Stable](./InterfaceClassification.html#Stable).
 
-### Java Binary compatibility for end-user applications i.e. Apache Hadoop ABI
+The following changes to a .proto file SHALL be considered compatible:
 
-As Apache Hadoop revisions are upgraded end-users reasonably expect that their applications should continue to work without any modifications. This is fulfilled as a result of supporting API compatibility, Semantic compatibility and Wire compatibility.
+* Add an optional field, with the expectation that the code deals with the field missing due to communication with an older version of the code
+* Add a new rpc/method to the service
+* Add a new optional request to a Message
+* Rename a field
+* Rename a .proto file
+* Change .proto annotations that affect code generation (e.g. name of java package)
 
-However, Apache Hadoop is a very complex, distributed system and services a very wide variety of use-cases. In particular, Apache Hadoop MapReduce is a very, very wide API; in the sense that end-users may make wide-ranging assumptions such as layout of the local disk when their map/reduce tasks are executing, environment variables for their tasks etc. In such cases, it becomes very hard to fully specify, and support, absolute compatibility.
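The first compatible change in the list above only holds if receivers tolerate the field being absent. For proto2 optional fields, the Java code generated by Protocol Buffers exposes a `hasX()` method alongside `getX()`, and `getX()` returns the field's default when it is unset. The sketch below imitates that pattern with a hand-written class so it runs without generated code; `HeartbeatRequest`, `timeoutMs`, and the 30-second default are hypothetical.

```java
/**
 * Hypothetical decoded request. timeoutMs was added as an optional field in a
 * later release, so requests from older clients never carry it.
 */
final class HeartbeatRequest {
    private final String nodeId;
    private final Long timeoutMs; // null when the sender predates the field

    HeartbeatRequest(String nodeId, Long timeoutMs) {
        this.nodeId = nodeId;
        this.timeoutMs = timeoutMs;
    }

    String getNodeId() {
        return nodeId;
    }

    boolean hasTimeoutMs() {
        return timeoutMs != null;
    }

    /** Mirrors proto2 generated code: returns the type default (0) when unset. */
    long getTimeoutMs() {
        return timeoutMs == null ? 0L : timeoutMs;
    }
}

final class HeartbeatHandler {
    static final long DEFAULT_TIMEOUT_MS = 30_000L; // hypothetical server-side default

    /** Honors the client's value when present, otherwise falls back to the server default. */
    static long effectiveTimeoutMs(HeartbeatRequest request) {
        return request.hasTimeoutMs() ? request.getTimeoutMs() : DEFAULT_TIMEOUT_MS;
    }
}
```

Checking `hasTimeoutMs()` rather than relying on `getTimeoutMs()` alone keeps an old client's silence from being misread as an explicit timeout of zero.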
+The following changes to a .proto file SHALL be considered incompatible:
 
-#### Use cases
+* Change an rpc/method name
+* Change an rpc/method parameter type or return type
+* Remove an rpc/method
+* Change the service name
+* Change the name of a Message
+* Modify a field type in an incompatible way (as defined recursively)
+* Change an optional field to required
+* Add or delete a required field
+* Delete an optional field as long as the optional field has reasonable defaults to allow deletions
 
-* Existing MapReduce applications, including jars of existing packaged end-user applications and projects such as Apache Pig, Apache Hive, Cascading etc. should work unmodified when pointed to an upgraded Apache Hadoop cluster within a major release.
-* Existing YARN applications, including jars of existing packaged end-user applications and projects such as Apache Tez etc. should work unmodified when pointed to an upgraded Apache Hadoop cluster within a major release.
-* Existing applications which transfer data in/out of HDFS, including jars of existing packaged end-user applications and frameworks such as Apache Flume, should work unmodified when pointed to an upgraded Apache Hadoop cluster within a major release.
+The following changes to a .proto file MUST NOT be made in any release:
 
-#### Policy
+* Change a field id
+* Reuse an old field that was previously deleted.
 
-* Existing MapReduce, YARN & HDFS applications and frameworks should work unmodified within a major release i.e. Apache Hadoop ABI is supported.
-* A very minor fraction of applications maybe affected by changes to disk layouts etc., the developer community will strive to minimize these changes and will not make them within a minor version. In more egregious cases, we will consider strongly reverting these breaking changes and invalidating offending releases if necessary.
-* In particular for MapReduce applications, the developer community will try our best to support providing binary compatibility across major releases e.g. applications using org.apache.hadoop.mapred.
-* APIs are supported compatibly across hadoop-1.x and hadoop-2.x. See [Compatibility for MapReduce applications between hadoop-1.x and hadoop-2.x](../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduce_Compatibility_Hadoop1_Hadoop2.html) for more details.
+Hadoop wire protocols that are not defined via .proto files SHOULD be considered
+to be [Private](./InterfaceClassification.html#Private) and
+[Stable](./InterfaceClassification.html#Stable).
+
+In addition to the limitations imposed by being
+[Stable](./InterfaceClassification.html#Stable), Hadoop's wire protocols
+MUST also be forward compatible across minor releases within a major version
+according to the following:
+
+* Client-Server compatibility MUST be maintained so as to allow users to continue using older clients even after upgrading the server (cluster) to a later version (or vice versa). For example, a Hadoop 2.1.0 client talking to a Hadoop 2.3.0 cluster.
+* Client-Server compatibility MUST be maintained so as to allow users to upgrade the client before upgrading the server (cluster). For example, a Hadoop 2.4.0 client talking to a Hadoop 2.3.0 cluster. This allows deployment of client-side bug fixes ahead of full cluster upgrades. Note that new cluster features invoked by new client APIs or shell commands will not be usable. YARN applications that attempt to use new APIs (including new fields in data structures) that have not yet been deployed to the cluster can expect link exceptions.
+* Client-Server compatibility MUST be maintained so as to allow upgrading individual components without upgrading others. For example, upgrade HDFS from version 2.1.0 to 2.2.0 without upgrading MapReduce.
+* Server-Server compatibility MUST be maintained so as to allow mixed versions within an active cluster so the cluster may be upgraded without downtime in a rolling fashion.
+
+New transport mechanisms MUST only be introduced with minor or major version
+changes. Existing transport mechanisms MUST continue to be supported across
+minor versions within a major version. Service port numbers MUST remain
+consistent across minor version numbers within a major version.
 
 ### REST APIs
 
-REST API compatibility corresponds to both the requests (URLs) and responses to each request (content, which may contain other URLs). Hadoop REST APIs are specifically meant for stable use by clients across releases, even major ones. The following are the exposed REST APIs:
+REST API compatibility applies to the REST endpoints (URLs) and response data
+format. Hadoop REST APIs are specifically meant for stable use by clients across
+releases, even major ones. The following is a non-exhaustive list of the
+exposed REST APIs:
 
-* [WebHDFS](../hadoop-hdfs/WebHDFS.html) - Stable
+* [WebHDFS](../hadoop-hdfs/WebHDFS.html)
 * [ResourceManager](../../hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html)
 * [NodeManager](../../hadoop-yarn/hadoop-yarn-site/NodeManagerRest.html)
 * [MR Application Master](../../hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapredAppMasterRest.html)
@@ -130,134 +297,390 @@ REST API compatibility corresponds to both the requests (URLs) and responses to
 * [Timeline Server v1 REST API](../../hadoop-yarn/hadoop-yarn-site/TimelineServer.html)
 * [Timeline Service v2 REST API](../../hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html)
 
+Each API has an API-specific version number. Any incompatible changes MUST
+increment the API version number.
+
 #### Policy
 
-The APIs annotated stable in the text above preserve compatibility across at least one major release, and maybe deprecated by a newer version of the REST API in a major release.
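For the REST APIs the version number is visible in the endpoint path. WebHDFS, for example, serves its operations under `/webhdfs/v1/<PATH>?op=...`. The sketch below builds such a URL through a single helper, so a client has exactly one place to change if an incompatible change ever required a new API version. The host name and port are placeholders, and the `WebHdfsUrls` helper is hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class WebHdfsUrls {
    // API version segment. Incompatible changes are published under a new
    // version number rather than by altering the behavior of v1.
    static final String API_VERSION = "v1";

    /** Builds the URL for a WebHDFS operation on an absolute HDFS path. */
    static URI operationUri(String host, int port, String hdfsPath, String op) {
        if (!hdfsPath.startsWith("/")) {
            throw new IllegalArgumentException("HDFS path must be absolute: " + hdfsPath);
        }
        try {
            // The multi-argument URI constructor percent-encodes illegal characters.
            return new URI("http", null, host, port,
                    "/webhdfs/" + API_VERSION + hdfsPath, "op=" + op, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build WebHDFS URI", e);
        }
    }

    public static void main(String[] args) {
        // Placeholder host and port.
        System.out.println(operationUri("namenode.example.com", 9870,
                "/user/alice/data.txt", "GETFILESTATUS"));
    }
}
```

Running `main` prints `http://namenode.example.com:9870/webhdfs/v1/user/alice/data.txt?op=GETFILESTATUS`.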
+The Hadoop REST APIs SHALL be considered
+[Public](./InterfaceClassification.html#Public) and
+[Evolving](./InterfaceClassification.html#Evolving). With respect to API version
+numbers, the Hadoop REST APIs SHALL be considered
+[Public](./InterfaceClassification.html#Public) and
+[Stable](./InterfaceClassification.html#Stable), i.e. no incompatible changes
+are allowed within an API version number.
+
+### Log Output
+
+The Hadoop daemons and CLIs produce log output via Log4j that is intended to
+aid administrators and developers in understanding and troubleshooting cluster
+behavior. Log messages are intended for human consumption, though automation
+use cases are also supported.
+
+#### Policy
+
+All log output SHALL be considered
+[Public](./InterfaceClassification.html#Public) and
+[Evolving](./InterfaceClassification.html#Evolving).
+
+### Audit Log Output
+
+Several components have audit logging systems that record system information in
+a machine-readable format. Incompatible changes to that data format may break
+existing automation utilities. For the audit log, an incompatible change is any
+change that changes the format such that existing parsers can no longer parse
+the logs.
+
+#### Policy
+
+All audit log output SHALL be considered
+[Public](./InterfaceClassification.html#Public) and
+[Stable](./InterfaceClassification.html#Stable). Any change to the
+data format SHALL be considered an incompatible change.
 
 ### Metrics/JMX
 
-While the Metrics API compatibility is governed by Java API compatibility, the actual metrics exposed by Hadoop need to be compatible for users to be able to automate using them (scripts etc.). Adding additional metrics is compatible. Modifying (e.g. changing the unit or measurement) or removing existing metrics breaks compatibility. Similarly, changes to JMX MBean object names also break compatibility.
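Automation usually finds Hadoop metrics by JMX object name, which is why the removed text above counted renamed MBeans as breaking changes. Hadoop daemons typically register MBeans under names of the form `Hadoop:service=<service>,name=<name>`; the sketch below uses the standard `javax.management.ObjectName` class to pull out the keys a monitoring script might match on. The `MetricsKey` helper is hypothetical, and `NameNodeInfo` is used only as an example name.

```java
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

final class MetricsKey {
    /**
     * Returns "service/name" for an object name of the form
     * Hadoop:service=...,name=..., or null if the domain or either key differs.
     */
    static String serviceAndName(String objectName) {
        try {
            ObjectName on = new ObjectName(objectName);
            String service = on.getKeyProperty("service");
            String name = on.getKeyProperty("name");
            if (!"Hadoop".equals(on.getDomain()) || service == null || name == null) {
                return null;
            }
            return service + "/" + name;
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Not a JMX object name: " + objectName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serviceAndName("Hadoop:service=NameNode,name=NameNodeInfo"));
        System.out.println(serviceAndName("java.lang:type=Runtime"));
    }
}
```

A consumer written this way tolerates new attributes and new MBeans, but stops finding its data as soon as the `service` or `name` key changes, which is the kind of change a Public and Stable classification rules out within a major release.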
+While the Metrics API compatibility is governed by Java API compatibility, the
+Metrics data format exposed by Hadoop MUST be maintained as compatible for
+consumers of the data, e.g. for automation tasks.
 
 #### Policy
 
-Metrics should preserve compatibility within the major release.
+The data format exposed via Metrics SHALL be considered
+[Public](./InterfaceClassification.html#Public) and
+[Stable](./InterfaceClassification.html#Stable).
 
 ### File formats & Metadata
 
-User and system level data (including metadata) is stored in files of different formats. Changes to the metadata or the file formats used to store data/metadata can lead to incompatibilities between versions.
+User and system level data (including metadata) is stored in files of various
+formats. Changes to the metadata or the file formats used to store
+data/metadata can lead to incompatibilities between versions. Each class of file
+format is addressed below.
 
 #### User-level file formats
 
-Changes to formats that end-users use to store their data can prevent them from accessing the data in later releases, and hence it is highly important to keep those file-formats compatible. One can always add a "new" format improving upon an existing format. Examples of these formats include har, war, SequenceFileFormat etc.
+Changes to formats that end users use to store their data can prevent them from
+accessing the data in later releases, so keeping them compatible is important.
+Examples of these formats include har, war, SequenceFileFormat, etc.
 
 ##### Policy
 
-* Non-forward-compatible user-file format changes are restricted to major releases. When user-file formats change, new releases are expected to read existing formats, but may write data in formats incompatible with prior releases. Also, the community shall prefer to create a new format that programs must opt in to instead of making incompatible changes to existing formats.
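The opt-in requirement for new user-level file formats, stated both in the removed policy text above and in its replacement, amounts to keeping the existing format as the default for writing. A minimal sketch, assuming a hypothetical configuration key and two hypothetical format versions; nothing here corresponds to a real Hadoop property or format.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical formats: V1 is the existing format, V2 a new derivative of it. */
enum RecordFormat { V1, V2 }

final class FormatOptIn {
    // Hypothetical configuration key, used only for this illustration.
    static final String FORMAT_KEY = "example.record.format";

    /** New data keeps using the existing format unless the user explicitly opts in. */
    static RecordFormat formatForWriting(Map<String, String> conf) {
        String requested = conf.get(FORMAT_KEY);
        if (requested == null || requested.isBlank()) {
            return RecordFormat.V1;
        }
        // Unknown values fail loudly instead of silently picking a format.
        return RecordFormat.valueOf(requested.trim().toUpperCase(Locale.ROOT));
    }
}
```

Making the default the existing format means an upgrade alone never changes what is written; only an explicit configuration change does.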
+User-level file formats SHALL be considered +[Public](./InterfaceClassification.html#Public) and +[Stable](./InterfaceClassification.html#Stable). User-lever file +format changes SHOULD be made forward compatible across major releases and MUST +be made forward compatible within a major release. The developer community +SHOULD prefer the creation of a new derivative file format to making +incompatible changes to an existing file format. Such new file formats MUST be +created as opt-in, meaning that users must be able to continue using the +existing compatible format until and unless they explicitly opt in to using +the new file format. -#### System-internal file formats +#### System-internal data schemas -Hadoop internal data is also stored in files and again changing these formats can lead to incompatibilities. While such changes are not as devastating as the user-level file formats, a policy on when the compatibility can be broken is important. +Hadoop internal data may also be stored in files or other data stores. Changing +the schemas of these data stores can lead to incompatibilities. ##### MapReduce MapReduce uses formats like I-File to store MapReduce-specific data. -##### Policy +###### Policy -MapReduce-internal formats like IFile maintain compatibility within a major release. Changes to these formats can cause in-flight jobs to fail and hence we should ensure newer clients can fetch shuffle-data from old servers in a compatible manner. +All MapReduce-internal file formats, such as I-File format or the job history +server's jhist file format, SHALL be considered +[Private](./InterfaceClassification.html#Private) and +[Stable](./InterfaceClassification.html#Stable). ##### HDFS Metadata -HDFS persists metadata (the image and edit logs) in a particular format. Incompatible changes to either the format or the metadata prevent subsequent releases from reading older metadata. 
Such incompatible changes might require an HDFS "upgrade" to convert the metadata to make it accessible. Some changes can require more than one such "upgrades". +HDFS persists metadata (the image and edit logs) in a private file format. +Incompatible changes to either the format or the metadata prevent subsequent +releases from reading older metadata. Incompatible changes MUST include a +process by which existing metadata may be upgraded. Changes MAY +require more than one upgrade. Incompatible changes MUST result in +the metadata version number being incremented. -Depending on the degree of incompatibility in the changes, the following potential scenarios can arise: +Depending on the degree of incompatibility in the changes, the following +potential scenarios can arise: * Automatic: The image upgrades automatically, no need for an explicit "upgrade". * Direct: The image is upgradable, but might require one explicit release "upgrade". * Indirect: The image is upgradable, but might require upgrading to intermediate release(s) first. * Not upgradeable: The image is not upgradeable. -##### Policy +HDFS data nodes store data in a private directory structure. The schema of that +directory structure must remain stable to retain compatibility. -* A release upgrade must allow a cluster to roll-back to the older version and its older disk format. The rollback needs to restore the original data, but not required to restore the updated data. -* HDFS metadata changes must be upgradeable via any of the upgrade paths - automatic, direct or indirect. -* More detailed policies based on the kind of upgrade are yet to be considered. +###### Policy + +The HDFS metadata format SHALL be considered +[Private](./InterfaceClassification.html#Private) and +[Evolving](./InterfaceClassification.html#Evolving). Incompatible +changes MUST include a process by which existing metadata may be upgraded.
The +upgrade process MUST allow the cluster metadata to be rolled back to the older +version and its older disk format. The rollback MUST restore the original data +but is not REQUIRED to restore the updated data. Any incompatible change +to the format MUST result in the major version number of the schema being +incremented. + +The data node directory format SHALL be considered +[Private](./InterfaceClassification.html#Private) and +[Evolving](./InterfaceClassification.html#Evolving). + +##### AWS S3A Guard Metadata + +For each operation in the Hadoop S3 client (s3a) that reads or modifies +file metadata, a shadow copy of that file metadata is stored in a separate +metadata store, which offers HDFS-like consistency for the metadata, and may +also provide faster lookups for things like file status or directory listings. +S3A guard tables are created with a version marker that indicates +compatibility. + +###### Policy + +The S3A guard metadata schema SHALL be considered +[Private](./InterfaceClassification.html#Private) and +[Unstable](./InterfaceClassification.html#Unstable). Any incompatible change +to the schema MUST result in the version number of the schema being incremented. + +##### YARN Resource Manager State Store + +The YARN resource manager stores information about the cluster state in an +external state store for use in failover and recovery. If the schema used for +the state store data does not remain compatible, the resource manager will not +be able to recover its state and will fail to start. The state store data +schema includes a version number that indicates compatibility. + +###### Policy + +The YARN resource manager state store data schema SHALL be considered +[Private](./InterfaceClassification.html#Private) and +[Evolving](./InterfaceClassification.html#Evolving). Any incompatible change +to the schema MUST result in the major version number of the schema being +incremented.
Any compatible change to the schema MUST result in the minor +version number being incremented. + +##### YARN Node Manager State Store + +The YARN node manager stores information about the node state in an +external state store for use in recovery. If the schema used for the state +store data does not remain compatible, the node manager will not +be able to recover its state and will fail to start. The state store data +schema includes a version number that indicates compatibility. + +###### Policy + +The YARN node manager state store data schema SHALL be considered +[Private](./InterfaceClassification.html#Private) and +[Evolving](./InterfaceClassification.html#Evolving). Any incompatible change +to the schema MUST result in the major version number of the schema being +incremented. Any compatible change to the schema MUST result in the minor +version number being incremented. + +##### YARN Federation State Store + +The YARN resource manager federation service stores information about the +federated clusters, running applications, and routing policies in an +external state store for use in replication and recovery. If the schema used +for the state store data does not remain compatible, the federation service +will fail to initialize. The state store data schema includes a version number +that indicates compatibility. + +###### Policy + +The YARN federation service state store data schema SHALL be considered +[Private](./InterfaceClassification.html#Private) and +[Evolving](./InterfaceClassification.html#Evolving). Any incompatible change +to the schema MUST result in the major version number of the schema being +incremented. Any compatible change to the schema MUST result in the minor +version number being incremented. ### Command Line Interface (CLI) -The Hadoop command line programs may be used either directly via the system shell or via shell scripts. 
Changing the path of a command, removing or renaming command line options, the order of arguments, or the command return code and output break compatibility and may adversely affect users. +The Hadoop command line programs may be used either directly via the system +shell or via shell scripts. The CLIs include both the user-facing commands, such +as the hdfs command or the yarn command, and the admin-facing commands, such as +the scripts used to start and stop daemons. Changing the path of a command, +removing or renaming command line options, changing the order of arguments, or +changing the command return codes or output breaks compatibility and adversely affects users. #### Policy -CLI commands are to be deprecated (warning when used) for one major release before they are removed or incompatibly modified in a subsequent major release. +All Hadoop CLI paths, usage, and output SHALL be considered +[Public](./InterfaceClassification.html#Public) and +[Stable](./InterfaceClassification.html#Stable). +Note that the CLI output SHALL be considered distinct from the log output +generated by the Hadoop CLIs. The latter SHALL be governed by the policy on log +output. Note also that for CLI output, all changes SHALL be considered +incompatible changes. ### Web UI -Web UI, particularly the content and layout of web pages, changes could potentially interfere with attempts to screen scrape the web pages for information. +Changes to the Web UI, particularly the content and layout of web pages, could +potentially interfere with attempts to screen scrape the web pages for +information. The Hadoop Web UI pages, however, are not meant to be scraped, e.g. +for automation purposes. Users are expected to use REST APIs to programmatically +access cluster information. #### Policy -Web pages are not meant to be scraped and hence incompatible changes to them are allowed at any time. Users are expected to use REST APIs to get any information.
+The Hadoop Web UI SHALL be considered +[Public](./InterfaceClassification.html#Public) and +[Unstable](./InterfaceClassification.html#Unstable). ### Hadoop Configuration Files -Users use (1) Hadoop-defined properties to configure and provide hints to Hadoop and (2) custom properties to pass information to jobs. Hence, compatibility of config properties is two-fold: - -* Modifying key-names, units of values, and default values of Hadoop-defined properties. -* Custom configuration property keys should not conflict with the namespace of Hadoop-defined properties. Typically, users should avoid using prefixes used by Hadoop: hadoop, io, ipc, fs, net, file, ftp, s3, kfs, ha, file, dfs, mapred, mapreduce, yarn. +Users use Hadoop-defined properties to configure and provide hints to Hadoop and +custom properties to pass information to jobs. Users are encouraged to avoid +using custom configuration property names that conflict with the namespace of +Hadoop-defined properties and should avoid using any prefixes used by Hadoop, +e.g. hadoop, io, ipc, fs, net, file, ftp, s3, kfs, ha, dfs, mapred, +mapreduce, and yarn. #### Policy -* Hadoop-defined properties are to be deprecated at least for one major release before being removed. Modifying units for existing properties is not allowed. -* The default values of Hadoop-defined properties can be changed across minor/major releases, but will remain the same across point releases within a minor release. -* Currently, there is NO explicit policy regarding when new prefixes can be added/removed, and the list of prefixes to be avoided for custom configuration properties. However, as noted above, users should avoid using prefixes used by Hadoop: hadoop, io, ipc, fs, net, file, ftp, s3, kfs, ha, file, dfs, mapred, mapreduce, yarn. +Hadoop-defined properties (names and meanings) SHALL be considered +[Public](./InterfaceClassification.html#Public) and +[Stable](./InterfaceClassification.html#Stable).
The units implied by a +Hadoop-defined property MUST NOT change, even +across major versions. Default values of Hadoop-defined properties SHALL be +considered [Public](./InterfaceClassification.html#Public) and +[Evolving](./InterfaceClassification.html#Evolving). + +### Log4j Configuration Files + +The log output produced by Hadoop daemons and CLIs is governed by a set of +configuration files. These files control the minimum level of log message that +will be output by the various components of Hadoop, as well as where and how +those messages are stored. + +#### Policy + +All Log4j configurations SHALL be considered +[Public](./InterfaceClassification.html#Public) and +[Evolving](./InterfaceClassification.html#Evolving). ### Directory Structure -Source code, artifacts (source and tests), user logs, configuration files, output and job history are all stored on disk either local file system or HDFS. Changing the directory structure of these user-accessible files break compatibility, even in cases where the original path is preserved via symbolic links (if, for example, the path is accessed by a servlet that is configured to not follow symbolic links). +Source code, artifacts (source and tests), user logs, configuration files, +output, and job history are all stored on disk, either on the local file system or in HDFS. +Changing the directory structure of these user-accessible files can break +compatibility, even in cases where the original path is preserved via symbolic +links (such as when the path is accessed by a servlet that is configured to +not follow symbolic links). #### Policy -* The layout of source code and build artifacts can change anytime, particularly so across major versions. Within a major version, the developers will attempt (no guarantees) to preserve the directory structure; however, individual files can be added/moved/deleted. The best way to ensure patches stay in sync with the code is to get them committed to the Apache source tree.
-* The directory structure of configuration files, user logs, and job history will be preserved across minor and point releases within a major release. +The layout of source code and build artifacts SHALL be considered +[Private](./InterfaceClassification.html#Private) and +[Unstable](./InterfaceClassification.html#Unstable). Within a major version, +the developer community SHOULD preserve the +overall directory structure, though individual files MAY be added, moved, or +deleted with no warning. + +The directory structure of configuration files, user logs, and job history SHALL +be considered [Public](./InterfaceClassification.html#Public) and +[Evolving](./InterfaceClassification.html#Evolving). ### Java Classpath -User applications built against Hadoop might add all Hadoop jars (including Hadoop's library dependencies) to the application's classpath. Adding new dependencies or updating the version of existing dependencies may interfere with those in applications' classpaths. +Hadoop provides several client artifacts that applications use to interact +with the system. These artifacts typically have their own dependencies on +common libraries. In the cases where these dependencies are exposed to +end user applications or downstream consumers (i.e. not +[shaded](https://stackoverflow.com/questions/13620281/what-is-the-maven-shade-plugin-used-for-and-why-would-you-want-to-relocate-java)), +changes to these dependencies can be disruptive. Developers are strongly +encouraged to avoid exposing dependencies to clients by using techniques +such as +[shading](https://stackoverflow.com/questions/13620281/what-is-the-maven-shade-plugin-used-for-and-why-would-you-want-to-relocate-java). + +With regard to dependencies, adding a dependency is an incompatible change, +whereas removing a dependency is a compatible change. + +Some user applications built against Hadoop may add all Hadoop JAR files +(including Hadoop's library dependencies) to the application's classpath.
+Adding new dependencies or updating the versions of existing dependencies may +interfere with those in applications' classpaths and hence with their correct +operation. Users are therefore discouraged from adopting this practice. #### Policy -Currently, there is NO policy on when Hadoop's dependencies can change. +The set of dependencies exposed by the Hadoop client artifacts SHALL be +considered [Public](./InterfaceClassification.html#Public) and +[Stable](./InterfaceClassification.html#Stable). Any dependencies that are not +exposed to clients (either because they are shaded or only exist in non-client +artifacts) SHALL be considered [Private](./InterfaceClassification.html#Private) +and [Unstable](./InterfaceClassification.html#Unstable). ### Environment variables -Users and related projects often utilize the exported environment variables (eg HADOOP\_CONF\_DIR), therefore removing or renaming environment variables is an incompatible change. +Users and related projects often utilize the environment variables exported by +Hadoop (e.g. HADOOP\_CONF\_DIR). Removing or renaming environment variables can +therefore impact end user applications. #### Policy -Currently, there is NO policy on when the environment variables can change. Developers try to limit changes to major releases. +The environment variables consumed by Hadoop and the environment variables made +accessible to applications through YARN SHALL be considered +[Public](./InterfaceClassification.html#Public) and +[Evolving](./InterfaceClassification.html#Evolving). +The developer community SHOULD limit changes to major releases. ### Build artifacts -Hadoop uses maven for project management and changing the artifacts can affect existing user workflows. +Hadoop uses Maven for project management. Changes to the contents of +generated artifacts can impact existing user applications.
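The build-artifact policy in this section sorts every artifact into one of three buckets: test artifacts (Private, Unstable), the listed client artifacts (Public, Stable), and everything else (Private, Unstable). The Java sketch below restates that sorting rule. It assumes an artifact is identified by its Maven artifactId together with its JAR file name. The `ArtifactPolicy` class, its `Category` enum, and the `classify` method are hypothetical illustrations and are not part of any Hadoop build tooling. Only the file-name half of the test-artifact rule is checked; a JAR built from test sources without "tests" in its name would not be caught.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical helper restating the build-artifact policy:
 * test artifacts and non-client artifacts are Private/Unstable,
 * the enumerated client artifacts are Public/Stable.
 */
public class ArtifactPolicy {

    enum Category { TEST, CLIENT, OTHER }

    /** Client artifacts as enumerated in the policy. */
    static final Set<String> CLIENT_ARTIFACTS = Collections.unmodifiableSet(
        new HashSet<>(Arrays.asList(
            "hadoop-client", "hadoop-client-api", "hadoop-client-minicluster",
            "hadoop-client-runtime", "hadoop-hdfs-client",
            "hadoop-hdfs-native-client", "hadoop-mapreduce-client-app",
            "hadoop-mapreduce-client-common", "hadoop-mapreduce-client-core",
            "hadoop-mapreduce-client-hs", "hadoop-mapreduce-client-hs-plugins",
            "hadoop-mapreduce-client-jobclient",
            "hadoop-mapreduce-client-nativetask",
            "hadoop-mapreduce-client-shuffle", "hadoop-yarn-client")));

    /**
     * Classifies a JAR. The test check runs first, so a "tests" JAR of a
     * client artifact is still treated as a test artifact.
     */
    static Category classify(String artifactId, String jarFileName) {
        if (jarFileName.contains("tests")) {
            return Category.TEST;   // Private and Unstable
        }
        if (CLIENT_ARTIFACTS.contains(artifactId)) {
            return Category.CLIENT; // Public and Stable
        }
        return Category.OTHER;      // Private and Unstable
    }

    public static void main(String[] args) {
        System.out.println(classify("hadoop-client-api", "hadoop-client-api-3.0.0.jar"));
        System.out.println(classify("hadoop-common", "hadoop-common-3.0.0-tests.jar"));
        System.out.println(classify("hadoop-common", "hadoop-common-3.0.0.jar"));
    }
}
```

Running `main` prints `CLIENT`, `TEST`, and `OTHER`. Checking the test rule before the client list reflects the policy's wording that all JAR files with "tests" in the file name are test artifacts, regardless of which module produced them.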
#### Policy -* Test artifacts: The test jars generated are strictly for internal use and are not expected to be used outside of Hadoop, similar to APIs annotated @Private, @Unstable. -* Built artifacts: The hadoop-client artifact (maven groupId:artifactId) stays compatible within a major release, while the other artifacts can change in incompatible ways. +The contents of Hadoop test artifacts SHALL be considered +[Private](./InterfaceClassification.html#Private) and +[Unstable](./InterfaceClassification.html#Unstable). Test artifacts include +all JAR files generated from test source code and all JAR files that include +"tests" in the file name. + +The Hadoop client artifacts SHALL be considered +[Public](./InterfaceClassification.html#Public) and +[Stable](./InterfaceClassification.html#Stable). Client artifacts are the +following: + +* hadoop-client +* hadoop-client-api +* hadoop-client-minicluster +* hadoop-client-runtime +* hadoop-hdfs-client +* hadoop-hdfs-native-client +* hadoop-mapreduce-client-app +* hadoop-mapreduce-client-common +* hadoop-mapreduce-client-core +* hadoop-mapreduce-client-hs +* hadoop-mapreduce-client-hs-plugins +* hadoop-mapreduce-client-jobclient +* hadoop-mapreduce-client-nativetask +* hadoop-mapreduce-client-shuffle +* hadoop-yarn-client + +All other build artifacts SHALL be considered +[Private](./InterfaceClassification.html#Private) and +[Unstable](./InterfaceClassification.html#Unstable). ### Hardware/Software Requirements -To keep up with the latest advances in hardware, operating systems, JVMs, and other software, new Hadoop releases or some of their features might require higher versions of the same. For a specific environment, upgrading Hadoop might require upgrading other dependent software components. 
+To keep up with the latest advances in hardware, operating systems, JVMs, and +other software, new Hadoop releases may include features that require +newer hardware, operating system releases, or JVM versions than previous +Hadoop releases. For a specific environment, upgrading Hadoop might require +upgrading other dependent software components. #### Policies * Hardware * Architecture: The community has no plans to restrict Hadoop to specific architectures, but can have family-specific optimizations. - * Minimum resources: While there are no guarantees on the minimum resources required by Hadoop daemons, the community attempts to not increase requirements within a minor release. -* Operating Systems: The community will attempt to maintain the same OS requirements (OS kernel versions) within a minor release. Currently GNU/Linux and Microsoft Windows are the OSes officially supported by the community while Apache Hadoop is known to work reasonably well on other OSes such as Apple MacOSX, Solaris etc. -* The JVM requirements will not change across point releases within the same minor release except if the JVM version under question becomes unsupported. Minor/major releases might require later versions of JVM for some/all of the supported operating systems. -* Other software: The community tries to maintain the minimum versions of additional software required by Hadoop. For example, ssh, kerberos etc. + * Minimum resources: While there are no guarantees on the minimum resources required by Hadoop daemons, the developer community SHOULD avoid increasing requirements within a minor release. +* Operating Systems: The community SHOULD maintain the same minimum OS requirements (OS kernel versions) within a minor release. Currently GNU/Linux and Microsoft Windows are the OSes officially supported by the community, while Apache Hadoop is known to work reasonably well on other OSes such as Apple MacOSX, Solaris, etc.
+* The JVM requirements SHALL NOT change across minor releases within the same major release unless the JVM version in question becomes unsupported. The JVM version requirement MAY be different for different operating systems or even operating system releases. +* File systems supported by Hadoop, e.g. through the Hadoop FileSystem API, SHOULD NOT become unsupported between minor releases within a major version unless a migration path to an alternate client implementation is available. References ---------- diff --git a/hadoop-common-project/hadoop-common/src/site/markdown/InterfaceClassification.md b/hadoop-common-project/hadoop-common/src/site/markdown/InterfaceClassification.md index c7309ab771..451f9be307 100644 --- a/hadoop-common-project/hadoop-common/src/site/markdown/InterfaceClassification.md +++ b/hadoop-common-project/hadoop-common/src/site/markdown/InterfaceClassification.md @@ -66,54 +66,103 @@ Hadoop uses the following kinds of audience in order of increasing/wider visibil #### Private -The interface is for internal use within the project (such as HDFS or MapReduce) -and should not be used by applications or by other projects. It is subject to -change at anytime without notice. Most interfaces of a project are Private (also -referred to as project-private). +A Private interface is for internal use within the project (such as HDFS or +MapReduce) and should not be used by applications or by other projects. Most +interfaces of a project are Private (also referred to as project-private). +Unless an interface is intentionally exposed for external consumption, it should +be marked Private. #### Limited-Private -The interface is used by a specified set of projects or systems (typically -closely related projects). Other projects or systems should not use the -interface. Changes to the interface will be communicated/negotiated with the +A Limited-Private interface is used by a specified set of projects or systems +(typically closely related projects).
Other projects or systems should not use +the interface. Changes to the interface will be communicated/negotiated with the specified projects. For example, in the Hadoop project, some interfaces are LimitedPrivate{HDFS, MapReduce} in that they are private to the HDFS and MapReduce projects. #### Public -The interface is for general use by any application. +A Public interface is for general use by any application. + +### Change Compatibility + +Changes to an API fall into two broad categories: compatible and incompatible. +A compatible change is a change that meets the following criteria: + +* no existing capabilities are removed, +* no existing capabilities are modified in a way that prevents their use by clients that were constructed to use the interface prior to the change, and +* no capabilities are added that require changes to clients that were constructed to use the interface prior to the change. + +Any change that does not meet these three criteria is an incompatible change. +Stated simply, a compatible change will not break existing clients. These +examples are compatible changes: + +* adding a method to a Java class, +* adding an optional parameter to a RESTful web service, +* adding a tag to an XML document, or +* making the audience annotation of an interface broader (e.g. from Private to Public) or the stability annotation more restrictive (e.g. from Evolving to Stable). + +These examples are incompatible changes: + +* removing a method from a Java class, +* adding a method to a Java interface, +* adding a required parameter to a RESTful web service, +* renaming a field in a JSON document, or +* making the audience annotation of an interface narrower (e.g. from Public to Limited-Private) or the stability annotation less restrictive (e.g. from Evolving to Unstable). ### Stability -Stability denotes how stable an interface is, as in when incompatible changes to -the interface are allowed.
Hadoop APIs have the following levels of stability. +Stability denotes how stable an interface is and when compatible and +incompatible changes to the interface are allowed. Hadoop APIs have the +following levels of stability. #### Stable -Can evolve while retaining compatibility for minor release boundaries; in other -words, incompatible changes to APIs marked as Stable are allowed only at major -releases (i.e. at m.0). +A Stable interface is exposed as a preferred means of communication. A Stable +interface is expected not to change incompatibly within a major release and +hence serves as a safe development target. A Stable interface may evolve +compatibly between minor releases. + +Incompatible changes allowed: major (X.0.0) +Compatible changes allowed: minor (x.Y.0) #### Evolving -Evolving, but incompatible changes are allowed at minor releases (i.e. m .x) +An Evolving interface is typically exposed so that users or external code can +make use of a feature before it has stabilized. The expectation that an +interface should "eventually" stabilize and be promoted to Stable, however, +is not a requirement for the interface to be labeled as Evolving. + +Incompatible changes are allowed for Evolving interfaces only at minor releases. + +Incompatible changes allowed: minor (x.Y.0) +Compatible changes allowed: maintenance (x.y.Z) #### Unstable -Incompatible changes to Unstable APIs are allowed at any time. This usually makes -sense for only private interfaces. +An Unstable interface is one for which no compatibility guarantees are made. An +Unstable interface is not necessarily unstable. An Unstable interface is +typically exposed because a user or external code needs to access an interface +that is not intended for consumption. The interface is exposed as an Unstable +interface to state clearly that even though the interface is exposed, it is not +the preferred access path, and no compatibility guarantees are made for it.
-However one may call this out for a supposedly public interface to highlight -that it should not be used as an interface; for public interfaces, labeling it -as Not-an-interface is probably more appropriate than "Unstable". +Unless there is a reason to offer a compatibility guarantee on an interface, +whether it is exposed or not, it should be labeled as Unstable. Private +interfaces also should be Unstable in most cases. -Examples of publicly visible interfaces that are unstable -(i.e. not-an-interface): GUI, CLIs whose output format will change. +Incompatible changes to Unstable interfaces are allowed at any time. + +Incompatible changes allowed: maintenance (x.y.Z) +Compatible changes allowed: maintenance (x.y.Z) #### Deprecated -APIs that could potentially be removed in the future and should not be used. +A Deprecated interface could potentially be removed in the future and should +not be used. Even so, a Deprecated interface will continue to function until +it is removed. When a Deprecated interface can be removed depends on whether +it is also Stable, Evolving, or Unstable. How are the Classifications Recorded? ------------------------------------- @@ -121,95 +170,101 @@ How are the Classifications Recorded? How will the classification be recorded for Hadoop APIs? * Each interface or class will have the audience and stability recorded using - annotations in org.apache.hadoop.classification package. + annotations in the org.apache.hadoop.classification package. -* The javadoc generated by the maven target javadoc:javadoc lists only the public API. +* The javadoc generated by the maven target javadoc:javadoc lists only the + public API. * One can derive the audience of java classes and java interfaces by the audience of the package in which they are contained. Hence it is useful to declare the audience of each java package as public or private (along with the private audience variations). 
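Each stability level pairs with the release types at which an incompatible change may land: major (X.0.0) for Stable, minor (x.Y.0) for Evolving, and any release, including maintenance (x.y.Z), for Unstable. The Java sketch below encodes that table, assuming versions are plain X.Y.Z strings. The `StabilityPolicy` class and its `Stability` and `ReleaseType` enums are hypothetical names used for illustration. They do not exist in the org.apache.hadoop.classification package, which records these labels as annotations.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch: maps each stability label to the release types at
 * which an incompatible change to an interface with that label may ship.
 * None of these names exist in org.apache.hadoop.classification.
 */
public class StabilityPolicy {

    enum Stability { STABLE, EVOLVING, UNSTABLE }

    /** The most significant version component that differs between two releases. */
    enum ReleaseType { MAJOR, MINOR, MAINTENANCE, NONE }

    /** Classifies the step between two X.Y.Z version strings. */
    static ReleaseType releaseType(String from, String to) {
        int[] a = parse(from);
        int[] b = parse(to);
        if (a[0] != b[0]) return ReleaseType.MAJOR;       // X changed
        if (a[1] != b[1]) return ReleaseType.MINOR;       // Y changed
        if (a[2] != b[2]) return ReleaseType.MAINTENANCE; // Z changed
        return ReleaseType.NONE;                          // same version
    }

    private static int[] parse(String version) {
        String[] parts = version.split("\\.");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected X.Y.Z but got " + version);
        }
        return Arrays.stream(parts).mapToInt(Integer::parseInt).toArray();
    }

    /** Returns true if an incompatible change may ship in a release of type r. */
    static boolean incompatibleChangeAllowed(Stability s, ReleaseType r) {
        switch (s) {
            case STABLE:   // major (X.0.0) only
                return r == ReleaseType.MAJOR;
            case EVOLVING: // minor (x.Y.0) or major
                return r == ReleaseType.MAJOR || r == ReleaseType.MINOR;
            case UNSTABLE: // any release, including maintenance (x.y.Z)
                return r != ReleaseType.NONE;
            default:
                throw new AssertionError("unknown stability " + s);
        }
    }

    public static void main(String[] args) {
        ReleaseType step = releaseType("3.0.1", "3.1.0");
        System.out.println(step);                                                 // MINOR
        System.out.println(incompatibleChangeAllowed(Stability.STABLE, step));   // false
        System.out.println(incompatibleChangeAllowed(Stability.EVOLVING, step)); // true
    }
}
```

Deprecated is left out of the sketch on purpose: when a Deprecated interface can be removed depends on whether it is also Stable, Evolving, or Unstable, so it is not a separate row in the table.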
+How will the classification be recorded for other interfaces, such as CLIs? + +* See the [Hadoop Compatibility](Compatibility.html) page for details. + FAQ --- * Why aren’t the java scopes (private, package private and public) good enough? * Java’s scoping is not very complete. One is often forced to make a class - public in order for other internal components to use it. It does not have - friends or sub-package-private like C++. + public in order for other internal components to use it. It also does not + have friends or sub-package-private like C++. -* But I can easily access a private implementation interface if it is Java public. - Where is the protection and control? - * The purpose of this is not providing absolute access control. Its purpose - is to communicate to users and developers. One can access private - implementation functions in libc; however if they change the internal - implementation details, your application will break and you will have - little sympathy from the folks who are supplying libc. If you use a - non-public interface you understand the risks. +* But I can easily access a Private interface if it is Java public. Where is the + protection and control? + * The purpose of this classification scheme is not providing absolute + access control. Its purpose is to communicate to users and developers. + One can access private implementation functions in libc; however if + they change the internal implementation details, the application will + break and one will receive little sympathy from the folks who are + supplying libc. When using a non-public interface, the risks are + understood. -* Why bother declaring the stability of a private interface? - Aren’t private interfaces always unstable? - * Private interfaces are not always unstable. In the cases where they are - stable they capture internal properties of the system and can communicate +* Why bother declaring the stability of a Private interface? 
Aren’t Private + interfaces always Unstable? + * Private interfaces are not always Unstable. In the cases where they are + Stable, they capture internal properties of the system and can communicate these properties to its internal users and to developers of the interface. - * e.g. In HDFS, NN-DN protocol is private but stable and can help - implement rolling upgrades. It communicates that this interface should - not be changed in incompatible ways even though it is private. - * e.g. In HDFS, FSImage stability provides more flexible rollback. + * e.g. In HDFS, the NN-DN protocol is Private but Stable and can help + implement rolling upgrades. The stability annotation communicates that + this interface should not be changed in incompatible ways even though + it is Private. + * e.g. In HDFS, the Stable designation of the FSImage provides more flexible + rollback. -* What is the harm in applications using a private interface that is stable? How - is it different than a public stable interface? - * While a private interface marked as stable is targeted to change only at +* What is the harm in applications using a Private interface that is Stable? + How is it different from a Public Stable interface? + * While a Private interface marked as Stable is targeted to change only at major releases, it may break at other times if the providers of that - interface are willing to change the internal users of that - interface. Further, a public stable interface is less likely to break even + interface are also willing to change the internal consumers of that + interface. Further, a Public Stable interface is less likely to break even at major releases (even though it is allowed to break compatibility) - because the impact of the change is larger. If you use a private interface + because the impact of the change is larger. If you use a Private interface (regardless of its stability) you run the risk of incompatibility. -* Why bother with Limited-private?
Isn’t it giving special treatment to some projects? - That is not fair. - * First, most interfaces should be public or private; actually let us state - it even stronger: make it private unless you really want to expose it to - public for general use. - * Limited-private is for interfaces that are not intended for general +* Why bother with Limited-Private? Isn’t it giving special treatment to some + projects? That is not fair. + * Most interfaces should be Public or Private. An interface should be + Private unless it is explicitly intended for general use. + * Limited-Private is for interfaces that are not intended for general use. They are exposed to related projects that need special hooks. Such a - classification has a cost to both the supplier and consumer of the limited + classification has a cost to both the supplier and consumer of the interface. Both will have to work together if ever there is a need to break the interface in the future; for example the supplier and the consumers will have to work together to get coordinated releases of their - respective projects. This should not be taken lightly – if you can get - away with private then do so; if the interface is really for general use - for all applications then do so. But remember that making an interface - public has huge responsibility. Sometimes Limited-private is just right. - * A good example of a limited-private interface is BlockLocations, This is a - fairly low-level interface that we are willing to expose to MR and perhaps - HBase. We are likely to change it down the road and at that time we will - coordinate release effort with the MR team. - While MR and HDFS are always released in sync today, they may - change down the road. - * If you have a limited-private interface with many projects listed then you - are fooling yourself. It is practically public. - * It might be worth declaring a special audience classification called - Hadoop-Private for the Hadoop family. + respective projects. 
This contract should not be taken lightly: use + Private if possible; if the interface is really for general use + for all applications then use Public. Always remember that making an + interface Public comes with a large burden of responsibility. Sometimes + Limited-Private is just right. + * A good example of a Limited-Private interface is BlockLocations. This + is a fairly low-level interface that is exposed to MapReduce + and HBase. The interface is likely to change down the road, and at that + time the release effort will have to be coordinated with the + MapReduce development team. While MapReduce and HDFS are always released + in sync today, that policy may change down the road. + * If you have a Limited-Private interface with many projects listed, then + the interface is probably a good candidate to be made Public. -* Lets treat all private interfaces as Hadoop-private. What is the harm in - projects in the Hadoop family have access to private classes? - * Do we want MR accessing class files that are implementation details inside - HDFS. There used to be many such layer violations in the code that we have - been cleaning up over the last few years. We don’t want such layer - violations to creep back in by no separating between the major components - like HDFS and MR. +* Let's treat all Private interfaces as Limited-Private for all of Hadoop. What + is the harm if projects in the Hadoop family have access to Private classes? + * There used to be many cases in the code where one project depended on the + internal implementation details of another. A significant effort went + into cleaning up those issues. Treating all interfaces as + Limited-Private for all of Hadoop would open the door to reintroducing + such coupling issues. -* Aren't all public interfaces stable? - * One may mark a public interface as evolving in its early days. Here one is +* Aren't all Public interfaces Stable? + * One may mark a Public interface as Evolving in its early days.
Here one is promising to make an effort to make compatible changes but may need to break it at minor releases. - * One example of a public interface that is unstable is where one is + * One example of a Public interface that is Unstable is where one is providing an implementation of a standards-body based interface that is still under development. For example, many companies, in an attempt to be first to market, have provided implementations of a new NFS protocol even when the protocol was not fully completed by IETF. The implementor cannot - evolve the interface in a fashion that causes least distruption because + evolve the interface in a fashion that causes the least disruption because the stability is controlled by the standards body. Hence it is appropriate - to label the interface as unstable. + to label the interface as Unstable.