hadoop/dev-support/docker/README.md

114 lines
4.1 KiB
Markdown

<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
# Docker images for building Hadoop
This folder contains the Dockerfiles for building Hadoop on various platforms.
# Dependency management
The mode of installation of the dependencies needed for building Hadoop varies from one platform to
the other. Different platforms have different toolchains. Some packages tend to be polymorphic
across platforms and most commonly, a package that's readily available in one platform's toolchain
isn't available on another. We thus, resort to building and installing the package from source,
causing duplication of code since this needs to be done for all the Dockerfiles pertaining to all
the platforms. We need a system to track a dependency - for a package - for a platform
- (and optionally) for a release. Thus, there's a lot of diversity that needs to be handled for
managing package dependencies and
`pkg-resolver` caters to that.
## Supported platforms
`pkg-resolver/platforms.json` contains a list of the supported platforms for dependency management.
## Package dependencies
`pkg-resolver/packages.json` maps a dependency to a given platform. Here's the schema of this JSON.
```json
{
"dependency_1": {
"platform_1": "package_1",
"platform_2": [
"package_1",
"package_2"
]
},
"dependency_2": {
"platform_1": [
"package_1",
"package_2",
"package_3"
]
},
"dependency_3": {
"platform_1": {
"release_1": "package_1_1_1",
"release_2": [
"package_1_2_1",
"package_1_2_2"
]
},
"platform_2": [
"package_2_1",
{
"release_1": "package_2_1_1"
}
]
}
}
```
The root JSON element contains unique _dependency_ children. This in turn contains the name of the _
platforms_ and the list of _packages_ to be installed for that platform. Just to give an example of
how to interpret the above JSON -
1. For `dependency_1`, `package_1` needs to be installed for `platform_1`.
2. For `dependency_2`, `package_1` and `package_2` needs to be installed for `platform_2`.
3. For `dependency_2`, `package_1`, `package_3` and `package_3` needs to be installed for
`platform_1`.
4. For `dependency_3`, `package_1_1_1` gets installed only if `release_1` has been specified
for `platform_1`.
5. For `dependency_3`, the packages `package_1_2_1` and `package_1_2_2` gets installed only
if `release_2` has been specified for `platform_1`.
6. For `dependency_3`, for `platform_2`, `package_2_1` is always installed, but `package_2_1_1` gets
installed only if `release_1` has been specified.
### Tool help
```shell
$ pkg-resolver/resolve.py -h
usage: resolve.py [-h] [-r RELEASE] platform
Platform package dependency resolver for building Apache Hadoop
positional arguments:
platform The name of the platform to resolve the dependencies for
optional arguments:
-h, --help show this help message and exit
-r RELEASE, --release RELEASE
The release label to filter the packages for the given platform
```
## Standalone packages
Most commonly, some packages are not available across the toolchains in various platforms. Thus, we
would need to build and install them. Since we need to do this across all the Dockerfiles for all
the platforms, it could lead to code duplication and managing them becomes a hassle. Thus, we put
the build steps in a `pkg-resolver/install-<package>.sh` and invoke this in all the Dockerfiles.