HADOOP-18470. More in the 3.3.5 index.html about security (#5383)

Expands on the comments in cluster config to tell people
they shouldn't be running a cluster without a private VLAN
in cloud, that Knox is good here, and that unsecured clusters
without a VLAN are just computation-as-a-service to crypto miners.

Contributed by Steve Loughran
Steve Loughran 2023-02-14 17:22:59 +00:00 committed by GitHub
parent e2ab35084a
commit d56977e909
GPG Key ID: 4AEE18F83AFDEB23
2 changed files with 59 additions and 6 deletions


@@ -35,6 +35,8 @@ These instructions do not cover integration with any Kerberos services,
 -everyone bringing up a production cluster should include connecting to their
 organisation's Kerberos infrastructure as a key part of the deployment.
+See [Security](./SecureMode.html) for details on how to secure a cluster.
 Prerequisites
 -------------


@@ -24,7 +24,7 @@ Users are encouraged to read the full set of release notes.
 This page provides an overview of the major changes.
 Azure ABFS: Critical Stream Prefetch Fix
----------------------------------------------
+----------------------------------------
 The abfs has a critical bug fix
 [HADOOP-18546](https://issues.apache.org/jira/browse/HADOOP-18546).
@@ -120,17 +120,63 @@ be vulnerable, and the ugprades should also reduce the number of false
 positives security scanners report.
 We have not been able to upgrade every single dependency to the latest
-version there is. Some of those changes are just going to be incompatible.
-If you have concerns about the state of a specific library, consult the pache JIRA
-issue tracker to see whether a JIRA has been filed, discussions have taken place about
+version there is. Some of those changes are fundamentally incompatible.
+If you have concerns about the state of a specific library, consult the Apache JIRA
+issue tracker to see if an issue has been filed, discussions have taken place about
 the library in question, and whether or not there is already a fix in the pipeline.
 *Please don't file new JIRAs about dependency-X.Y.Z having a CVE without
 searching for any existing issue first*
-As an open source project, contributions in this area are always welcome,
+As an open-source project, contributions in this area are always welcome,
 especially in testing the active branches, testing applications downstream of
 those branches and of whether updated dependencies trigger regressions.
+Security Advisory
+=================
+Hadoop HDFS is a distributed filesystem allowing remote
+callers to read and write data.
+Hadoop YARN is a distributed job submission/execution
+engine allowing remote callers to submit arbitrary
+work into the cluster.
+Unless a Hadoop cluster is deployed with
+[caller authentication with Kerberos](./hadoop-project-dist/hadoop-common/SecureMode.html),
+anyone with network access to the servers has unrestricted access to the data
+and the ability to run whatever code they want in the system.
+In production, there are generally three deployment patterns which
+can, with care, keep data and computing resources private.
+1. Physical cluster: *configure Hadoop security*, usually bonded to the
+enterprise Kerberos/Active Directory systems.
+Good.
+1. Cloud: transient or persistent single or multiple user/tenant cluster
+with private VLAN *and security*.
+Good.
+Consider [Apache Knox](https://knox.apache.org/) for managing remote
+access to the cluster.
+1. Cloud: transient single user/tenant cluster with private VLAN
+*and no security at all*.
+Requires careful network configuration as this is the sole
+means of securing the cluster..
+Consider [Apache Knox](https://knox.apache.org/) for managing
+remote access to the cluster.
+*If you deploy a Hadoop cluster in-cloud without security, and without configuring a VLAN
+to restrict access to trusted users, you are implicitly sharing your data and
+computing resources with anyone with network access*
+If you do deploy an insecure cluster this way then port scanners will inevitably
+find it and submit crypto-mining jobs. If this happens to you, please do not report
+this as a CVE or security issue: it is _utterly predictable_. Secure *your cluster* if
+you want to remain exclusively *your cluster*.
+Finally, if you are using Hadoop as a service deployed/managed by someone else,
+do determine what security their products offer and make sure it meets your requirements.
 Getting Started
 ===============
@@ -142,3 +188,8 @@ Then move on to the
 [Cluster Setup](./hadoop-project-dist/hadoop-common/ClusterSetup.html)
 to learn how to set up a multi-node Hadoop installation.
+Before deploying Hadoop in production, read
+[Hadoop in Secure Mode](./hadoop-project-dist/hadoop-common/SecureMode.html),
+and follow its instructions to secure your cluster.
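
The advisory added above says that a cluster without Kerberos caller authentication is open to anyone with network access. As a rough illustration only, and not part of this commit, the sketch below scans the text of a `core-site.xml` for the two standard properties `hadoop.security.authentication` and `hadoop.security.authorization`. The helper names (`read_properties`, `security_findings`) and the sample configuration are hypothetical, and the defaults are assumed to be `simple` and `false` as in Hadoop's `core-default.xml`. Passing this check does not mean a cluster is secure; the Secure Mode guide lists the full set of required settings.

```python
import xml.etree.ElementTree as ET

# Assumed defaults, matching core-default.xml: no caller authentication
# and no service-level authorization unless the site config overrides them.
DEFAULTS = {
    "hadoop.security.authentication": "simple",
    "hadoop.security.authorization": "false",
}


def read_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style XML config."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


def security_findings(xml_text):
    """List the basic security settings that are missing or disabled."""
    props = {**DEFAULTS, **read_properties(xml_text)}
    findings = []
    if props["hadoop.security.authentication"].lower() != "kerberos":
        findings.append("authentication is not kerberos: callers are not authenticated")
    if props["hadoop.security.authorization"].lower() != "true":
        findings.append("service-level authorization is disabled")
    return findings


# Hypothetical sample config used only to exercise the check.
SECURE_SAMPLE = """<configuration>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>
  <property>
    <name>hadoop.security.authorization</name>
    <value>true</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    for label, text in (("sample", SECURE_SAMPLE), ("empty", "<configuration/>")):
        problems = security_findings(text)
        print(label, "->", problems if problems else "basic security settings present")
```

An empty configuration falls back to the assumed defaults, so it is reported as unauthenticated. That mirrors the advisory's point that an unconfigured cluster is an unsecured one.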