HADOOP-18094. Disable S3A auditing by default.

See HADOOP-18091. S3A auditing leaks memory through ThreadLocal references

* Adds a new option fs.s3a.audit.enabled to control whether or not auditing
is enabled. This is false by default.

* When false, the S3A auditing manager is NoopAuditManagerS3A,
which was formerly only used for unit tests and
during filesystem initialization.

* When true, ActiveAuditManagerS3A is used for managing auditing,
allowing auditing events to be reported.

* Updates documentation and tests.

This patch does not fix the underlying leak. When auditing is enabled,
long-lived threads will retain references to the audit managers
of S3A filesystem instances which have already been closed.

Contributed by Steve Loughran.

Change-Id: I671e594cd59e8ca77a1f65be791ad0ae9530b8d9
Steve Loughran 2022-01-24 13:37:33 +00:00
parent 55192570a1
commit 4fd0389153
9 changed files with 213 additions and 31 deletions


@ -33,6 +33,8 @@
import org.apache.hadoop.fs.statistics.impl.IOStatisticsStore;
import static java.util.Objects.requireNonNull;
import static org.apache.hadoop.fs.s3a.audit.S3AAuditConstants.AUDIT_ENABLED;
import static org.apache.hadoop.fs.s3a.audit.S3AAuditConstants.AUDIT_ENABLED_DEFAULT;
import static org.apache.hadoop.fs.s3a.audit.impl.S3AInternalAuditConstants.AUDIT_SPAN_HANDLER_CONTEXT;
/**
@ -58,8 +60,14 @@ private AuditIntegration() {
public static AuditManagerS3A createAndStartAuditManager(
Configuration conf,
IOStatisticsStore iostatistics) {
ActiveAuditManagerS3A auditManager = new ActiveAuditManagerS3A(
requireNonNull(iostatistics));
AuditManagerS3A auditManager;
if (conf.getBoolean(AUDIT_ENABLED, AUDIT_ENABLED_DEFAULT)) {
auditManager = new ActiveAuditManagerS3A(
requireNonNull(iostatistics));
} else {
LOG.debug("auditing is disabled");
auditManager = stubAuditManager();
}
auditManager.init(conf);
auditManager.start();
LOG.debug("Started Audit Manager {}", auditManager);


@ -34,6 +34,19 @@ private S3AAuditConstants() {
*/
public static final String UNAUDITED_OPERATION = "unaudited operation";
/**
* Is auditing enabled?
* Value: {@value}.
*/
public static final String AUDIT_ENABLED = "fs.s3a.audit.enabled";
/**
* Default auditing flag.
* Value: {@value}.
*/
public static final boolean AUDIT_ENABLED_DEFAULT = false;
/**
* Name of class used for audit logs: {@value}.
*/


@ -44,8 +44,6 @@
/**
* Simple No-op audit manager for use before a real
* audit chain is set up, and for testing.
* Audit spans always have a unique ID and the activation/deactivation
* operations on them will update this audit manager's active span.
* It does have the service lifecycle, so do
* create a unique instance whenever used.
*/
@ -59,14 +57,7 @@ public class NoopAuditManagerS3A extends CompositeService
/**
* The inner auditor.
*/
private NoopAuditor auditor = NOOP_AUDITOR;
/**
* Thread local span. This defaults to being
* the unbonded span.
*/
private final ThreadLocal<AuditSpanS3A> activeSpan =
ThreadLocal.withInitial(this::getUnbondedSpan);
private final NoopAuditor auditor = NOOP_AUDITOR;
/**
* ID which is returned as a span ID in the audit event
@ -160,7 +151,7 @@ public boolean checkAccess(final Path path,
@Override
public void activate(final AuditSpanS3A span) {
activeSpan.set(span);
/* no-op */
}
@Override
@ -180,6 +171,6 @@ public static AuditSpanS3A createNewSpan(
final String name,
final String path1,
final String path2) {
return NOOP_AUDITOR.createSpan(name, path1, path2);
return NoopSpan.INSTANCE;
}
}


@ -22,6 +22,18 @@ and inside the AWS S3 SDK, immediately before the request is executed.
The full architecture is covered in [Auditing Architecture](auditing_architecture.html);
this document covers its use.
## Important: Auditing is disabled by default
Because of its use of `ThreadLocal` fields, the auditing feature leaks memory as S3A filesystem
instances are created and deleted.
This causes problems in long-lived processes which either do not re-use filesystem
instances, or attempt to delete all instances belonging to specific users.
See [HADOOP-18091](https://issues.apache.org/jira/browse/HADOOP-18091) _S3A auditing leaks memory through ThreadLocal references_.
To avoid these memory leaks, auditing is disabled by default.
To turn auditing on, set `fs.s3a.audit.enabled` to `true`.
## Auditing workflow
1. An _Auditor Service_ can be instantiated for each S3A FileSystem instance,
@ -63,12 +75,16 @@ ideally even identifying the process/job generating load.
## Using Auditing
The Logging Auditor is enabled by default; it annotates the S3 logs.
Auditing is disabled by default.
When auditing is enabled, a Logging Auditor will annotate the S3 logs through a custom
HTTP Referrer header in requests made to S3.
Other auditor classes may be used instead.
### Auditor Options
| Option | Meaning | Default Value |
|--------|---------|---------------|
| `fs.s3a.audit.enabled` | Is auditing enabled | `false` |
| `fs.s3a.audit.service.classname` | Auditor classname | `org.apache.hadoop.fs.s3a.audit.impl.LoggingAuditor` |
| `fs.s3a.audit.request.handlers` | List of extra subclasses of AWS SDK RequestHandler2 to include in handler chain | `""` |
| `fs.s3a.audit.referrer.enabled` | Logging auditor to publish the audit information in the HTTP Referrer header | `true` |
@ -76,14 +92,26 @@ The Logging Auditor is enabled by default; it annotates the S3 logs.
| `fs.s3a.audit.reject.out.of.span.operations` | Auditor to reject operations "outside of a span" | `false` |
### Disabling Auditing with the No-op Auditor
### Disabling Auditing
The No-op auditor does not perform any logging of audit events.
In this release of Hadoop, auditing is disabled by default.
This can be explicitly set globally or for specific buckets.
```xml
<property>
<name>fs.s3a.audit.service.classname</name>
<value>org.apache.hadoop.fs.s3a.audit.impl.NoopAuditor</value>
<name>fs.s3a.audit.enabled</name>
<value>false</value>
</property>
```
Specific buckets can have auditing disabled, even when it is enabled globally.
```xml
<property>
<name>fs.s3a.bucket.landsat-pds.audit.enabled</name>
<value>false</value>
<description>Do not audit landsat bucket operations</description>
</property>
```
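The precedence implied by the two examples above can be sketched in Java.
This is a simplified, self-contained model which uses a plain `Map` rather than Hadoop's `Configuration`;
the class `AuditFlagLookup`, its method and the bucket name `example-bucket` are hypothetical names for illustration, not part of the S3A code.
It assumes that a per-bucket key of the form `fs.s3a.bucket.BUCKET.audit.enabled` takes priority over `fs.s3a.audit.enabled`, and that the default is `false`.

```java
import java.util.HashMap;
import java.util.Map;

/** Illustrative model only: this is not the S3A implementation. */
final class AuditFlagLookup {

  /** Global option name, as documented above. */
  static final String AUDIT_ENABLED = "fs.s3a.audit.enabled";

  /** Default value of the option in this release. */
  static final boolean AUDIT_ENABLED_DEFAULT = false;

  private AuditFlagLookup() {
  }

  /**
   * Resolve the auditing flag for a bucket.
   * A per-bucket value, if present, overrides the global one;
   * if neither is set, the default is returned.
   */
  static boolean isAuditEnabled(Map<String, String> conf, String bucket) {
    String perBucketKey = "fs.s3a.bucket." + bucket + ".audit.enabled";
    String value = conf.get(perBucketKey);
    if (value == null) {
      value = conf.get(AUDIT_ENABLED);
    }
    return value == null
        ? AUDIT_ENABLED_DEFAULT
        : Boolean.parseBoolean(value.trim());
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    // auditing on globally, off for one bucket
    conf.put(AUDIT_ENABLED, "true");
    conf.put("fs.s3a.bucket.landsat-pds.audit.enabled", "false");
    System.out.println(isAuditEnabled(conf, "landsat-pds"));
    System.out.println(isAuditEnabled(conf, "example-bucket"));
  }
}
```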
@ -92,13 +120,18 @@ The No-op auditor does not perform any logging of audit events.
The "Logging Auditor" is the default auditor.
It provides two forms of logging
1. Logging of operations in the client via Log4J.
1. Logging of operations in the client via the active SLF4J implementation.
1. Dynamic generation of the HTTP Referrer header for S3 requests.
The Logging Auditor is enabled by providing its classname in the option
`fs.s3a.audit.service.classname`.
```xml
<property>
<name>fs.s3a.audit.enabled</name>
<value>true</value>
</property>
<property>
<name>fs.s3a.audit.service.classname</name>
<value>org.apache.hadoop.fs.s3a.audit.impl.LoggingAuditor</value>


@ -117,7 +117,18 @@ the auditor is bound to.
The auditor then creates and returns a span for the specific operation.
The AuditManagerS3A will automatically activate the span returned by the auditor
(i.e. assign it the thread local variable tracking the active span in each thread)
(i.e. assign it the thread local variable tracking the active span in each thread).
### Memory Leakage through `ThreadLocal` use
This architecture contains a critical defect,
[HADOOP-18091](https://issues.apache.org/jira/browse/HADOOP-18091) _S3A auditing leaks memory through ThreadLocal references_.
The code was written assuming that when the `ActiveAuditManagerS3A` service is
stopped, its `ThreadLocal` fields would be freed.
In fact, they are retained until the threads with references are terminated.
This is why auditing is now disabled by default until a fix is implemented.
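
The retention can be reproduced outside Hadoop.
The following is a minimal, self-contained sketch; `LeakyManager` and `Span` are hypothetical stand-ins for an audit manager and its spans, not the S3A classes.
A pooled worker thread sets a per-instance `ThreadLocal` whose value refers back to the manager.
Once every other reference has been dropped, the manager should still be reachable through the worker thread, so a `WeakReference` to it is not cleared.

```java
import java.lang.ref.WeakReference;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Illustrative only: hypothetical stand-ins, not the S3A classes. */
final class ThreadLocalRetentionDemo {

  /** A span which refers back to the manager that created it. */
  static final class Span {
    final LeakyManager owner;

    Span(LeakyManager owner) {
      this.owner = owner;
    }
  }

  /** A manager with a per-instance thread-local "active span". */
  static final class LeakyManager {
    private final ThreadLocal<Span> activeSpan = new ThreadLocal<>();

    void activate() {
      activeSpan.set(new Span(this));
    }

    void stop() {
      // Nothing here can clear the entries held by other threads.
    }
  }

  /** Create a manager, activate a span on the pool thread, then stop it. */
  private static WeakReference<LeakyManager> createActivateAndStop(
      ExecutorService pool) throws InterruptedException, ExecutionException {
    LeakyManager manager = new LeakyManager();
    pool.submit(manager::activate).get();
    manager.stop();
    // Only a weak reference leaves this method.
    return new WeakReference<>(manager);
  }

  /** @return true if the stopped manager is still reachable. */
  static boolean managerRetainedAfterStop() {
    ExecutorService pool = Executors.newSingleThreadExecutor();
    try {
      WeakReference<LeakyManager> ref = createActivateAndStop(pool);
      System.gc();
      // The idle worker thread still holds the span, and so the manager.
      return ref.get() != null;
    } catch (InterruptedException | ExecutionException e) {
      throw new IllegalStateException(e);
    } finally {
      pool.shutdownNow();
    }
  }

  public static void main(String[] args) {
    System.out.println("manager retained after stop: "
        + managerRetainedAfterStop());
  }
}
```

`ThreadLocal.remove()` only clears the entry of the calling thread, which is why a `stop()` invoked from a different thread cannot release references held by worker threads.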
### Class `org.apache.hadoop.fs.audit.CommonAuditContext`
@ -141,8 +152,19 @@ thread.
### class `NoopAuditor`
This auditor creates spans which perform no auditing.
It is very efficient and reliable.
This auditor creates spans which do nothing with the events.
```xml
<property>
<name>fs.s3a.audit.service.classname</name>
<value>org.apache.hadoop.fs.s3a.audit.impl.NoopAuditor</value>
</property>
```
This is *not* the same as disabling auditing, as it still uses the `ActiveAuditManagerS3A` class
which is the source of memory leaks.
Avoid using it except in tests: there is no benefit, only significant cost.
### class `LoggingAuditor`


@ -29,6 +29,7 @@
import static org.apache.hadoop.fs.s3a.Statistic.AUDIT_FAILURE;
import static org.apache.hadoop.fs.s3a.Statistic.AUDIT_REQUEST_EXECUTION;
import static org.apache.hadoop.fs.s3a.Statistic.AUDIT_SPAN_CREATION;
import static org.apache.hadoop.fs.s3a.audit.S3AAuditConstants.AUDIT_ENABLED;
import static org.apache.hadoop.fs.s3a.audit.S3AAuditConstants.AUDIT_REQUEST_HANDLERS;
import static org.apache.hadoop.fs.s3a.audit.S3AAuditConstants.AUDIT_SERVICE_CLASSNAME;
import static org.apache.hadoop.fs.s3a.audit.S3AAuditConstants.LOGGING_AUDIT_SERVICE;
@ -68,6 +69,7 @@ public static Configuration noopAuditConfig() {
final Configuration conf = new Configuration(false);
conf.set(
AUDIT_SERVICE_CLASSNAME, NOOP_AUDIT_SERVICE);
conf.setBoolean(AUDIT_ENABLED, true);
return conf;
}
@ -88,6 +90,7 @@ public static Configuration loggingAuditConfig() {
*/
public static Configuration enableLoggingAuditor(final Configuration conf) {
conf.set(AUDIT_SERVICE_CLASSNAME, LOGGING_AUDIT_SERVICE);
conf.setBoolean(AUDIT_ENABLED, true);
conf.setBoolean(REJECT_OUT_OF_SPAN_OPERATIONS, true);
return conf;
}
@ -117,7 +120,8 @@ public static Configuration resetAuditOptions(Configuration conf) {
REFERRER_HEADER_ENABLED,
REJECT_OUT_OF_SPAN_OPERATIONS,
AUDIT_REQUEST_HANDLERS,
AUDIT_SERVICE_CLASSNAME);
AUDIT_SERVICE_CLASSNAME,
AUDIT_ENABLED);
return conf;
}
}


@ -37,8 +37,9 @@
import static org.apache.hadoop.fs.s3a.Statistic.AUDIT_REQUEST_EXECUTION;
import static org.apache.hadoop.fs.s3a.Statistic.INVOCATION_ACCESS;
import static org.apache.hadoop.fs.s3a.Statistic.STORE_IO_REQUEST;
import static org.apache.hadoop.fs.s3a.audit.S3AAuditConstants.AUDIT_SERVICE_CLASSNAME;
import static org.apache.hadoop.fs.s3a.audit.AuditTestSupport.resetAuditOptions;
import static org.apache.hadoop.fs.s3a.audit.S3AAuditConstants.AUDIT_ENABLED;
import static org.apache.hadoop.fs.s3a.audit.S3AAuditConstants.AUDIT_SERVICE_CLASSNAME;
import static org.apache.hadoop.fs.s3a.performance.OperationCost.FILE_STATUS_ALL_PROBES;
import static org.apache.hadoop.fs.s3a.performance.OperationCost.FILE_STATUS_FILE_PROBE;
import static org.apache.hadoop.fs.s3a.performance.OperationCost.ROOT_FILE_STATUS_PROBE;
@ -67,6 +68,7 @@ public Configuration createConfiguration() {
Configuration conf = super.createConfiguration();
resetAuditOptions(conf);
conf.set(AUDIT_SERVICE_CLASSNAME, AccessCheckingAuditor.CLASS);
conf.setBoolean(AUDIT_ENABLED, true);
return conf;
}


@ -0,0 +1,77 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.hadoop.fs.s3a.audit;
import org.assertj.core.api.Assertions;
import org.junit.Test;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.s3a.S3AFileSystem;
import org.apache.hadoop.fs.s3a.audit.impl.NoopAuditManagerS3A;
import org.apache.hadoop.fs.s3a.performance.AbstractS3ACostTest;
import static org.apache.hadoop.fs.s3a.audit.AuditTestSupport.NOOP_SPAN;
import static org.apache.hadoop.fs.s3a.audit.AuditTestSupport.resetAuditOptions;
/**
* Verify that by default audit managers are disabled.
*/
public class ITestAuditManagerDisabled extends AbstractS3ACostTest {
public ITestAuditManagerDisabled() {
super(true);
}
@Override
public Configuration createConfiguration() {
Configuration conf = super.createConfiguration();
resetAuditOptions(conf);
return conf;
}
/**
* The default audit manager is the no-op audit manager.
*/
@Test
public void testAuditorDisabled() {
final S3AFileSystem fs = getFileSystem();
final AuditManagerS3A auditManager = fs.getAuditManager();
Assertions.assertThat(auditManager)
.isInstanceOf(NoopAuditManagerS3A.class);
}
/**
* All the audit spans are the no-op span.
*/
@Test
public void testAuditSpansAreAllTheSame() throws Throwable {
final S3AFileSystem fs = getFileSystem();
final AuditSpanS3A span1 = fs.createSpan("span1", null, null);
final AuditSpanS3A span2 = fs.createSpan("span2", null, null);
Assertions.assertThat(span1)
.describedAs("audit span 1")
.isSameAs(NOOP_SPAN);
Assertions.assertThat(span2)
.describedAs("audit span 2")
.isSameAs(span1);
}
}


@ -30,18 +30,22 @@
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.S3AFileSystem;
import static org.apache.hadoop.fs.contract.ContractTestUtils.touch;
import static org.apache.hadoop.fs.s3a.S3ATestUtils.removeBaseAndBucketOverrides;
import static org.apache.hadoop.fs.s3a.Statistic.AUDIT_SPAN_CREATION;
import static org.apache.hadoop.fs.s3a.Statistic.INVOCATION_GET_CONTENT_SUMMARY;
import static org.apache.hadoop.fs.s3a.Statistic.OBJECT_LIST_REQUEST;
import static org.apache.hadoop.fs.s3a.Statistic.OBJECT_METADATA_REQUESTS;
import static org.apache.hadoop.fs.s3a.audit.S3AAuditConstants.AUDIT_ENABLED;
import static org.apache.hadoop.fs.s3a.performance.OperationCost.FILESTATUS_DIR_PROBE_L;
import static org.apache.hadoop.fs.s3a.performance.OperationCost.FILE_STATUS_FILE_PROBE;
import static org.apache.hadoop.fs.s3a.performance.OperationCost.LIST_OPERATION;
import static org.apache.hadoop.fs.s3a.performance.OperationCostValidator.probe;
/**
* Use metrics to assert about the cost of misc operations.
@ -53,20 +57,48 @@ public class ITestS3AMiscOperationCost extends AbstractS3ACostTest {
private static final Logger LOG =
LoggerFactory.getLogger(ITestS3AMiscOperationCost.class);
/**
* Parameter: should auditing be enabled?
*/
private final boolean auditing;
/**
* Parameterization.
*/
@Parameterized.Parameters(name = "{0}")
public static Collection<Object[]> params() {
return Arrays.asList(new Object[][]{
{"keep-markers", true},
{"delete-markers", false}
{"keep-markers-auditing", true, true},
{"delete-markers-unaudited", false, false}
});
}
public ITestS3AMiscOperationCost(final String name,
final boolean keepMarkers) {
final boolean keepMarkers,
final boolean auditing) {
super(keepMarkers);
this.auditing = auditing;
}
@Override
public Configuration createConfiguration() {
final Configuration conf = super.createConfiguration();
removeBaseAndBucketOverrides(conf, AUDIT_ENABLED);
conf.setBoolean(AUDIT_ENABLED, auditing);
return conf;
}
/**
* Expected audit count when auditing is enabled; expect 0
* when disabled.
* @param expected expected value.
* @return the probe.
*/
protected OperationCostValidator.ExpectedProbe withAuditCount(
final int expected) {
return probe(AUDIT_SPAN_CREATION,
auditing ? expected : 0);
}
/**
@ -81,7 +113,7 @@ public void testMkdirOverDir() throws Throwable {
// create the child; only assert on HEAD/GET IO
verifyMetrics(() -> fs.mkdirs(baseDir),
with(AUDIT_SPAN_CREATION, 1),
withAuditCount(1),
// full probe on dest plus list only on parent.
with(OBJECT_METADATA_REQUESTS, 0),
with(OBJECT_LIST_REQUEST, FILESTATUS_DIR_PROBE_L));
@ -110,7 +142,7 @@ public void testGetContentSummaryDir() throws Throwable {
final ContentSummary summary = verifyMetrics(
() -> getContentSummary(baseDir),
with(INVOCATION_GET_CONTENT_SUMMARY, 1),
with(AUDIT_SPAN_CREATION, 1),
withAuditCount(1),
always(FILE_STATUS_FILE_PROBE // look at path to see if it is a file
.plus(LIST_OPERATION) // it is not: so LIST
.plus(LIST_OPERATION))); // and a LIST on the child dir
@ -129,7 +161,7 @@ public void testGetContentMissingPath() throws Throwable {
verifyMetricsIntercepting(FileNotFoundException.class,
"", () -> getContentSummary(baseDir),
with(INVOCATION_GET_CONTENT_SUMMARY, 1),
with(AUDIT_SPAN_CREATION, 1),
withAuditCount(1),
always(FILE_STATUS_FILE_PROBE
.plus(FILE_STATUS_FILE_PROBE)
.plus(LIST_OPERATION)