HADOOP-6350. Document Hadoop Metrics. (Contributed by Akira Ajisaka)
git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1602324 13f79535-47bb-0310-9956-ffa450edef68
commit ab54276440
parent dc7dd1fa19
@@ -420,6 +420,8 @@ Release 2.5.0 - UNRELEASED
HADOOP-10376. Refactor refresh*Protocols into a single generic
refreshConfigProtocol. (Chris Li via Arpit Agarwal)

HADOOP-6350. Documenting Hadoop metrics. (Akira Ajisaka via Arpit Agarwal)

OPTIMIZATIONS

BUG FIXES
hadoop-common-project/hadoop-common/src/site/apt/Metrics.apt.vm (new file, 732 lines)
@@ -0,0 +1,732 @@
~~ Licensed to the Apache Software Foundation (ASF) under one or more
~~ contributor license agreements. See the NOTICE file distributed with
~~ this work for additional information regarding copyright ownership.
~~ The ASF licenses this file to You under the Apache License, Version 2.0
~~ (the "License"); you may not use this file except in compliance with
~~ the License. You may obtain a copy of the License at
~~
~~ http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License.

  ---
  Metrics Guide
  ---
  ---
  ${maven.build.timestamp}

%{toc}

Overview

  Metrics are statistical information exposed by Hadoop daemons,
  used for monitoring, performance tuning and debugging.
  Many metrics are available by default,
  and they are very useful for troubleshooting.
  This page shows the details of the available metrics.

  Each section describes one of the contexts into which metrics are grouped.

  The documentation of the Metrics 2.0 framework is
  {{{../../api/org/apache/hadoop/metrics2/package-summary.html}here}}.

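  Because each metrics source is also registered as a JMX MBean, the values
  described on this page can be read with any JMX client, or as JSON from a
  daemon's <<</jmx>>> HTTP endpoint. The Java sketch below assumes the MBean
  naming convention <<<Hadoop:service=>>><service><<<,name=>>><record>
  (for example <<<Hadoop:service=NameNode,name=JvmMetrics>>>); the class
  <<<MetricsMBeanLookup>>> and its methods are hypothetical and use only the
  standard <<<javax.management>>> API. Outside a Hadoop daemon the query
  simply finds no MBeans.

```java
import java.lang.management.ManagementFactory;
import java.util.Map;
import java.util.TreeMap;
import javax.management.MBeanAttributeInfo;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

/**
 * Hypothetical sketch: reads metrics MBeans registered under the assumed
 * "Hadoop:service=<service>,name=<record>" naming convention.
 */
final class MetricsMBeanLookup {

    /** Builds the ObjectName (or pattern, if record is "*") for a daemon's metrics record. */
    static ObjectName nameFor(String service, String record) {
        try {
            return new ObjectName("Hadoop:service=" + service + ",name=" + record);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("bad MBean name", e);
        }
    }

    /** Returns "<record>.<attribute>" -> value for every readable attribute of matching MBeans. */
    static Map<String, Object> read(MBeanServerConnection server, ObjectName pattern) {
        Map<String, Object> values = new TreeMap<>();
        try {
            for (ObjectName name : server.queryNames(pattern, null)) {
                for (MBeanAttributeInfo attr : server.getMBeanInfo(name).getAttributes()) {
                    if (!attr.isReadable()) {
                        continue;
                    }
                    try {
                        values.put(name.getKeyProperty("name") + "." + attr.getName(),
                                   server.getAttribute(name, attr.getName()));
                    } catch (Exception e) {
                        // Some attributes may fail to read; this sketch skips them.
                    }
                }
            }
        } catch (Exception e) {
            throw new IllegalStateException("JMX query failed", e);
        }
        return values;
    }

    public static void main(String[] args) {
        ObjectName pattern = nameFor("NameNode", "*"); // every NameNode metrics record
        read(ManagementFactory.getPlatformMBeanServer(), pattern)
            .forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

  Run inside a NameNode JVM, the same query would be expected to list records
  such as <<<JvmMetrics>>> and <<<FSNamesystem>>>. Because <<<read>>> accepts
  an <<<MBeanServerConnection>>>, a remote client could pass the connection
  obtained from a <<<JMXConnector>>> instead of the platform MBean server.
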
jvm context

* JvmMetrics

  Each metrics record contains tags such as ProcessName, SessionId,
  and Hostname as additional information along with metrics.

*-------------------------------------+--------------------------------------+
|| Name || Description
*-------------------------------------+--------------------------------------+
|<<<MemNonHeapUsedM>>> | Current non-heap memory used in MB
*-------------------------------------+--------------------------------------+
|<<<MemNonHeapCommittedM>>> | Current non-heap memory committed in MB
*-------------------------------------+--------------------------------------+
|<<<MemNonHeapMaxM>>> | Max non-heap memory size in MB
*-------------------------------------+--------------------------------------+
|<<<MemHeapUsedM>>> | Current heap memory used in MB
*-------------------------------------+--------------------------------------+
|<<<MemHeapCommittedM>>> | Current heap memory committed in MB
*-------------------------------------+--------------------------------------+
|<<<MemHeapMaxM>>> | Max heap memory size in MB
*-------------------------------------+--------------------------------------+
|<<<MemMaxM>>> | Max memory size in MB
*-------------------------------------+--------------------------------------+
|<<<ThreadsNew>>> | Current number of NEW threads
*-------------------------------------+--------------------------------------+
|<<<ThreadsRunnable>>> | Current number of RUNNABLE threads
*-------------------------------------+--------------------------------------+
|<<<ThreadsBlocked>>> | Current number of BLOCKED threads
*-------------------------------------+--------------------------------------+
|<<<ThreadsWaiting>>> | Current number of WAITING threads
*-------------------------------------+--------------------------------------+
|<<<ThreadsTimedWaiting>>> | Current number of TIMED_WAITING threads
*-------------------------------------+--------------------------------------+
|<<<ThreadsTerminated>>> | Current number of TERMINATED threads
*-------------------------------------+--------------------------------------+
|<<<GcInfo>>> | Total GC count and GC time in msec, grouped by the kind of GC. \
| | e.g. GcCountPS Scavenge=6, GcTimeMillisPS Scavenge=40,
| | GcCountPS MarkSweep=0, GcTimeMillisPS MarkSweep=0
*-------------------------------------+--------------------------------------+
|<<<GcCount>>> | Total GC count
*-------------------------------------+--------------------------------------+
|<<<GcTimeMillis>>> | Total GC time in msec
*-------------------------------------+--------------------------------------+
|<<<LogFatal>>> | Total number of FATAL logs
*-------------------------------------+--------------------------------------+
|<<<LogError>>> | Total number of ERROR logs
*-------------------------------------+--------------------------------------+
|<<<LogWarn>>> | Total number of WARN logs
*-------------------------------------+--------------------------------------+
|<<<LogInfo>>> | Total number of INFO logs
*-------------------------------------+--------------------------------------+

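  The JVM metrics above map closely onto the standard
  <<<java.lang.management>>> MXBeans. The Java sketch below approximates how
  such values can be derived; it is not the Hadoop <<<JvmMetrics>>> source, the
  class name <<<JvmMetricsSketch>>> is hypothetical, and it assumes that "MB"
  means 1024 x 1024 bytes. The per-collector names it builds follow the
  GcCount/GcTimeMillis naming illustrated in the <<<GcInfo>>> row.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch: derives JvmMetrics-style values from the standard MXBeans. */
final class JvmMetricsSketch {
    /** Assumed size of one MB. */
    static final float M = 1024 * 1024;

    /** Memory gauges in MB; an undefined max (-1) shows up as a negative value. */
    static Map<String, Float> memoryMetrics() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        MemoryUsage nonHeap = memory.getNonHeapMemoryUsage();
        Map<String, Float> m = new LinkedHashMap<>();
        m.put("MemNonHeapUsedM", nonHeap.getUsed() / M);
        m.put("MemNonHeapCommittedM", nonHeap.getCommitted() / M);
        m.put("MemNonHeapMaxM", nonHeap.getMax() / M);
        m.put("MemHeapUsedM", heap.getUsed() / M);
        m.put("MemHeapCommittedM", heap.getCommitted() / M);
        m.put("MemHeapMaxM", heap.getMax() / M);
        m.put("MemMaxM", Runtime.getRuntime().maxMemory() / M);
        return m;
    }

    /** Maps a Thread.State such as TIMED_WAITING to a name such as ThreadsTimedWaiting. */
    static String threadMetricName(Thread.State state) {
        StringBuilder name = new StringBuilder("Threads");
        for (String part : state.name().split("_")) {
            name.append(part.charAt(0)).append(part.substring(1).toLowerCase(Locale.ROOT));
        }
        return name.toString();
    }

    /** Counts live threads per state, e.g. ThreadsRunnable, ThreadsBlocked. */
    static Map<String, Integer> threadStates() {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Thread.State state : Thread.State.values()) {
            counts.put(threadMetricName(state), 0);
        }
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        for (ThreadInfo info : threads.getThreadInfo(threads.getAllThreadIds(), 0)) {
            if (info != null) { // null if the thread exited between the two calls
                counts.merge(threadMetricName(info.getThreadState()), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** GcCount<collector> / GcTimeMillis<collector> per collector, plus the totals. */
    static Map<String, Long> gcMetrics() {
        Map<String, Long> m = new LinkedHashMap<>();
        long totalCount = 0;
        long totalMillis = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // This sketch clamps an undefined value (-1) to 0.
            long count = Math.max(0, gc.getCollectionCount());
            long millis = Math.max(0, gc.getCollectionTime());
            m.put("GcCount" + gc.getName(), count);
            m.put("GcTimeMillis" + gc.getName(), millis);
            totalCount += count;
            totalMillis += millis;
        }
        m.put("GcCount", totalCount);
        m.put("GcTimeMillis", totalMillis);
        return m;
    }

    public static void main(String[] args) {
        memoryMetrics().forEach((k, v) -> System.out.printf("%s = %.2f%n", k, v));
        threadStates().forEach((k, v) -> System.out.println(k + " = " + v));
        gcMetrics().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

  Clamping an undefined (-1) GC count or time to 0 is a choice made by this
  sketch, not documented Hadoop behavior.
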
rpc context

* rpc

  Each metrics record contains tags such as Hostname
  and port (the port number to which the server is bound)
  as additional information along with metrics.

*-------------------------------------+--------------------------------------+
|| Name || Description
*-------------------------------------+--------------------------------------+
|<<<ReceivedBytes>>> | Total number of received bytes
*-------------------------------------+--------------------------------------+
|<<<SentBytes>>> | Total number of sent bytes
*-------------------------------------+--------------------------------------+
|<<<RpcQueueTimeNumOps>>> | Total number of RPC calls
*-------------------------------------+--------------------------------------+
|<<<RpcQueueTimeAvgTime>>> | Average queue time in milliseconds
*-------------------------------------+--------------------------------------+
|<<<RpcProcessingTimeNumOps>>> | Total number of RPC calls (same as
| | <<<RpcQueueTimeNumOps>>>)
*-------------------------------------+--------------------------------------+
|<<<RpcProcessingTimeAvgTime>>> | Average processing time in milliseconds
*-------------------------------------+--------------------------------------+
|<<<RpcAuthenticationFailures>>> | Total number of authentication failures
*-------------------------------------+--------------------------------------+
|<<<RpcAuthenticationSuccesses>>> | Total number of authentication successes
*-------------------------------------+--------------------------------------+
|<<<RpcAuthorizationFailures>>> | Total number of authorization failures
*-------------------------------------+--------------------------------------+
|<<<RpcAuthorizationSuccesses>>> | Total number of authorization successes
*-------------------------------------+--------------------------------------+
|<<<NumOpenConnections>>> | Current number of open connections
*-------------------------------------+--------------------------------------+
|<<<CallQueueLength>>> | Current length of the call queue
*-------------------------------------+--------------------------------------+
|<<<rpcQueueTime>>><num><<<sNumOps>>> | Shows the total number of RPC calls
| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
*-------------------------------------+--------------------------------------+
|<<<rpcQueueTime>>><num><<<s50thPercentileLatency>>> |
| | Shows the 50th percentile of RPC queue time in milliseconds
| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
*-------------------------------------+--------------------------------------+
|<<<rpcQueueTime>>><num><<<s75thPercentileLatency>>> |
| | Shows the 75th percentile of RPC queue time in milliseconds
| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
*-------------------------------------+--------------------------------------+
|<<<rpcQueueTime>>><num><<<s90thPercentileLatency>>> |
| | Shows the 90th percentile of RPC queue time in milliseconds
| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
*-------------------------------------+--------------------------------------+
|<<<rpcQueueTime>>><num><<<s95thPercentileLatency>>> |
| | Shows the 95th percentile of RPC queue time in milliseconds
| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
*-------------------------------------+--------------------------------------+
|<<<rpcQueueTime>>><num><<<s99thPercentileLatency>>> |
| | Shows the 99th percentile of RPC queue time in milliseconds
| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
*-------------------------------------+--------------------------------------+
|<<<rpcProcessingTime>>><num><<<sNumOps>>> | Shows the total number of RPC calls
| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
*-------------------------------------+--------------------------------------+
|<<<rpcProcessingTime>>><num><<<s50thPercentileLatency>>> |
| | Shows the 50th percentile of RPC processing time in milliseconds
| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
*-------------------------------------+--------------------------------------+
|<<<rpcProcessingTime>>><num><<<s75thPercentileLatency>>> |
| | Shows the 75th percentile of RPC processing time in milliseconds
| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
*-------------------------------------+--------------------------------------+
|<<<rpcProcessingTime>>><num><<<s90thPercentileLatency>>> |
| | Shows the 90th percentile of RPC processing time in milliseconds
| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
*-------------------------------------+--------------------------------------+
|<<<rpcProcessingTime>>><num><<<s95thPercentileLatency>>> |
| | Shows the 95th percentile of RPC processing time in milliseconds
| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
*-------------------------------------+--------------------------------------+
|<<<rpcProcessingTime>>><num><<<s99thPercentileLatency>>> |
| | Shows the 99th percentile of RPC processing time in milliseconds
| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
*-------------------------------------+--------------------------------------+

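  When <<<rpc.metrics.quantile.enable>>> is <<<true>>>, one group of quantile
  metrics is produced for each interval listed in
  <<<rpc.metrics.percentiles.intervals>>>. The hypothetical Java helper below
  only reproduces the naming pattern from the rpc table, assuming the property
  holds a comma-separated list of seconds such as <<<60,300>>>. It can help a
  monitoring setup decide which metric names to expect; it is not Hadoop code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: expected names of the RPC quantile metrics. */
final class RpcQuantileNames {
    /** Percentiles listed in the rpc metrics table. */
    static final int[] PERCENTILES = {50, 75, 90, 95, 99};

    /**
     * @param prefix    "rpcQueueTime" or "rpcProcessingTime"
     * @param intervals assumed value of rpc.metrics.percentiles.intervals, e.g. "60,300"
     */
    static List<String> names(String prefix, String intervals) {
        List<String> result = new ArrayList<>();
        for (String raw : intervals.split(",")) {
            String num = raw.trim();
            if (num.isEmpty()) {
                continue; // tolerate an empty value or a trailing comma
            }
            int seconds = Integer.parseInt(num); // throws on non-numeric input
            result.add(prefix + seconds + "sNumOps");
            for (int p : PERCENTILES) {
                result.add(prefix + seconds + "s" + p + "thPercentileLatency");
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String prefix : new String[] {"rpcQueueTime", "rpcProcessingTime"}) {
            names(prefix, "60,300").forEach(System.out::println);
        }
    }
}
```

  For example, <<<names("rpcQueueTime", "60,300")>>> returns twelve names,
  starting with <<<rpcQueueTime60sNumOps>>> and
  <<<rpcQueueTime60s50thPercentileLatency>>>.
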
* RetryCache/NameNodeRetryCache

  RetryCache metrics are useful for monitoring NameNode failover.
  Each metrics record contains the Hostname tag.

*-------------------------------------+--------------------------------------+
|| Name || Description
*-------------------------------------+--------------------------------------+
|<<<CacheHit>>> | Total number of RetryCache hits
*-------------------------------------+--------------------------------------+
|<<<CacheCleared>>> | Total number of RetryCache clears
*-------------------------------------+--------------------------------------+
|<<<CacheUpdated>>> | Total number of RetryCache updates
*-------------------------------------+--------------------------------------+

rpcdetailed context

  Metrics of the rpcdetailed context are exposed in a unified manner by the
  RPC layer. Two metrics are exposed for each RPC, based on its name.
  The metric named "(RPC method name)NumOps" indicates the total number of
  method calls, and the metric named "(RPC method name)AvgTime" shows the
  average turnaround time for method calls in milliseconds.

* rpcdetailed

  Each metrics record contains tags such as Hostname
  and port (the port number to which the server is bound)
  as additional information along with metrics.

  Metrics for RPC methods that have not been called are not included
  in the metrics record.

*-------------------------------------+--------------------------------------+
|| Name || Description
*-------------------------------------+--------------------------------------+
|<methodname><<<NumOps>>> | Total number of times the method has been called
*-------------------------------------+--------------------------------------+
|<methodname><<<AvgTime>>> | Average turnaround time of the method in
| | milliseconds
*-------------------------------------+--------------------------------------+

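  The per-method behavior of the rpcdetailed context can be modeled in a few
  lines. The Java sketch below is a simplified, hypothetical model
  (<<<RpcDetailedSketch>>> is not a Hadoop class): a NumOps/AvgTime pair is
  created the first time a method is recorded, so methods that were never
  called are absent from the record. It assumes that NumOps is cumulative while
  AvgTime is the mean of the calls recorded since the previous snapshot, keeping
  the previous mean when no calls arrived; this approximates, but may not
  exactly match, the metrics2 <<<MutableRate>>> semantics.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical model of rpcdetailed metrics: one NumOps/AvgTime pair per called method. */
final class RpcDetailedSketch {

    /** Per-method state: cumulative count plus the current interval's samples. */
    private static final class Rate {
        long totalOps;        // reported as <method>NumOps (cumulative)
        long intervalOps;     // calls since the last snapshot
        double intervalSumMs; // total time of those calls
        double lastAvgMs;     // reported as <method>AvgTime

        void add(double millis) {
            totalOps++;
            intervalOps++;
            intervalSumMs += millis;
        }

        /** Closes the interval; keeps the previous average if nothing was recorded. */
        void roll() {
            if (intervalOps > 0) {
                lastAvgMs = intervalSumMs / intervalOps;
                intervalOps = 0;
                intervalSumMs = 0;
            }
        }
    }

    private final Map<String, Rate> rates = new TreeMap<>();

    /** Called once per completed RPC; creates the method's pair on first use. */
    synchronized void record(String method, double millis) {
        rates.computeIfAbsent(method, m -> new Rate()).add(millis);
    }

    /** Emits the metrics record; methods never recorded are simply absent. */
    synchronized Map<String, Number> snapshot() {
        Map<String, Number> record = new LinkedHashMap<>();
        for (Map.Entry<String, Rate> e : rates.entrySet()) {
            Rate r = e.getValue();
            r.roll();
            record.put(e.getKey() + "NumOps", r.totalOps);
            record.put(e.getKey() + "AvgTime", r.lastAvgMs);
        }
        return record;
    }

    public static void main(String[] args) {
        RpcDetailedSketch metrics = new RpcDetailedSketch();
        metrics.record("getFileInfo", 2.0);
        metrics.record("getFileInfo", 4.0);
        metrics.record("create", 10.0);
        System.out.println(metrics.snapshot());
    }
}
```

  With two <<<getFileInfo>>> calls taking 2 ms and 4 ms, the first snapshot
  of this model reports <<<getFileInfoNumOps>>> = 2 and
  <<<getFileInfoAvgTime>>> = 3.0, and no entries for any other method.
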
dfs context

* namenode

  Each metrics record contains tags such as ProcessName, SessionId,
  and Hostname as additional information along with metrics.

*-------------------------------------+--------------------------------------+
|| Name || Description
*-------------------------------------+--------------------------------------+
|<<<CreateFileOps>>> | Total number of files created
*-------------------------------------+--------------------------------------+
|<<<FilesCreated>>> | Total number of files and directories created by create
| | or mkdir operations
*-------------------------------------+--------------------------------------+
|<<<FilesAppended>>> | Total number of files appended
*-------------------------------------+--------------------------------------+
|<<<GetBlockLocations>>> | Total number of getBlockLocations operations
*-------------------------------------+--------------------------------------+
|<<<FilesRenamed>>> | Total number of rename <<operations>> (NOT number of
| | files/dirs renamed)
*-------------------------------------+--------------------------------------+
|<<<GetListingOps>>> | Total number of directory listing operations
*-------------------------------------+--------------------------------------+
|<<<DeleteFileOps>>> | Total number of delete operations
*-------------------------------------+--------------------------------------+
|<<<FilesDeleted>>> | Total number of files and directories deleted by delete
| | or rename operations
*-------------------------------------+--------------------------------------+
|<<<FileInfoOps>>> | Total number of getFileInfo and getLinkFileInfo
| | operations
*-------------------------------------+--------------------------------------+
|<<<AddBlockOps>>> | Total number of successful addBlock operations
*-------------------------------------+--------------------------------------+
|<<<GetAdditionalDatanodeOps>>> | Total number of getAdditionalDatanode
| | operations
*-------------------------------------+--------------------------------------+
|<<<CreateSymlinkOps>>> | Total number of createSymlink operations
*-------------------------------------+--------------------------------------+
|<<<GetLinkTargetOps>>> | Total number of getLinkTarget operations
*-------------------------------------+--------------------------------------+
|<<<FilesInGetListingOps>>> | Total number of files and directories listed by
| | directory listing operations
*-------------------------------------+--------------------------------------+
|<<<AllowSnapshotOps>>> | Total number of allowSnapshot operations
*-------------------------------------+--------------------------------------+
|<<<DisallowSnapshotOps>>> | Total number of disallowSnapshot operations
*-------------------------------------+--------------------------------------+
|<<<CreateSnapshotOps>>> | Total number of createSnapshot operations
*-------------------------------------+--------------------------------------+
|<<<DeleteSnapshotOps>>> | Total number of deleteSnapshot operations
*-------------------------------------+--------------------------------------+
|<<<RenameSnapshotOps>>> | Total number of renameSnapshot operations
*-------------------------------------+--------------------------------------+
|<<<ListSnapshottableDirOps>>> | Total number of snapshottableDirectoryStatus
| | operations
*-------------------------------------+--------------------------------------+
|<<<SnapshotDiffReportOps>>> | Total number of getSnapshotDiffReport
| | operations
*-------------------------------------+--------------------------------------+
|<<<TransactionsNumOps>>> | Total number of Journal transactions
*-------------------------------------+--------------------------------------+
|<<<TransactionsAvgTime>>> | Average time of Journal transactions in
| | milliseconds
*-------------------------------------+--------------------------------------+
|<<<SyncsNumOps>>> | Total number of Journal syncs
*-------------------------------------+--------------------------------------+
|<<<SyncsAvgTime>>> | Average time of Journal syncs in milliseconds
*-------------------------------------+--------------------------------------+
|<<<TransactionsBatchedInSync>>> | Total number of Journal transactions batched
| | in sync
*-------------------------------------+--------------------------------------+
|<<<BlockReportNumOps>>> | Total number of block reports processed from
| | DataNodes
*-------------------------------------+--------------------------------------+
|<<<BlockReportAvgTime>>> | Average time of processing block reports in
| | milliseconds
*-------------------------------------+--------------------------------------+
|<<<CacheReportNumOps>>> | Total number of cache reports processed from
| | DataNodes
*-------------------------------------+--------------------------------------+
|<<<CacheReportAvgTime>>> | Average time of processing cache reports in
| | milliseconds
*-------------------------------------+--------------------------------------+
|<<<SafeModeTime>>> | The interval in milliseconds between FSNamesystem startup
| | and the last time safe mode was left. \
| | (sometimes not equal to the time spent in SafeMode,
| | see {{{https://issues.apache.org/jira/browse/HDFS-5156}HDFS-5156}})
*-------------------------------------+--------------------------------------+
|<<<FsImageLoadTime>>> | Time spent loading the FS Image at startup in milliseconds
*-------------------------------------+--------------------------------------+
|<<<GetEditNumOps>>> | Total number of edits downloads from SecondaryNameNode
*-------------------------------------+--------------------------------------+
|<<<GetEditAvgTime>>> | Average edits download time in milliseconds
*-------------------------------------+--------------------------------------+
|<<<GetImageNumOps>>> | Total number of fsimage downloads from SecondaryNameNode
*-------------------------------------+--------------------------------------+
|<<<GetImageAvgTime>>> | Average fsimage download time in milliseconds
*-------------------------------------+--------------------------------------+
|<<<PutImageNumOps>>> | Total number of fsimage uploads from SecondaryNameNode
*-------------------------------------+--------------------------------------+
|<<<PutImageAvgTime>>> | Average fsimage upload time in milliseconds
*-------------------------------------+--------------------------------------+

* FSNamesystem

  Each metrics record contains tags such as HAState and Hostname
  as additional information along with metrics.

*-------------------------------------+--------------------------------------+
|| Name || Description
*-------------------------------------+--------------------------------------+
|<<<MissingBlocks>>> | Current number of missing blocks
*-------------------------------------+--------------------------------------+
|<<<ExpiredHeartbeats>>> | Total number of expired heartbeats
*-------------------------------------+--------------------------------------+
|<<<TransactionsSinceLastCheckpoint>>> | Total number of transactions since
| | the last checkpoint
*-------------------------------------+--------------------------------------+
|<<<TransactionsSinceLastLogRoll>>> | Total number of transactions since the
| | last edit log roll
*-------------------------------------+--------------------------------------+
|<<<LastWrittenTransactionId>>> | Last transaction ID written to the edit log
*-------------------------------------+--------------------------------------+
|<<<LastCheckpointTime>>> | Time in milliseconds since epoch of last checkpoint
*-------------------------------------+--------------------------------------+
|<<<CapacityTotal>>> | Current raw capacity of DataNodes in bytes
*-------------------------------------+--------------------------------------+
|<<<CapacityTotalGB>>> | Current raw capacity of DataNodes in GB
*-------------------------------------+--------------------------------------+
|<<<CapacityUsed>>> | Current used capacity across all DataNodes in bytes
*-------------------------------------+--------------------------------------+
|<<<CapacityUsedGB>>> | Current used capacity across all DataNodes in GB
*-------------------------------------+--------------------------------------+
|<<<CapacityRemaining>>> | Current remaining capacity in bytes
*-------------------------------------+--------------------------------------+
|<<<CapacityRemainingGB>>> | Current remaining capacity in GB
*-------------------------------------+--------------------------------------+
|<<<CapacityUsedNonDFS>>> | Current space used by DataNodes for non-DFS
| | purposes in bytes
*-------------------------------------+--------------------------------------+
|<<<TotalLoad>>> | Current number of connections
*-------------------------------------+--------------------------------------+
|<<<SnapshottableDirectories>>> | Current number of snapshottable directories
*-------------------------------------+--------------------------------------+
|<<<Snapshots>>> | Current number of snapshots
*-------------------------------------+--------------------------------------+
|<<<BlocksTotal>>> | Current number of allocated blocks in the system
*-------------------------------------+--------------------------------------+
|<<<FilesTotal>>> | Current number of files and directories
*-------------------------------------+--------------------------------------+
|<<<PendingReplicationBlocks>>> | Current number of blocks pending to be
| | replicated
*-------------------------------------+--------------------------------------+
|<<<UnderReplicatedBlocks>>> | Current number of under-replicated blocks
*-------------------------------------+--------------------------------------+
|<<<CorruptBlocks>>> | Current number of blocks with corrupt replicas
*-------------------------------------+--------------------------------------+
|<<<ScheduledReplicationBlocks>>> | Current number of blocks scheduled for
| | replication
*-------------------------------------+--------------------------------------+
|<<<PendingDeletionBlocks>>> | Current number of blocks pending deletion
*-------------------------------------+--------------------------------------+
|<<<ExcessBlocks>>> | Current number of excess blocks
*-------------------------------------+--------------------------------------+
|<<<PostponedMisreplicatedBlocks>>> | (HA-only) Current number of blocks
| | postponed to replicate
*-------------------------------------+--------------------------------------+
|<<<PendingDataNodeMessageCount>>> | (HA-only) Current number of pending
| | block-related messages for later
| | processing in the standby NameNode
*-------------------------------------+--------------------------------------+
|<<<MillisSinceLastLoadedEdits>>> | (HA-only) Time in milliseconds since the
| | standby NameNode last loaded the edit log.
| | In the active NameNode, set to 0
*-------------------------------------+--------------------------------------+
|<<<BlockCapacity>>> | Current block capacity
*-------------------------------------+--------------------------------------+
|<<<StaleDataNodes>>> | Current number of DataNodes marked stale due to delayed
| | heartbeat
*-------------------------------------+--------------------------------------+
|<<<TotalFiles>>> | Current number of files and directories (same as <<<FilesTotal>>>)
*-------------------------------------+--------------------------------------+

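  The capacity gauges are reported both in bytes and in GB. The hypothetical
  Java helper below relates the two, assuming that a GB is
  1024 x 1024 x 1024 bytes and that the GB gauges are rounded to the nearest
  integer. It also computes a remaining-capacity percentage, which is not one
  of the FSNamesystem metrics but is often derived from
  <<<CapacityRemaining>>> and <<<CapacityTotal>>> for alerting.

```java
/** Hypothetical helper relating the byte and GB capacity gauges of FSNamesystem. */
final class CapacitySketch {
    /** Assumed size of one GB for the *GB gauges: 1024^3 bytes. */
    static final long GB = 1024L * 1024 * 1024;

    /** Converts bytes to whole GB, rounding to the nearest integer (assumption). */
    static long toGB(long bytes) {
        return Math.round((double) bytes / GB);
    }

    /** Remaining capacity as a percentage of raw capacity; 0 if no capacity is reported. */
    static double remainingPercent(long capacityRemaining, long capacityTotal) {
        return capacityTotal <= 0 ? 0.0 : 100.0 * capacityRemaining / capacityTotal;
    }

    public static void main(String[] args) {
        // Illustrative numbers only, not taken from a real cluster.
        long capacityTotal = 4096L * GB;
        long capacityRemaining = 1536L * GB;
        System.out.println("CapacityTotalGB     = " + toGB(capacityTotal));
        System.out.println("CapacityRemainingGB = " + toGB(capacityRemaining));
        System.out.printf("Remaining: %.1f%%%n", remainingPercent(capacityRemaining, capacityTotal));
    }
}
```
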
* JournalNode

  The server-side metrics for a journal from the JournalNode's perspective.
  Each metrics record contains the Hostname tag as additional information
  along with metrics.

*-------------------------------------+--------------------------------------+
|
||||
|| Name || Description
|
||||
*-------------------------------------+--------------------------------------+
|
||||
|<<<Syncs60sNumOps>>> | Number of sync operations (1 minute granularity)
|
||||
*-------------------------------------+--------------------------------------+
|
||||
|<<<Syncs60s50thPercentileLatencyMicros>>> | The 50th percentile of sync
|
||||
| | latency in microseconds (1 minute granularity)
|
||||
*-------------------------------------+--------------------------------------+
|
||||
|<<<Syncs60s75thPercentileLatencyMicros>>> | The 75th percentile of sync
|
||||
| | latency in microseconds (1 minute granularity)
|
||||
*-------------------------------------+--------------------------------------+
|
||||
|<<<Syncs60s90thPercentileLatencyMicros>>> | The 90th percentile of sync
|
||||
| | latency in microseconds (1 minute granularity)
|
||||
*-------------------------------------+--------------------------------------+
|
||||
|<<<Syncs60s95thPercentileLatencyMicros>>> | The 95th percentile of sync
|
||||
| | latency in microseconds (1 minute granularity)
|
||||
*-------------------------------------+--------------------------------------+
|
||||
|<<<Syncs60s99thPercentileLatencyMicros>>> | The 99th percentile of sync
|
||||
| | latency in microseconds (1 minute granularity)
|
||||
*-------------------------------------+--------------------------------------+
|
||||
|<<<Syncs300sNumOps>>> | Number of sync operations (5 minutes granularity)
|
||||
*-------------------------------------+--------------------------------------+
|
||||
|<<<Syncs300s50thPercentileLatencyMicros>>> | The 50th percentile of sync
|
||||
| | latency in microseconds (5 minutes granularity)
|
||||
*-------------------------------------+--------------------------------------+
|
||||
|<<<Syncs300s75thPercentileLatencyMicros>>> | The 75th percentile of sync
|
||||
| | latency in microseconds (5 minutes granularity)
|
||||
*-------------------------------------+--------------------------------------+
|
||||
|<<<Syncs300s90thPercentileLatencyMicros>>> | The 90th percentile of sync
|
||||
| | latency in microseconds (5 minutes granularity)
*-------------------------------------+--------------------------------------+
|<<<Syncs300s95thPercentileLatencyMicros>>> | The 95th percentile of sync
| | latency in microseconds (5 minutes granularity)
*-------------------------------------+--------------------------------------+
|<<<Syncs300s99thPercentileLatencyMicros>>> | The 99th percentile of sync
| | latency in microseconds (5 minutes granularity)
*-------------------------------------+--------------------------------------+
|<<<Syncs3600sNumOps>>> | Number of sync operations (1 hour granularity)
*-------------------------------------+--------------------------------------+
|<<<Syncs3600s50thPercentileLatencyMicros>>> | The 50th percentile of sync
| | latency in microseconds (1 hour granularity)
*-------------------------------------+--------------------------------------+
|<<<Syncs3600s75thPercentileLatencyMicros>>> | The 75th percentile of sync
| | latency in microseconds (1 hour granularity)
*-------------------------------------+--------------------------------------+
|<<<Syncs3600s90thPercentileLatencyMicros>>> | The 90th percentile of sync
| | latency in microseconds (1 hour granularity)
*-------------------------------------+--------------------------------------+
|<<<Syncs3600s95thPercentileLatencyMicros>>> | The 95th percentile of sync
| | latency in microseconds (1 hour granularity)
*-------------------------------------+--------------------------------------+
|<<<Syncs3600s99thPercentileLatencyMicros>>> | The 99th percentile of sync
| | latency in microseconds (1 hour granularity)
*-------------------------------------+--------------------------------------+
|<<<BatchesWritten>>> | Total number of batches written since startup
*-------------------------------------+--------------------------------------+
|<<<TxnsWritten>>> | Total number of transactions written since startup
*-------------------------------------+--------------------------------------+
|<<<BytesWritten>>> | Total number of bytes written since startup
*-------------------------------------+--------------------------------------+
|<<<BatchesWrittenWhileLagging>>> | Total number of batches written while this
| | node was lagging
*-------------------------------------+--------------------------------------+
|<<<LastWriterEpoch>>> | Current writer's epoch number
*-------------------------------------+--------------------------------------+
|<<<CurrentLagTxns>>> | The number of transactions that this JournalNode is
| | lagging behind
*-------------------------------------+--------------------------------------+
|<<<LastWrittenTxId>>> | The highest transaction id stored on this JournalNode
*-------------------------------------+--------------------------------------+
|<<<LastPromisedEpoch>>> | The last epoch number for which this node has
| | promised not to accept any lower epoch, or 0 if no promises have been made
*-------------------------------------+--------------------------------------+
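
  The sync latency rows follow a fixed naming pattern. Each interval has one
  <<<Syncs>>><interval><<<sNumOps>>> counter, followed by one
  <<<Syncs>>><interval><<<s>>><pct><<<thPercentileLatencyMicros>>> quantile
  per percentile. The Python sketch below only illustrates that pattern. It
  assumes the intervals (300 and 3600 seconds) and the percentiles (50, 75,
  90, 95 and 99) that appear in this table. The helper name
  <<<journal_sync_metric_names>>> is hypothetical and not part of Hadoop.

```python
# Hypothetical helper: expands the JournalNode sync metric names documented
# in the table above. The default intervals and the percentiles are the ones
# listed in that table (an assumption; adjust them if your release differs).
PERCENTILES = (50, 75, 90, 95, 99)


def journal_sync_metric_names(intervals=(300, 3600)):
    """Return the sync metric names for each interval, in table order."""
    names = []
    for seconds in intervals:
        # One operation counter per interval...
        names.append(f"Syncs{seconds}sNumOps")
        # ...followed by one latency quantile (in microseconds) per percentile.
        for pct in PERCENTILES:
            names.append(f"Syncs{seconds}s{pct}thPercentileLatencyMicros")
    return names


if __name__ == "__main__":
    for name in journal_sync_metric_names():
        print(name)
```

  A monitoring script could use such a list to check that every expected
  quantile is actually being reported.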

* datanode

  Each metrics record contains tags such as SessionId and Hostname
  as additional information along with metrics.

*-------------------------------------+--------------------------------------+
|| Name || Description
*-------------------------------------+--------------------------------------+
|<<<BytesWritten>>> | Total number of bytes written to the DataNode
*-------------------------------------+--------------------------------------+
|<<<BytesRead>>> | Total number of bytes read from the DataNode
*-------------------------------------+--------------------------------------+
|<<<BlocksWritten>>> | Total number of blocks written to the DataNode
*-------------------------------------+--------------------------------------+
|<<<BlocksRead>>> | Total number of blocks read from the DataNode
*-------------------------------------+--------------------------------------+
|<<<BlocksReplicated>>> | Total number of blocks replicated
*-------------------------------------+--------------------------------------+
|<<<BlocksRemoved>>> | Total number of blocks removed
*-------------------------------------+--------------------------------------+
|<<<BlocksVerified>>> | Total number of blocks verified
*-------------------------------------+--------------------------------------+
|<<<BlockVerificationFailures>>> | Total number of block verification failures
*-------------------------------------+--------------------------------------+
|<<<BlocksCached>>> | Total number of blocks cached
*-------------------------------------+--------------------------------------+
|<<<BlocksUncached>>> | Total number of blocks uncached
*-------------------------------------+--------------------------------------+
|<<<ReadsFromLocalClient>>> | Total number of read operations from local client
*-------------------------------------+--------------------------------------+
|<<<ReadsFromRemoteClient>>> | Total number of read operations from remote
| | client
*-------------------------------------+--------------------------------------+
|<<<WritesFromLocalClient>>> | Total number of write operations from local
| | client
*-------------------------------------+--------------------------------------+
|<<<WritesFromRemoteClient>>> | Total number of write operations from remote
| | client
*-------------------------------------+--------------------------------------+
|<<<BlocksGetLocalPathInfo>>> | Total number of operations to get local path
| | names of blocks
*-------------------------------------+--------------------------------------+
|<<<FsyncCount>>> | Total number of fsync operations
*-------------------------------------+--------------------------------------+
|<<<VolumeFailures>>> | Total number of volume failures that have occurred
*-------------------------------------+--------------------------------------+
|<<<ReadBlockOpNumOps>>> | Total number of block read operations
*-------------------------------------+--------------------------------------+
|<<<ReadBlockOpAvgTime>>> | Average time of block read operations in
| | milliseconds
*-------------------------------------+--------------------------------------+
|<<<WriteBlockOpNumOps>>> | Total number of block write operations
*-------------------------------------+--------------------------------------+
|<<<WriteBlockOpAvgTime>>> | Average time of block write operations in
| | milliseconds
*-------------------------------------+--------------------------------------+
|<<<BlockChecksumOpNumOps>>> | Total number of blockChecksum operations
*-------------------------------------+--------------------------------------+
|<<<BlockChecksumOpAvgTime>>> | Average time of blockChecksum operations in
| | milliseconds
*-------------------------------------+--------------------------------------+
|<<<CopyBlockOpNumOps>>> | Total number of block copy operations
*-------------------------------------+--------------------------------------+
|<<<CopyBlockOpAvgTime>>> | Average time of block copy operations in
| | milliseconds
*-------------------------------------+--------------------------------------+
|<<<ReplaceBlockOpNumOps>>> | Total number of block replace operations
*-------------------------------------+--------------------------------------+
|<<<ReplaceBlockOpAvgTime>>> | Average time of block replace operations in
| | milliseconds
*-------------------------------------+--------------------------------------+
|<<<HeartbeatsNumOps>>> | Total number of heartbeats
*-------------------------------------+--------------------------------------+
|<<<HeartbeatsAvgTime>>> | Average heartbeat time in milliseconds
*-------------------------------------+--------------------------------------+
|<<<BlockReportsNumOps>>> | Total number of block report operations
*-------------------------------------+--------------------------------------+
|<<<BlockReportsAvgTime>>> | Average time of block report operations in
| | milliseconds
*-------------------------------------+--------------------------------------+
|<<<CacheReportsNumOps>>> | Total number of cache report operations
*-------------------------------------+--------------------------------------+
|<<<CacheReportsAvgTime>>> | Average time of cache report operations in
| | milliseconds
*-------------------------------------+--------------------------------------+
|<<<PacketAckRoundTripTimeNanosNumOps>>> | Total number of packet ack round
| | trips
*-------------------------------------+--------------------------------------+
|<<<PacketAckRoundTripTimeNanosAvgTime>>> | Average time from ack send to
| | receive minus the downstream ack time in nanoseconds
*-------------------------------------+--------------------------------------+
|<<<FlushNanosNumOps>>> | Total number of flushes
*-------------------------------------+--------------------------------------+
|<<<FlushNanosAvgTime>>> | Average flush time in nanoseconds
*-------------------------------------+--------------------------------------+
|<<<FsyncNanosNumOps>>> | Total number of fsync operations
*-------------------------------------+--------------------------------------+
|<<<FsyncNanosAvgTime>>> | Average fsync time in nanoseconds
*-------------------------------------+--------------------------------------+
|<<<SendDataPacketBlockedOnNetworkNanosNumOps>>> | Total number of packets
| | sent
*-------------------------------------+--------------------------------------+
|<<<SendDataPacketBlockedOnNetworkNanosAvgTime>>> | Average time spent
| | blocked on the network while sending packets, in nanoseconds
*-------------------------------------+--------------------------------------+
|<<<SendDataPacketTransferNanosNumOps>>> | Total number of packets sent
*-------------------------------------+--------------------------------------+
|<<<SendDataPacketTransferNanosAvgTime>>> | Average transfer time of sent
| | packets in nanoseconds
*-------------------------------------+--------------------------------------+
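
  <<<ReadsFromLocalClient>>> and <<<ReadsFromRemoteClient>>> are totals, so
  they can be combined into a single ratio: the share of reads served to
  clients on the same host. The Python sketch below uses two hypothetical
  helpers that are not part of Hadoop. It assumes you have already sampled
  both counters, by whatever collection mechanism you use, into a dictionary
  keyed by metric name.

```python
# Hypothetical helpers built on the two DataNode read counters documented
# above. `metrics`, `before` and `after` are assumed to map metric names to
# sampled counter values.
LOCAL = "ReadsFromLocalClient"
REMOTE = "ReadsFromRemoteClient"


def local_read_fraction(metrics):
    """Fraction of read operations that came from local clients, or None."""
    local = metrics.get(LOCAL, 0)
    remote = metrics.get(REMOTE, 0)
    total = local + remote
    if total == 0:
        return None  # no reads recorded; avoid dividing by zero
    return local / total


def local_read_fraction_between(before, after):
    """Same ratio, restricted to reads that happened between two samples."""
    delta = {key: after.get(key, 0) - before.get(key, 0) for key in (LOCAL, REMOTE)}
    if any(value < 0 for value in delta.values()):
        return None  # a counter went backwards, e.g. after a DataNode restart
    return local_read_fraction(delta)


if __name__ == "__main__":
    # Illustrative values only, not real measurements.
    earlier = {LOCAL: 100, REMOTE: 20}
    later = {LOCAL: 400, REMOTE: 120}
    print(local_read_fraction(later))                   # roughly 0.769
    print(local_read_fraction_between(earlier, later))  # 0.75
```

  The totals cover every read since the counters were last reset. The
  difference-based variant is usually more useful for spotting a recent
  change in data locality.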

ugi context

* UgiMetrics

  UgiMetrics reports statistics about user logins and group resolution.
  Each metrics record contains the Hostname tag as additional information
  along with metrics.

*-------------------------------------+--------------------------------------+
|| Name || Description
*-------------------------------------+--------------------------------------+
|<<<LoginSuccessNumOps>>> | Total number of successful Kerberos logins
*-------------------------------------+--------------------------------------+
|<<<LoginSuccessAvgTime>>> | Average time for successful Kerberos logins in
| | milliseconds
*-------------------------------------+--------------------------------------+
|<<<LoginFailureNumOps>>> | Total number of failed Kerberos logins
*-------------------------------------+--------------------------------------+
|<<<LoginFailureAvgTime>>> | Average time for failed Kerberos logins in
| | milliseconds
*-------------------------------------+--------------------------------------+
|<<<getGroupsNumOps>>> | Total number of group resolutions
*-------------------------------------+--------------------------------------+
|<<<getGroupsAvgTime>>> | Average time for group resolution in milliseconds
*-------------------------------------+--------------------------------------+
|<<<getGroups>>><num><<<sNumOps>>> |
| | Total number of group resolutions (<num> seconds granularity). <num> is
| | specified by <<<hadoop.user.group.metrics.percentiles.intervals>>>.
*-------------------------------------+--------------------------------------+
|<<<getGroups>>><num><<<s50thPercentileLatency>>> |
| | The 50th percentile of group resolution time in milliseconds
| | (<num> seconds granularity). <num> is specified by
| | <<<hadoop.user.group.metrics.percentiles.intervals>>>.
*-------------------------------------+--------------------------------------+
|<<<getGroups>>><num><<<s75thPercentileLatency>>> |
| | The 75th percentile of group resolution time in milliseconds
| | (<num> seconds granularity). <num> is specified by
| | <<<hadoop.user.group.metrics.percentiles.intervals>>>.
*-------------------------------------+--------------------------------------+
|<<<getGroups>>><num><<<s90thPercentileLatency>>> |
| | The 90th percentile of group resolution time in milliseconds
| | (<num> seconds granularity). <num> is specified by
| | <<<hadoop.user.group.metrics.percentiles.intervals>>>.
*-------------------------------------+--------------------------------------+
|<<<getGroups>>><num><<<s95thPercentileLatency>>> |
| | The 95th percentile of group resolution time in milliseconds
| | (<num> seconds granularity). <num> is specified by
| | <<<hadoop.user.group.metrics.percentiles.intervals>>>.
*-------------------------------------+--------------------------------------+
|<<<getGroups>>><num><<<s99thPercentileLatency>>> |
| | The 99th percentile of group resolution time in milliseconds
| | (<num> seconds granularity). <num> is specified by
| | <<<hadoop.user.group.metrics.percentiles.intervals>>>.
*-------------------------------------+--------------------------------------+
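
  The interval-based group resolution metrics are named after the values in
  <<<hadoop.user.group.metrics.percentiles.intervals>>>. The Python sketch
  below assumes that property holds a comma-separated list of intervals in
  seconds, for example <<<60,300>>>. The helper name
  <<<ugi_group_metric_names>>> is hypothetical. It lists the names the table
  above predicts for each configured interval.

```python
# Hypothetical helper: expands the getGroups* metric names described in the
# UgiMetrics table for each interval configured in
# hadoop.user.group.metrics.percentiles.intervals (assumed to be a
# comma-separated list of seconds).
PERCENTILES = (50, 75, 90, 95, 99)


def ugi_group_metric_names(intervals_value):
    # These two are always listed in the table, independent of the intervals.
    names = ["getGroupsNumOps", "getGroupsAvgTime"]
    for token in intervals_value.split(","):
        token = token.strip()
        if not token:
            continue  # tolerate empty entries such as a trailing comma
        seconds = int(token)
        names.append(f"getGroups{seconds}sNumOps")
        for pct in PERCENTILES:
            names.append(f"getGroups{seconds}s{pct}thPercentileLatency")
    return names


if __name__ == "__main__":
    for name in ugi_group_metric_names("60,300"):
        print(name)
```

  With <<<60,300>>>, this yields 14 names: the two fixed metrics plus six per
  interval.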

metricssystem context

* MetricsSystem

  MetricsSystem shows statistics for metrics snapshots and publishes.
  Each metrics record contains the Hostname tag as additional information
  along with metrics.

*-------------------------------------+--------------------------------------+
|| Name || Description
*-------------------------------------+--------------------------------------+
|<<<NumActiveSources>>> | Current number of active metrics sources
*-------------------------------------+--------------------------------------+
|<<<NumAllSources>>> | Total number of metrics sources
*-------------------------------------+--------------------------------------+
|<<<NumActiveSinks>>> | Current number of active sinks
*-------------------------------------+--------------------------------------+
|<<<NumAllSinks>>> | Total number of sinks \
| | (but usually less than <<<NumActiveSinks>>>;
| | see {{{https://issues.apache.org/jira/browse/HADOOP-9946}HADOOP-9946}})
*-------------------------------------+--------------------------------------+
|<<<SnapshotNumOps>>> | Total number of operations to snapshot statistics from
| | a metrics source
*-------------------------------------+--------------------------------------+
|<<<SnapshotAvgTime>>> | Average time in milliseconds to snapshot statistics
| | from a metrics source
*-------------------------------------+--------------------------------------+
|<<<PublishNumOps>>> | Total number of operations to publish statistics to a
| | sink
*-------------------------------------+--------------------------------------+
|<<<PublishAvgTime>>> | Average time in milliseconds to publish statistics to
| | a sink
*-------------------------------------+--------------------------------------+
|<<<DroppedPubAll>>> | Total number of dropped publishes
*-------------------------------------+--------------------------------------+
|<<<Sink_>>><instance><<<NumOps>>> | Total number of sink operations for the
| | <instance>
*-------------------------------------+--------------------------------------+
|<<<Sink_>>><instance><<<AvgTime>>> | Average time in milliseconds of sink
| | operations for the <instance>
*-------------------------------------+--------------------------------------+
|<<<Sink_>>><instance><<<Dropped>>> | Total number of dropped sink operations
| | for the <instance>
*-------------------------------------+--------------------------------------+
|<<<Sink_>>><instance><<<Qsize>>> | Current queue length of sink operations \
| | (but always 0, because nothing increments this
| | metric; see
| | {{{https://issues.apache.org/jira/browse/HADOOP-9941}HADOOP-9941}})
*-------------------------------------+--------------------------------------+
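
  The per-sink metric names embed the sink instance name, so a monitoring
  script can group them by instance. The Python sketch below is
  hypothetical. It assumes the names appear exactly as
  <<<Sink_>>><instance> followed by <<<NumOps>>>, <<<AvgTime>>>,
  <<<Dropped>>> or <<<Qsize>>>, as listed in the table above. It also
  assumes an instance name does not itself end in one of those suffixes.
  The sink names in the example (<<<file>>>, <<<ganglia>>>) are
  illustrative only.

```python
import re

# Hypothetical helper: groups MetricsSystem per-sink metrics by sink instance.
# Matches names of the form Sink_<instance><suffix>, where the suffix is one
# of the four documented in the MetricsSystem table.
SINK_METRIC = re.compile(
    r"^Sink_(?P<instance>.+?)(?P<suffix>NumOps|AvgTime|Dropped|Qsize)$"
)


def group_sink_metrics(metrics):
    """Return {instance: {suffix: value}} for every Sink_* metric found."""
    grouped = {}
    for name, value in metrics.items():
        match = SINK_METRIC.match(name)
        if match is None:
            continue  # not a per-sink metric, e.g. NumActiveSinks
        grouped.setdefault(match["instance"], {})[match["suffix"]] = value
    return grouped


if __name__ == "__main__":
    # Illustrative values only, not real measurements.
    sample = {
        "Sink_fileNumOps": 12,
        "Sink_fileDropped": 0,
        "Sink_gangliaQsize": 0,
        "NumActiveSinks": 2,
    }
    print(group_sink_metrics(sample))
```

  Because of the HADOOP-9941 caveat above, a grouped <<<Qsize>>> value of 0
  does not indicate an empty queue.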

default context

* StartupProgress

  StartupProgress metrics show statistics about NameNode startup.
  For each startup phase, four metrics are exposed whose names are prefixed
  with the phase name. The startup <phase>s are <<<LoadingFsImage>>>,
  <<<LoadingEdits>>>, <<<SavingCheckpoint>>>, and <<<SafeMode>>>.
  Each metrics record contains the Hostname tag as additional information
  along with metrics.

*-------------------------------------+--------------------------------------+
|| Name || Description
*-------------------------------------+--------------------------------------+
|<<<ElapsedTime>>> | Total elapsed time in milliseconds
*-------------------------------------+--------------------------------------+
|<<<PercentComplete>>> | Current fraction of NameNode startup completed \
| | (the maximum value is 1.0, not 100)
*-------------------------------------+--------------------------------------+
|<phase><<<Count>>> | Total number of steps completed in the phase
*-------------------------------------+--------------------------------------+
|<phase><<<ElapsedTime>>> | Total elapsed time in the phase in milliseconds
*-------------------------------------+--------------------------------------+
|<phase><<<Total>>> | Total number of steps in the phase
*-------------------------------------+--------------------------------------+
|<phase><<<PercentComplete>>> | Current fraction of the phase completed \
| | (the maximum value is 1.0, not 100)
*-------------------------------------+--------------------------------------+
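
  The per-phase names are formed by prefixing a phase name to each suffix.
  The <<<PercentComplete>>> values range from 0 to 1.0 rather than 0 to 100.
  The Python sketch below uses hypothetical helpers, not part of Hadoop. It
  lists the per-phase metric names for the four phases named above and
  converts a completion fraction into a percentage for display.

```python
# Hypothetical helpers for the StartupProgress metrics described above.
PHASES = ("LoadingFsImage", "LoadingEdits", "SavingCheckpoint", "SafeMode")
SUFFIXES = ("Count", "ElapsedTime", "Total", "PercentComplete")


def startup_phase_metric_names(phases=PHASES):
    """Return {phase: [metric names]} following the <phase><suffix> pattern."""
    return {phase: [phase + suffix for suffix in SUFFIXES] for phase in phases}


def as_percent(fraction):
    """Convert a PercentComplete value (0 to 1.0) to a 0-100 percentage."""
    # Clamp defensively in case a sampled value falls outside [0, 1].
    return round(100.0 * min(max(fraction, 0.0), 1.0), 1)


if __name__ == "__main__":
    print(startup_phase_metric_names()["LoadingEdits"])
    print(as_percent(0.5))
```

  Multiplying by 100 only at display time keeps the raw metric values
  directly comparable with what the NameNode reports.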

@@ -137,6 +137,7 @@
<item name="Common CHANGES.txt" href="hadoop-project-dist/hadoop-common/CHANGES.txt"/>
<item name="HDFS CHANGES.txt" href="hadoop-project-dist/hadoop-hdfs/CHANGES.txt"/>
<item name="MapReduce CHANGES.txt" href="hadoop-project-dist/hadoop-mapreduce/CHANGES.txt"/>
<item name="Metrics" href="hadoop-project-dist/hadoop-common/Metrics.html"/>
</menu>

<menu name="Configuration" inherit="top">