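List of Hadoop Metrics
I could not find a comprehensive list of Hadoop metrics, which caused me trouble when I was building monitoring for a Hadoop cluster. Following the rule that whoever brings it up has to do it, I have put together at least a list. Note that the ugi and fairscheduler metrics are omitted. Also, rpc.detailed-metrics is hard to list in full*1, so I have left it out of the list.
HBase metrics are already summarized here, so refer to that page. → http://hbase.apache.org/book/hbase_metrics.html
Of these metrics, the rpc metrics are described here. → https://issues.apache.org/jira/browse/HADOOP-6599
The jvm and rpc metrics are common to every daemon; the rest are specific to each daemon. There are many of them and there is still a lot I do not fully understand, so I plan to look into the details over time, although some items are self-explanatory from the metric name. I would like to talk about the monitoring details at a Zabbix-JP study session, but I have not prepared anything yet...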
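As a rough illustration of the common versus daemon-specific split described above, here is a minimal Python sketch that partitions metric names by prefix. The function name `partition_metrics` and the choice to treat only `jvm.` and `rpc.metrics.` as common prefixes are my own assumptions for this example, not something Hadoop defines.

```python
# Prefixes assumed to be shared by every daemon (an assumption for this sketch).
# rpc.detailed-metrics is deliberately not included, because its entries
# differ from daemon to daemon.
COMMON_PREFIXES = ("jvm.", "rpc.metrics.")


def partition_metrics(names):
    """Split metric names into (common, daemon_specific), preserving order."""
    common = [n for n in names if n.startswith(COMMON_PREFIXES)]
    specific = [n for n in names if not n.startswith(COMMON_PREFIXES)]
    return common, specific


if __name__ == "__main__":
    sample = [
        "jvm.NameNode.metrics.gcCount",
        "dfs.namenode.AddBlockOps",
        "rpc.metrics.SentBytes",
    ]
    common, specific = partition_metrics(sample)
    print("common:", common)
    print("daemon-specific:", specific)
```

A split like this could be used to define one shared monitoring template for the jvm and rpc items and a separate one per daemon.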
Namenode
dfs.FSDirectory.files_deleted
dfs.FSNamesystem.BlockCapacity
dfs.FSNamesystem.BlocksTotal
dfs.FSNamesystem.CapacityRemainingGB
dfs.FSNamesystem.CapacityTotalGB
dfs.FSNamesystem.CapacityUsedGB
dfs.FSNamesystem.CorruptBlocks
dfs.FSNamesystem.ExcessBlocks
dfs.FSNamesystem.FilesTotal
dfs.FSNamesystem.MissingBlocks
dfs.FSNamesystem.PendingDeletionBlocks
dfs.FSNamesystem.PendingReplicationBlocks
dfs.FSNamesystem.ScheduledReplicationBlocks
dfs.FSNamesystem.TotalLoad
dfs.FSNamesystem.UnderReplicatedBlocks
dfs.namenode.AddBlockOps
dfs.namenode.CreateFileOps
dfs.namenode.DeleteFileOps
dfs.namenode.FileInfoOps
dfs.namenode.FilesAppended
dfs.namenode.FilesCreated
dfs.namenode.FilesInGetListingOps
dfs.namenode.FilesRenamed
dfs.namenode.GetBlockLocations
dfs.namenode.GetListingOps
dfs.namenode.JournalTransactionsBatchedInSync
dfs.namenode.Syncs_avg_time
dfs.namenode.Syncs_num_ops
dfs.namenode.Transactions_avg_time
dfs.namenode.Transactions_num_ops
dfs.namenode.blockReport_avg_time
dfs.namenode.blockReport_num_ops
dfs.namenode.fsImageLoadTime
jvm.NameNode.metrics.gcCount
jvm.NameNode.metrics.gcTimeMillis
jvm.NameNode.metrics.logError
jvm.NameNode.metrics.logFatal
jvm.NameNode.metrics.logInfo
jvm.NameNode.metrics.logWarn
jvm.NameNode.metrics.maxMemoryM
jvm.NameNode.metrics.memHeapCommittedM
jvm.NameNode.metrics.memHeapUsedM
jvm.NameNode.metrics.memNonHeapCommittedM
jvm.NameNode.metrics.memNonHeapUsedM
jvm.NameNode.metrics.threadsBlocked
jvm.NameNode.metrics.threadsNew
jvm.NameNode.metrics.threadsRunnable
jvm.NameNode.metrics.threadsTerminated
jvm.NameNode.metrics.threadsTimedWaiting
jvm.NameNode.metrics.threadsWaiting
rpc.metrics.NumOpenConnections
rpc.metrics.ReceivedBytes
rpc.metrics.RpcProcessingTime_avg_time
rpc.metrics.RpcProcessingTime_num_ops
rpc.metrics.RpcQueueTime_avg_time
rpc.metrics.RpcQueueTime_num_ops
rpc.metrics.SentBytes
rpc.metrics.callQueueLen
rpc.metrics.rpcAuthenticationFailures
rpc.metrics.rpcAuthenticationSuccesses
rpc.metrics.rpcAuthorizationFailures
rpc.metrics.rpcAuthorizationSuccesses
JobTracker
mapred.jobtracker.blacklisted_maps
mapred.jobtracker.blacklisted_reduces
mapred.jobtracker.heartbeats
mapred.jobtracker.jobs_completed
mapred.jobtracker.jobs_failed
mapred.jobtracker.jobs_killed
mapred.jobtracker.jobs_preparing
mapred.jobtracker.jobs_running
mapred.jobtracker.jobs_submitted
mapred.jobtracker.map_slots
mapred.jobtracker.maps_completed
mapred.jobtracker.maps_failed
mapred.jobtracker.maps_killed
mapred.jobtracker.maps_launched
mapred.jobtracker.occupied_map_slots
mapred.jobtracker.occupied_reduce_slots
mapred.jobtracker.reduce_slots
mapred.jobtracker.reduces_completed
mapred.jobtracker.reduces_failed
mapred.jobtracker.reduces_killed
mapred.jobtracker.reduces_launched
mapred.jobtracker.reserved_map_slots
mapred.jobtracker.reserved_reduce_slots
mapred.jobtracker.running_maps
mapred.jobtracker.running_reduces
mapred.jobtracker.trackers
mapred.jobtracker.trackers_blacklisted
mapred.jobtracker.trackers_decommissioned
mapred.jobtracker.waiting_maps
mapred.jobtracker.waiting_reduces
jvm.JobTracker.metrics.gcCount
jvm.JobTracker.metrics.gcTimeMillis
jvm.JobTracker.metrics.logError
jvm.JobTracker.metrics.logFatal
jvm.JobTracker.metrics.logInfo
jvm.JobTracker.metrics.logWarn
jvm.JobTracker.metrics.maxMemoryM
jvm.JobTracker.metrics.memHeapCommittedM
jvm.JobTracker.metrics.memHeapUsedM
jvm.JobTracker.metrics.memNonHeapCommittedM
jvm.JobTracker.metrics.memNonHeapUsedM
jvm.JobTracker.metrics.threadsBlocked
jvm.JobTracker.metrics.threadsNew
jvm.JobTracker.metrics.threadsRunnable
jvm.JobTracker.metrics.threadsTerminated
jvm.JobTracker.metrics.threadsTimedWaiting
jvm.JobTracker.metrics.threadsWaiting
rpc.metrics.NumOpenConnections
rpc.metrics.ReceivedBytes
rpc.metrics.RpcProcessingTime_avg_time
rpc.metrics.RpcProcessingTime_num_ops
rpc.metrics.RpcQueueTime_avg_time
rpc.metrics.RpcQueueTime_num_ops
rpc.metrics.SentBytes
rpc.metrics.callQueueLen
rpc.metrics.rpcAuthenticationFailures
rpc.metrics.rpcAuthenticationSuccesses
rpc.metrics.rpcAuthorizationFailures
rpc.metrics.rpcAuthorizationSuccesses
Secondary Namenode
dfs.FSDirectory.files_deleted
jvm.SecondaryNameNode.metrics.gcCount
jvm.SecondaryNameNode.metrics.gcTimeMillis
jvm.SecondaryNameNode.metrics.logError
jvm.SecondaryNameNode.metrics.logFatal
jvm.SecondaryNameNode.metrics.logInfo
jvm.SecondaryNameNode.metrics.logWarn
jvm.SecondaryNameNode.metrics.maxMemoryM
jvm.SecondaryNameNode.metrics.memHeapCommittedM
jvm.SecondaryNameNode.metrics.memHeapUsedM
jvm.SecondaryNameNode.metrics.memNonHeapCommittedM
jvm.SecondaryNameNode.metrics.memNonHeapUsedM
jvm.SecondaryNameNode.metrics.threadsBlocked
jvm.SecondaryNameNode.metrics.threadsNew
jvm.SecondaryNameNode.metrics.threadsRunnable
jvm.SecondaryNameNode.metrics.threadsTerminated
jvm.SecondaryNameNode.metrics.threadsTimedWaiting
jvm.SecondaryNameNode.metrics.threadsWaiting
Datanode
dfs.datanode.blockChecksumOp_avg_time
dfs.datanode.blockChecksumOp_num_ops
dfs.datanode.blockReports_avg_time
dfs.datanode.blockReports_num_ops
dfs.datanode.block_verification_failures
dfs.datanode.blocks_read
dfs.datanode.blocks_removed
dfs.datanode.blocks_replicated
dfs.datanode.blocks_verified
dfs.datanode.blocks_written
dfs.datanode.bytes_read
dfs.datanode.bytes_written
dfs.datanode.copyBlockOp_avg_time
dfs.datanode.copyBlockOp_num_ops
dfs.datanode.heartBeats_avg_time
dfs.datanode.heartBeats_num_ops
dfs.datanode.readBlockOp_avg_time
dfs.datanode.readBlockOp_num_ops
dfs.datanode.reads_from_local_client
dfs.datanode.reads_from_remote_client
dfs.datanode.replaceBlockOp_avg_time
dfs.datanode.replaceBlockOp_num_ops
dfs.datanode.volumeFailures
dfs.datanode.writeBlockOp_avg_time
dfs.datanode.writeBlockOp_num_ops
dfs.datanode.writes_from_local_client
dfs.datanode.writes_from_remote_client
jvm.DataNode.metrics.gcCount
jvm.DataNode.metrics.gcTimeMillis
jvm.DataNode.metrics.logError
jvm.DataNode.metrics.logFatal
jvm.DataNode.metrics.logInfo
jvm.DataNode.metrics.logWarn
jvm.DataNode.metrics.maxMemoryM
jvm.DataNode.metrics.memHeapCommittedM
jvm.DataNode.metrics.memHeapUsedM
jvm.DataNode.metrics.memNonHeapCommittedM
jvm.DataNode.metrics.memNonHeapUsedM
jvm.DataNode.metrics.threadsBlocked
jvm.DataNode.metrics.threadsNew
jvm.DataNode.metrics.threadsRunnable
jvm.DataNode.metrics.threadsTerminated
jvm.DataNode.metrics.threadsTimedWaiting
jvm.DataNode.metrics.threadsWaiting
rpc.metrics.NumOpenConnections
rpc.metrics.ReceivedBytes
rpc.metrics.RpcProcessingTime_avg_time
rpc.metrics.RpcProcessingTime_num_ops
rpc.metrics.RpcQueueTime_avg_time
rpc.metrics.RpcQueueTime_num_ops
rpc.metrics.SentBytes
rpc.metrics.callQueueLen
rpc.metrics.rpcAuthenticationFailures
rpc.metrics.rpcAuthenticationSuccesses
rpc.metrics.rpcAuthorizationFailures
rpc.metrics.rpcAuthorizationSuccesses
TaskTracker
jvm.TaskTracker.metrics.gcCount
jvm.TaskTracker.metrics.gcTimeMillis
jvm.TaskTracker.metrics.logError
jvm.TaskTracker.metrics.logFatal
jvm.TaskTracker.metrics.logInfo
jvm.TaskTracker.metrics.logWarn
jvm.TaskTracker.metrics.maxMemoryM
jvm.TaskTracker.metrics.memHeapCommittedM
jvm.TaskTracker.metrics.memHeapUsedM
jvm.TaskTracker.metrics.memNonHeapCommittedM
jvm.TaskTracker.metrics.memNonHeapUsedM
jvm.TaskTracker.metrics.threadsBlocked
jvm.TaskTracker.metrics.threadsNew
jvm.TaskTracker.metrics.threadsRunnable
jvm.TaskTracker.metrics.threadsTerminated
jvm.TaskTracker.metrics.threadsTimedWaiting
jvm.TaskTracker.metrics.threadsWaiting
mapred.shuffleOutput.shuffle_failed_outputs
mapred.shuffleOutput.shuffle_handler_busy_percent
mapred.shuffleOutput.shuffle_output_bytes
mapred.shuffleOutput.shuffle_success_outputs
mapred.tasktracker.mapTaskSlots
mapred.tasktracker.maps_running
mapred.tasktracker.reduceTaskSlots
mapred.tasktracker.reduces_running
mapred.tasktracker.tasks_completed
mapred.tasktracker.tasks_failed_ping
mapred.tasktracker.tasks_failed_timeout
rpc.detailed-metrics.canCommit_avg_time
rpc.detailed-metrics.canCommit_num_ops
rpc.detailed-metrics.commitPending_avg_time
rpc.detailed-metrics.commitPending_num_ops
rpc.detailed-metrics.done_avg_time
rpc.detailed-metrics.done_num_ops
rpc.detailed-metrics.getMapCompletionEvents_avg_time
rpc.detailed-metrics.getMapCompletionEvents_num_ops
rpc.detailed-metrics.getProtocolVersion_avg_time
rpc.detailed-metrics.getProtocolVersion_num_ops
rpc.detailed-metrics.getTask_avg_time
rpc.detailed-metrics.getTask_num_ops
rpc.detailed-metrics.ping_avg_time
rpc.detailed-metrics.ping_num_ops
rpc.detailed-metrics.statusUpdate_avg_time
rpc.detailed-metrics.statusUpdate_num_ops
rpc.metrics.NumOpenConnections
rpc.metrics.ReceivedBytes
rpc.metrics.RpcProcessingTime_avg_time
rpc.metrics.RpcProcessingTime_num_ops
rpc.metrics.RpcQueueTime_avg_time
rpc.metrics.RpcQueueTime_num_ops
rpc.metrics.SentBytes
rpc.metrics.callQueueLen
rpc.metrics.rpcAuthenticationFailures
rpc.metrics.rpcAuthenticationSuccesses
rpc.metrics.rpcAuthorizationFailures
rpc.metrics.rpcAuthorizationSuccesses
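The rpc.detailed-metrics entries in the TaskTracker list above come in `<method>_avg_time` / `<method>_num_ops` pairs, and which methods appear depends on what has actually been called. Rather than hard-coding the names, a monitoring script could discover them from whatever it collects. The following is only a sketch under my own assumptions: the helper name `group_detailed_metrics` is hypothetical, the input is assumed to be a plain dict of metric name to value, and the sample values are made up for illustration.

```python
from collections import defaultdict

PREFIX = "rpc.detailed-metrics."
SUFFIXES = ("_avg_time", "_num_ops")


def group_detailed_metrics(samples):
    """Group rpc.detailed-metrics values by RPC method name.

    samples: dict mapping a full metric name to its value.
    Returns a dict like {"ping": {"avg_time": ..., "num_ops": ...}}.
    Metrics outside rpc.detailed-metrics are ignored.
    """
    grouped = defaultdict(dict)
    for name, value in samples.items():
        if not name.startswith(PREFIX):
            continue
        tail = name[len(PREFIX):]
        for suffix in SUFFIXES:
            if tail.endswith(suffix):
                method = tail[: -len(suffix)]
                grouped[method][suffix.lstrip("_")] = value
                break
    return dict(grouped)


if __name__ == "__main__":
    # Illustrative values only, not real measurements.
    collected = {
        "rpc.detailed-metrics.ping_avg_time": 1.5,
        "rpc.detailed-metrics.ping_num_ops": 4,
        "rpc.metrics.SentBytes": 100,
    }
    for method, stats in group_detailed_metrics(collected).items():
        print(method, stats)
```

Because the grouping is driven by the names that are present, a method that only starts reporting later is picked up without changing the script.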
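Simply listing them out like this, there are quite a lot...
*1: rpc.detailed-metrics entries are added on the fly when the corresponding method is executed, which makes it hard to enumerate them all...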