System performance metrics

cpu

Metrics to monitor system CPUs. See the InfluxDB Telegraf plugin documentation for more.

Tags: ActivityState, cpu, node_id

  • time_active: Total time the CPU has been active, that is, all time not spent idle.
  • time_guest: Time spent running a virtual CPU for guest operating systems.
  • time_guest_nice: Time the CPU spent running a niced guest (a guest with a positive niceness value).
  • time_idle: Total time the CPU was not in use (idle).
  • time_iowait: Time the CPU was idle while waiting for I/O operations to complete.
  • time_irq: Time spent handling hardware interrupts.
  • time_nice: Time the CPU spent processing user processes with a positive niceness value.
  • time_softirq: Time spent handling software interrupts.
  • time_steal: Time that a virtual CPU waited for a real CPU while the hypervisor was servicing another virtual processor.
  • time_system: Time the CPU spent running system (kernel) processes.
  • time_user: Time spent executing user processes.
  • usage_active: Percentage of time the CPU was active, performing tasks.
  • usage_guest: Percentage of CPU time spent running virtual CPUs for guest OSes.
  • usage_guest_nice: Percentage of CPU time spent running niced guests.
  • usage_idle: Percentage of time the CPU was idle.
  • usage_iowait: Percentage of time the CPU was idle due to waiting for I/O operations.
  • usage_irq: Percentage of time spent handling hardware interrupts.
  • usage_nice: Percentage of CPU time spent on processes with a positive niceness.
  • usage_softirq: Percentage of time spent handling software interrupts.
  • usage_steal: Percentage of time a virtual CPU waited for a real CPU while the hypervisor serviced another processor.
  • usage_system: Percentage of CPU time spent on system (kernel) processes.
  • usage_user: Percentage of CPU time spent executing user processes.
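
The usage_* fields relate to the time_* counters roughly as a ratio of deltas between two samples. The following is a minimal sketch of that relationship, not the plugin's actual implementation. The function name `cpu_usage`, the `COUNTED_STATES` tuple, and the plain-dict input shape are assumptions made for illustration. Guest time is left out of the total on the assumption that, as on Linux, it is already included in user and nice time.

```python
# States summed into the total elapsed CPU time. Guest time is excluded on
# the assumption that it is already counted in user/nice time (as on Linux).
COUNTED_STATES = ("user", "system", "idle", "nice", "iowait", "irq", "softirq", "steal")


def cpu_usage(prev, curr):
    """Derive usage percentages from two samples of cumulative time counters.

    `prev` and `curr` are hypothetical dicts mapping a state name (for
    example "user") to the cumulative time spent in that state.
    """
    # Time spent in each state between the two samples.
    deltas = {s: curr.get(s, 0.0) - prev.get(s, 0.0) for s in COUNTED_STATES}
    total = sum(deltas.values())
    if total <= 0:
        # No time elapsed (or the counters were reset): nothing to report.
        return {}
    usage = {f"usage_{s}": 100.0 * d / total for s, d in deltas.items()}
    # Active time is everything that is not idle time.
    usage["usage_active"] = 100.0 * (total - deltas["idle"]) / total
    return usage
```

With two samples in which user time grew by 30, system by 10, iowait by 10, and idle by 50, this sketch reports usage_user as 30%, usage_idle as 50%, and usage_active as 50%.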

mdstat

Statistics about Linux MD RAID arrays configured on the host. RAID (redundant array of inexpensive, or independent, disks) combines multiple physical disks into one logical unit to provide data redundancy (protection against data loss if a disk fails) and better performance (faster data access). See the InfluxDB Telegraf plugin documentation for more.

Tags: ActivityState (active or inactive), Devices, Name, _field, node_id

  • BlocksSynced: Count of blocks that have been scanned, if the array is rebuilding or checking.
  • BlocksSyncedFinishTime: Expected time until the rebuild scan finishes, in minutes remaining.
  • BlocksSyncedPct: Percentage of the rebuild scan remaining.
  • BlocksSyncedSpeed: Current speed of the rebuild, in K/sec.
  • BlocksTotal: Total count of blocks in the array.
  • DisksActive: Number of disks in the array that are currently considered healthy.
  • DisksDown: Number of disks in the array that are currently down (non-operational).
  • DisksFailed: Count of currently failed disks in the array.
  • DisksSpare: Count of spare disks in the array.
  • DisksTotal: Total count of disks in the array.
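
As an illustration of how the disk counts above can be combined, here is a minimal sketch of a health check. The function name `array_status`, the "healthy" and "degraded" labels, and the dict input are hypothetical, and the rule itself (any failed disk, or fewer active disks than total) is an assumption rather than something the plugin defines.

```python
def array_status(fields):
    """Classify one array from its mdstat fields (hypothetical dict input)."""
    failed = fields.get("DisksFailed", 0)
    active = fields.get("DisksActive", 0)
    total = fields.get("DisksTotal", 0)
    # Assumed rule: any failed disk, or fewer healthy disks than the array
    # expects, means the array is running degraded.
    if failed > 0 or active < total:
        return "degraded"
    return "healthy"
```

For example, an array reporting DisksTotal 2, DisksActive 1, and DisksFailed 1 would be classified as "degraded" by this sketch.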

processes

All processes, grouped by status. See the InfluxDB Telegraf plugin documentation for more.

Tags: node_id

  • blocked: Number of processes in a blocked state, waiting for a resource or event to become available.
  • dead: Number of processes that have finished execution but still have an entry in the process table.
  • idle: Number of processes in an idle state, typically indicating they are not actively doing any work.
  • paging: Number of processes that are waiting for paging, either swapping in from or out to disk.
  • running: Number of processes that are currently executing or ready to execute.
  • sleeping: Number of processes that are in a sleep state, inactive until certain conditions are met or events occur.
  • stopped: Number of processes that are stopped, typically because they received a stop signal or are being traced by a debugger.
  • total: Total number of processes currently existing in the system.
  • total_threads: The total number of threads across all processes, as processes can have multiple threads.
  • unknown: Number of processes in an unknown state, where their state can't be determined.
  • zombies: Number of zombie processes, which have completed execution but still have an entry in the process table due to the parent process not reading its exit status.
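
To show what "grouped by status" can look like in practice, here is a minimal sketch that tallies Linux process state codes (the single letters shown by `ps`) into the field names above. The `STATE_TO_FIELD` mapping and the function name `count_by_state` are assumptions for illustration; the plugin's own mapping may differ, especially on other operating systems.

```python
from collections import Counter

# Assumed mapping from Linux process state codes to the field names above.
STATE_TO_FIELD = {
    "R": "running",
    "S": "sleeping",
    "D": "blocked",   # uninterruptible sleep, usually waiting on I/O
    "Z": "zombies",
    "X": "dead",
    "T": "stopped",   # stopped by a job-control signal
    "t": "stopped",   # stopped by a debugger during tracing
    "W": "paging",
    "I": "idle",
}


def count_by_state(state_codes):
    """Tally a list of state codes; unrecognized codes count as "unknown"."""
    counts = Counter(STATE_TO_FIELD.get(code, "unknown") for code in state_codes)
    counts["total"] = len(state_codes)
    return dict(counts)
```

Calling `count_by_state(["R", "S", "S", "Z"])` gives one running, two sleeping, and one zombie process, with a total of 4.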

system

These metrics provide general information about the system load, uptime, and number of users logged in. Visit the InfluxDB Telegraf plugin documentation for details.

Tags: node_id

  • load1: The average system load over the last one minute, indicating the number of processes in the system's run queue.
  • load15: The average system load over the last 15 minutes, providing a longer-term view of the recent system load.
  • load5: The average system load over the last 5 minutes, offering a shorter-term perspective of the recent system load.
  • n_cpus: The number of logical CPUs available in the system.
  • uptime: The total time in seconds that the system has been running since its last startup or reboot.
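
Load averages are easier to compare across hosts when divided by the CPU count. The following is a minimal sketch of that normalization; the function name `load_per_cpu` is hypothetical, and reading a result above 1.0 as "more runnable work than CPUs" is a common rule of thumb rather than a definition taken from the plugin.

```python
def load_per_cpu(load, n_cpus):
    """Normalize a load average (load1, load5, or load15) by the CPU count."""
    if n_cpus <= 0:
        raise ValueError("n_cpus must be positive")
    return load / n_cpus
```

For example, a load1 of 6.0 on a host with n_cpus of 4 gives 1.5, which suggests more work was queued than the CPUs could run at once during the last minute.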

temp

Temperature readings as collected by system sensors. Visit the InfluxDB Telegraf plugin documentation for details.

Tags: node_id, sensor

  • temp: Temperature reported by the sensor, in degrees Celsius.