System performance metrics

`cpu`

Metrics to monitor system CPUs. See the InfluxDB Telegraf plugin documentation for more.

Tags: ActivityState, cpu, node_id

time_active: Total time the CPU has been active (performing tasks), excluding idle time.
time_guest: Time spent running a virtual CPU for guest operating systems.
time_guest_nice: Time the CPU spent running a niced guest (a guest with a positive niceness value).
time_idle: Total time the CPU was not in use (idle).
time_iowait: Time the CPU was idle while waiting for I/O operations to complete.
time_irq: Time spent handling hardware interrupts.
time_nice: Time the CPU spent running user processes with a positive niceness value (lower scheduling priority).
time_softirq: Time spent handling software interrupts.
time_steal: Time that a virtual CPU waited for a real CPU while the hypervisor was servicing another virtual processor.
time_system: Time the CPU spent running system (kernel) processes.
time_user: Time the CPU spent executing user processes.
usage_active: Percentage of time the CPU was active, performing tasks.
usage_guest: Percentage of CPU time spent running virtual CPUs for guest OSes.
usage_guest_nice: Percentage of CPU time spent running niced guests.
usage_idle: Percentage of time the CPU was idle.
usage_iowait: Percentage of time the CPU was idle while waiting for I/O operations to complete.
usage_irq: Percentage of time spent handling hardware interrupts.
usage_nice: Percentage of CPU time spent on processes with a positive niceness.
usage_softirq: Percentage of time spent handling software interrupts.
usage_steal: Percentage of time a virtual CPU waited for a real CPU while the hypervisor serviced another processor.
usage_system: Percentage of CPU time spent on system (kernel) processes.
usage_user: Percentage of CPU time spent executing user processes.
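The `usage_*` fields are percentages, while the `time_*` fields are cumulative counters. A common way to relate them is to take the difference between two samples of each `time_*` counter and divide it by the difference in total CPU time. The Python sketch below shows that calculation under stated assumptions: the `CpuTimes` class and `usage_percent` function are hypothetical, the times are assumed to be cumulative seconds, and `usage_active` is assumed to cover every state except idle. It illustrates the technique and is not the plugin's own code.

```python
from dataclasses import dataclass


@dataclass
class CpuTimes:
    """Hypothetical snapshot of cumulative CPU times (assumed seconds)."""
    user: float
    nice: float
    system: float
    idle: float
    iowait: float
    irq: float
    softirq: float
    steal: float

    def total(self) -> float:
        # guest and guest_nice are left out: on Linux they are already
        # included in user and nice, so adding them would double count.
        return (self.user + self.nice + self.system + self.idle
                + self.iowait + self.irq + self.softirq + self.steal)


TIME_FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def usage_percent(prev: CpuTimes, curr: CpuTimes) -> dict:
    """Convert two cumulative samples into usage_* style percentages."""
    total_delta = curr.total() - prev.total()
    if total_delta <= 0:
        raise ValueError("curr must be a later sample than prev")
    usage = {
        f"usage_{name}": 100.0 * (getattr(curr, name) - getattr(prev, name)) / total_delta
        for name in TIME_FIELDS
    }
    # Assumption: "active" means every state except idle.
    usage["usage_active"] = 100.0 - usage["usage_idle"]
    return usage


prev = CpuTimes(user=100, nice=0, system=50, idle=800, iowait=10, irq=0, softirq=0, steal=0)
curr = CpuTimes(user=130, nice=0, system=60, idle=855, iowait=15, irq=0, softirq=0, steal=0)
usage = usage_percent(prev, curr)
# usage_user = 30.0, usage_idle = 55.0, usage_active = 45.0
print(usage)
```

Because every percentage is computed from deltas, a single sample cannot produce `usage_*` values; at least two collections are needed.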

`mdstat`

Statistics about Linux MD (software) RAID arrays configured on the host. RAID (redundant array of inexpensive, or independent, disks) combines multiple physical disks into one logical unit to provide data redundancy (protection against data loss when a disk fails), better performance (faster data access), or both, depending on the RAID level. See the InfluxDB Telegraf plugin documentation for more.

Tags: ActivityState (active or inactive), Devices, Name, _field, node_id

BlocksSynced: Number of blocks scanned so far while the array is rebuilding or checking
BlocksSyncedFinishTime: Estimated minutes remaining until the rebuild scan finishes
BlocksSyncedPct: Percentage of the rebuild scan remaining
BlocksSyncedSpeed: Current speed of the rebuild, in K/sec
BlocksTotal: Total number of blocks in the array
DisksActive: Number of disks in the array that are currently considered healthy
DisksDown: Number of disks in the array that are currently down (non-operational)
DisksFailed: Count of currently failed disks in the array
DisksSpare: Count of spare disks in the array
DisksTotal: Count of total disks in the array
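As a hypothetical monitoring sketch, the disk counts and block counters above can be combined into a degraded-array check and a rough time-to-finish estimate. The function names, the alert rule, and the assumption that one block corresponds to 1 KiB (so that block counts and the K/sec speed share a unit) are illustrative choices, not part of the plugin.

```python
from typing import Mapping, Optional


def is_degraded(fields: Mapping[str, float]) -> bool:
    """Assumed alert rule: any failed or down disk needs attention."""
    return fields["DisksFailed"] > 0 or fields["DisksDown"] > 0


def estimated_minutes_left(fields: Mapping[str, float]) -> Optional[float]:
    """Rough rebuild ETA from block counters and BlocksSyncedSpeed (K/sec).

    Assumes one block is 1 KiB, so remaining blocks / speed gives seconds.
    Returns None when no rebuild is in progress (speed of zero).
    """
    speed = fields["BlocksSyncedSpeed"]
    if speed <= 0:
        return None
    remaining = fields["BlocksTotal"] - fields["BlocksSynced"]
    return remaining / speed / 60.0


# Hypothetical sample: a two-disk mirror rebuilding after one disk went down.
sample = {
    "DisksActive": 1, "DisksDown": 1, "DisksFailed": 0,
    "DisksSpare": 0, "DisksTotal": 2,
    "BlocksSynced": 600_000, "BlocksTotal": 1_200_000,
    "BlocksSyncedSpeed": 10_000,
}
print(is_degraded(sample))             # True
print(estimated_minutes_left(sample))  # 1.0
```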

`processes`

Counts of all processes on the host, grouped by status. See the InfluxDB Telegraf plugin documentation for more.

Tags: node_id

blocked: Number of processes in a blocked state, waiting for a resource or event to become available.
dead: Number of processes in the dead state, which are being removed from the process table and are rarely, if ever, observed.
idle: Number of processes in an idle state, typically indicating they are not actively doing any work.
paging: Number of processes waiting on paging, that is, memory being swapped in from or out to disk.
running: Number of processes that are currently executing or ready to execute.
sleeping: Number of processes in a sleep state, inactive until certain conditions are met or events occur.
stopped: Number of processes that are stopped, typically because they received a stop signal or are being traced by a debugger.
total: Total number of processes currently existing in the system.
total_threads: The total number of threads across all processes, as processes can have multiple threads.
unknown: Number of processes in an unknown state, where their state can't be determined.
zombies: Number of zombie processes, which have completed execution but still have an entry in the process table due to the parent process not reading its exit status.
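On Linux, each process reports a one-letter state code (see `man ps`). The following minimal sketch shows how such codes could be tallied into the fields above. The `STATE_FIELDS` mapping and `count_states` function are hypothetical, and the mapping is an assumption based on the `ps` state codes rather than the plugin's confirmed logic. It does not compute `total_threads`, which would need per-process thread counts.

```python
from collections import Counter
from typing import Dict, Iterable

# Assumed mapping from Linux process state codes (see `man ps`) to the
# field names above; the plugin's exact logic may differ.
STATE_FIELDS = {
    "R": "running",
    "S": "sleeping",
    "D": "blocked",   # uninterruptible sleep, usually waiting on I/O
    "I": "idle",      # idle kernel thread
    "T": "stopped",   # stopped by a job-control signal
    "t": "stopped",   # stopped by a debugger during tracing
    "W": "paging",
    "X": "dead",
    "Z": "zombies",
}

STATE_NAMES = ("blocked", "dead", "idle", "paging", "running",
               "sleeping", "stopped", "unknown", "zombies")


def count_states(states: Iterable[str]) -> Dict[str, int]:
    """Tally one state code per process into the processes fields."""
    counts = Counter(STATE_FIELDS.get(code, "unknown") for code in states)
    result = {name: counts[name] for name in STATE_NAMES}
    result["total"] = sum(counts.values())
    return result


# Hypothetical snapshot of eight processes; "?" is not a known code.
print(count_states(["R", "S", "S", "S", "D", "Z", "T", "?"]))
```

For that input, `sleeping` is 3, `unknown` is 1, and `total` is 8; any code missing from the mapping falls into `unknown`.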

`system`

These metrics provide general information about the system load, uptime, and number of users logged in. See the InfluxDB Telegraf plugin documentation for details.

Tags: node_id

load1: The average system load over the last minute, indicating the average number of processes running or waiting to run (on Linux, this also counts processes in uninterruptible sleep).
load15: The average system load over the last 15 minutes, providing a longer-term view of recent system load.
load5: The average system load over the last 5 minutes, offering a medium-term view between load1 and load15.
n_cpus: The number of logical CPUs available on the system.
uptime: The total time in seconds that the system has been running since its last startup or reboot.
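Load averages are easier to compare across nodes when divided by `n_cpus`, and `uptime` is easier to read when split into days, hours, and minutes. The helper names below are hypothetical; reading a per-CPU load above 1.0 as more demand than available CPUs is a common rule of thumb, not a hard threshold.

```python
def load_per_cpu(load: float, n_cpus: int) -> float:
    """Normalize a load1/load5/load15 value by the number of CPUs."""
    if n_cpus <= 0:
        raise ValueError("n_cpus must be positive")
    return load / n_cpus


def format_uptime(uptime: int) -> str:
    """Render an uptime in seconds as days, hours, and minutes."""
    days, rest = divmod(uptime, 86_400)
    hours, rest = divmod(rest, 3_600)
    minutes = rest // 60
    return f"{days}d {hours}h {minutes}m"


print(load_per_cpu(6.0, 4))   # 1.5
print(format_uptime(90_061))  # 1d 1h 1m
```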

`temp`

Temperature readings as collected by system sensors. Visit the InfluxDB Telegraf plugin documentation for details.

Tags: node_id, sensor

temp: Temperature reported by the sensor named in the `sensor` tag, in degrees Celsius.
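Because each `temp` reading carries `node_id` and `sensor` tags, readings from many sensors can be reduced to one value per node. The sketch below picks the hottest sensor on each node. It assumes readings arrive as dictionaries keyed by the tag and field names; the function name and the sample node and sensor names are hypothetical.

```python
from typing import Any, Dict, Iterable, Mapping, Tuple


def hottest_sensor_per_node(
    readings: Iterable[Mapping[str, Any]],
) -> Dict[str, Tuple[str, float]]:
    """Return {node_id: (sensor, temp)} for the highest reading on each node."""
    hottest: Dict[str, Tuple[str, float]] = {}
    for reading in readings:
        node = reading["node_id"]
        value = float(reading["temp"])
        if node not in hottest or value > hottest[node][1]:
            hottest[node] = (reading["sensor"], value)
    return hottest


# Hypothetical readings; node and sensor names are made up.
readings = [
    {"node_id": "node-1", "sensor": "cpu0", "temp": 55.0},
    {"node_id": "node-1", "sensor": "cpu1", "temp": 61.5},
    {"node_id": "node-2", "sensor": "nvme0", "temp": 40.0},
]
print(hottest_sensor_per_node(readings))
# {'node-1': ('cpu1', 61.5), 'node-2': ('nvme0', 40.0)}
```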
