Telemetry Types
Metrics collected by the Rezolus agent, organized by sampler category.
Each sampler can be individually enabled or disabled in the agent config. Metrics carry labels such as state, op, and direction. Many metrics are also collected per cgroup.
CPU
Usage
CPU time by state, softirq breakdown
cpu_usage - Per-CPU nanoseconds by state: user, system
softirq - Per-CPU interrupt count by kind: hi, timer, net_tx, net_rx, block, irq_poll, tasklet, sched, hrtimer, rcu
softirq_time - Per-CPU nanoseconds spent in softirq, same kinds
cgroup_cpu_usage - Per-cgroup nanoseconds by state: user, system
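Because cpu_usage is a running total of nanoseconds, a utilization figure comes from the difference between two readings divided by the elapsed wall-clock time. The sketch below illustrates that arithmetic only; the function name and the sample readings are hypothetical and are not part of Rezolus.

```python
def cpu_utilization(prev_ns, curr_ns, interval_s):
    """Fraction of one CPU spent in a state between two counter readings.

    prev_ns / curr_ns are two readings of a nanosecond counter such as
    cpu_usage for a single CPU and state (hypothetical sample values below).
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    busy_ns = curr_ns - prev_ns
    return busy_ns / (interval_s * 1e9)


# Two made-up readings taken one second apart for a single CPU.
user = cpu_utilization(5_000_000_000, 5_250_000_000, 1.0)
system = cpu_utilization(1_000_000_000, 1_100_000_000, 1.0)
print(f"user={user:.1%} system={system:.1%}")
```

The same calculation would apply to cgroup_cpu_usage, with the caveat that a cgroup can run on several CPUs at once, so the result can exceed 1.0.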
Frequency
APERF, MPERF, TSC cycle counters
cpu_aperf - Per-CPU actual performance cycles
cpu_mperf - Per-CPU maximum performance cycles
cpu_tsc - Per-CPU timestamp counter cycles
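A common use of these three counters is estimating the average clock frequency while a core was not idle: the APERF/MPERF ratio scales a reference frequency, which can itself be estimated from the TSC rate. The following is a minimal sketch of that calculation, assuming MPERF advances at the same rate as the TSC (typical on recent x86 parts, but worth verifying for a given CPU). The function name and sample deltas are hypothetical.

```python
def busy_frequency_hz(d_aperf, d_mperf, d_tsc, interval_s):
    """Estimate average non-idle core frequency from counter deltas.

    d_aperf, d_mperf, d_tsc are differences between two readings of
    cpu_aperf, cpu_mperf and cpu_tsc for the same CPU.
    Assumption: MPERF ticks at the TSC rate while the core is active.
    """
    if d_mperf <= 0 or interval_s <= 0:
        raise ValueError("d_mperf and interval_s must be positive")
    tsc_hz = d_tsc / interval_s          # reference frequency estimate
    return tsc_hz * d_aperf / d_mperf    # scale by actual/reference ratio


# Made-up deltas over one second: the core ran 1.5x faster than reference.
freq = busy_frequency_hz(
    d_aperf=1_500_000_000,
    d_mperf=1_000_000_000,
    d_tsc=3_000_000_000,
    interval_s=1.0,
)
print(f"{freq / 1e9:.2f} GHz")
```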
Performance Counters
Cycles, instructions, branch predictions, cache, TLB
cpu_cycles - Per-CPU cycle count
cpu_instructions - Per-CPU retired instructions
cpu_branch_instructions - Per-CPU branch instructions
cpu_branch_misses - Per-CPU branch mispredictions
cpu_dtlb_miss - Per-CPU data TLB misses (with op: load, store on Intel)
cpu_l3_access - Per-CPU L3 cache accesses
cpu_l3_miss - Per-CPU L3 cache misses
cpu_tlb_flush - Per-CPU TLB flush count by reason: task_switch, remote_shootdown, local_shootdown, etc.
cgroup_cpu_cycles - Per-cgroup cycle count
cgroup_cpu_instructions - Per-cgroup retired instructions
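These counters are usually read in pairs. Instructions per cycle (IPC), branch misprediction rate, and L3 miss ratio are each one counter delta divided by another. The sketch below shows the arithmetic on hypothetical deltas; the helper name `ratio` and the numbers are illustrative and not part of Rezolus.

```python
def ratio(numerator, denominator):
    """Divide two counter deltas, returning None when the denominator is 0."""
    if denominator == 0:
        return None
    return numerator / denominator


# Made-up deltas over one sampling interval for a single CPU.
deltas = {
    "cpu_cycles": 1_200_000,
    "cpu_instructions": 2_400_000,
    "cpu_branch_instructions": 400_000,
    "cpu_branch_misses": 5_000,
    "cpu_l3_access": 1_200,
    "cpu_l3_miss": 300,
}

ipc = ratio(deltas["cpu_instructions"], deltas["cpu_cycles"])
branch_miss_rate = ratio(deltas["cpu_branch_misses"], deltas["cpu_branch_instructions"])
l3_miss_ratio = ratio(deltas["cpu_l3_miss"], deltas["cpu_l3_access"])

print(f"IPC={ipc:.2f} branch_miss={branch_miss_rate:.2%} l3_miss={l3_miss_ratio:.0%}")
```

The same IPC calculation would work per cgroup using cgroup_cpu_instructions and cgroup_cpu_cycles.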
Bandwidth & Migrations
CFS throttling, CPU migration events
cpu_cores - Number of online logical cores (gauge)
cpu_migrations - Per-CPU migration count with direction: from, to
cgroup_cpu_bandwidth_* - Per-cgroup CFS quota, period, throttled time, period counts
cgroup_cpu_migrations - Per-cgroup CPU migration count
Scheduler
Runqueue
Scheduling latency, running time, off-CPU time, context switches
scheduler_runqueue_latency - Histogram of time tasks wait in the runqueue (ns)
scheduler_running - Histogram of time tasks spend running on CPU (ns)
scheduler_offcpu - Histogram of time tasks spend off-CPU (ns)
scheduler_context_switch - Per-CPU involuntary context switches
scheduler_runqueue_wait - Per-CPU total nanoseconds spent waiting
cgroup_scheduler_* - Per-cgroup: runqueue_wait, offcpu, context_switch
Block I/O
Latency
I/O latency distributions by operation type
blockio_latency - Histogram (ns) with op: read, write, flush, discard
Requests
Operation counts, bytes transferred, size distributions
blockio_operations - Counter with op: read, write, flush, discard
blockio_bytes - Counter (bytes) with op: read, write, flush, discard
blockio_size - Histogram (bytes) with op: read, write, flush, discard
Network
Traffic
Aggregate bytes and packets
network_bytes - Counter with direction: receive, transmit
network_packets - Counter with direction: receive, transmit
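Since both traffic metrics are counters, throughput and mean packet size follow from deltas between two readings. The sketch below shows one way to derive them; the function names and sample readings are hypothetical and do not come from Rezolus.

```python
def throughput_bits_per_s(prev_bytes, curr_bytes, interval_s):
    """Convert a byte-counter delta into bits per second."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (curr_bytes - prev_bytes) * 8 / interval_s


def mean_packet_size(d_bytes, d_packets):
    """Average bytes per packet over an interval, or None if no packets."""
    if d_packets == 0:
        return None
    return d_bytes / d_packets


# Made-up receive-direction readings taken one second apart.
rx_bps = throughput_bits_per_s(1_000_000_000, 1_125_000_000, 1.0)
rx_avg = mean_packet_size(d_bytes=125_000_000, d_packets=100_000)
print(f"rx={rx_bps / 1e9:.2f} Gbit/s, mean packet={rx_avg:.0f} bytes")
```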
Interfaces
Drops, transmit errors, timeouts
network_drop - Dropped packets counter
network_transmit_busy - Transmit busy counter
network_transmit_complete - Completed transmissions counter
network_transmit_timeout - Transmit timeout events
Ethtool (ENA)
AWS EC2 Elastic Network Adapter allowance counters
network_ena_bandwidth_allowance_exceeded - With direction: receive, transmit
network_ena_pps_allowance_exceeded - Packets-per-second limit exceeded
network_ena_conntrack_allowance_exceeded - Connection tracking limit exceeded
network_ena_linklocal_allowance_exceeded - Link-local traffic limit exceeded
TCP
Traffic
Bytes, packets, and segment size distributions
tcp_bytes - Counter with direction: receive, transmit
tcp_packets - Counter with direction: receive, transmit
tcp_size - Histogram (bytes) with direction: receive, transmit
Latency
Connection establishment, packet delivery, jitter, RTT
tcp_connect_latency - Histogram (ns): time to establish connection
tcp_packet_latency - Histogram (ns): receive-to-read latency
tcp_jitter - Histogram (ns): inter-packet jitter
tcp_srtt - Histogram (ns): smoothed round-trip time
tcp_retransmit - Counter: retransmitted packets
Syscall
Counts & Latency
Invocation counts and latency distributions by syscall category
syscall - Counter with op label (see categories below)
syscall_latency - Histogram (ns) with op label (same categories)
cgroup_syscall - Per-cgroup counters with same op labels
Syscall categories (op values):
read
write
poll
lock
time
sleep
socket
yield
filesystem
memory
process
query
ipc
timer
event
other
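With one syscall counter per op value, a quick way to characterize a workload is the share of calls in each category over an interval. The sketch below assumes the counters have already been read into plain dictionaries keyed by op; that representation, the function name, and the sample numbers are hypothetical.

```python
def syscall_mix(prev, curr):
    """Share of syscalls per category between two counter snapshots.

    prev and curr map an op value (e.g. "read") to its counter reading.
    Categories missing from prev are treated as starting at zero.
    """
    deltas = {op: curr[op] - prev.get(op, 0) for op in curr}
    total = sum(deltas.values())
    if total == 0:
        return {op: 0.0 for op in deltas}
    return {op: count / total for op, count in deltas.items()}


# Made-up snapshots covering three of the categories listed above.
before = {"read": 100, "write": 50, "poll": 10}
after = {"read": 400, "write": 150, "poll": 110}

mix = syscall_mix(before, after)
for op, share in sorted(mix.items(), key=lambda item: -item[1]):
    print(f"{op:<8} {share:.0%}")
```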
Memory
Meminfo & VMStat
System memory gauges and NUMA allocation counters
memory_total - Gauge (bytes)
memory_free - Gauge (bytes)
memory_available - Gauge (bytes)
memory_buffers - Gauge (bytes)
memory_cached - Gauge (bytes)
memory_numa_hit - Counter: allocations on intended node
memory_numa_miss - Counter: allocations on non-intended node
memory_numa_foreign - Counter: allocations intended for this node, placed elsewhere
memory_numa_interleave - Counter: interleave policy allocations
memory_numa_local - Counter: allocations on local node
memory_numa_other - Counter: allocations on remote node
GPU
NVIDIA
Memory, power, temperature, clocks, utilization (Linux)
gpu_memory - Per-GPU gauge (bytes) with state: free, used
gpu_power_usage - Per-GPU gauge (milliwatts)
gpu_energy_consumption - Per-GPU counter (millijoules)
gpu_temperature - Per-GPU gauge (Celsius)
gpu_clock - Per-GPU gauge (Hz) with clock: compute, graphics, memory, video
gpu_utilization - Per-GPU gauge (percentage)
gpu_memory_utilization - Per-GPU gauge (percentage)
gpu_pcie_bandwidth - Per-GPU gauge (bytes/sec) with direction: receive
gpu_pcie_throughput - Per-GPU gauge (bytes/sec) with direction: receive, transmit
gpu_sm_utilization - Per-GPU gauge (%), Hopper+ only
gpu_sm_occupancy - Per-GPU gauge (%), Hopper+ only
gpu_dram_bandwidth_utilization - Per-GPU gauge (%), Hopper+ only
gpu_tensor_utilization - Per-GPU gauge (%), Hopper+ only
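gpu_power_usage is an instantaneous gauge, while gpu_energy_consumption is a counter. Dividing an energy delta by the elapsed time gives the average power over the interval, which can be less noisy than sampling the gauge (millijoules per second are milliwatts). The sketch below shows that calculation on hypothetical readings; the function name is illustrative.

```python
def average_power_mw(prev_mj, curr_mj, interval_s):
    """Average power in milliwatts from two energy readings in millijoules."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return (curr_mj - prev_mj) / interval_s


# Made-up readings of the energy counter for one GPU, two seconds apart.
avg_mw = average_power_mw(15_000_000, 15_300_000, 2.0)
print(f"average power: {avg_mw / 1000:.1f} W")
```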
Apple Silicon
Power, clocks, utilization (macOS)
gpu_power_usage - Per-GPU gauge (milliwatts)
gpu_energy_consumption - Per-GPU counter (millijoules)
gpu_clock - Per-GPU gauge (Hz) with clock: graphics
gpu_utilization - Per-GPU gauge (percentage)
Rezolus (self-monitoring)
Resource Usage
Rezolus process CPU, memory, I/O, and context switches
rezolus_cpu_usage - Counter (ns) with state: user, system
rezolus_memory_usage_resident_set_size - Gauge (bytes)
rezolus_memory_page_reclaims - Counter
rezolus_memory_page_faults - Counter
rezolus_blockio_operations - Counter with op: read, write
rezolus_context_switch - Counter with kind: voluntary, involuntary