Add collector for SR-IOV network Virtual Function statistics #3544
aharivel wants to merge 2 commits into prometheus:master
Bumps [github.com/jsimonetti/rtnetlink/v2](https://github.com/jsimonetti/rtnetlink) from 2.1.0 to 2.2.0.
Signed-off-by: Anthony Harivel <aharivel@redhat.com>
Add a new netvf collector that exposes SR-IOV network VF statistics and
configuration via rtnetlink. The collector queries netlink for
interfaces with Virtual Functions and exposes per-VF metrics:
- node_net_vf_info: VF configuration (MAC, VLAN, link state, spoof
check, trust, PCI address)
- node_net_vf_{receive,transmit}_{packets,bytes}_total: traffic counters
- node_net_vf_{broadcast,multicast}_packets_total: packet type counters
- node_net_vf_{receive,transmit}_dropped_total: drop counters
All metrics include a pci_address label resolved from the sysfs virtfn
symlink, enabling direct correlation with workloads that reference VFs
by PCI BDF address (e.g. OpenStack Nova, libvirt, DPDK).
The collector is disabled by default and can be enabled with
--collector.netvf. PF device filtering is supported via
--collector.netvf.device-include/exclude flags.
Signed-off-by: Anthony Harivel <aharivel@redhat.com>
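To illustrate how include/exclude filtering of PF devices might behave, here is a minimal sketch using only the Go standard library. The `pfFilter` type, `newPFFilter`, and `ignored` are hypothetical names invented for this example; they are not the collector's actual `deviceFilter` implementation, and the precedence shown (exclude wins over include) is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// pfFilter is a hypothetical include/exclude filter for PF device names.
// A nil pattern means that side of the filter is disabled.
type pfFilter struct {
	include *regexp.Regexp
	exclude *regexp.Regexp
}

// newPFFilter compiles the two patterns; an empty string disables that side.
func newPFFilter(include, exclude string) (*pfFilter, error) {
	f := &pfFilter{}
	var err error
	if include != "" {
		if f.include, err = regexp.Compile(include); err != nil {
			return nil, fmt.Errorf("invalid include pattern: %w", err)
		}
	}
	if exclude != "" {
		if f.exclude, err = regexp.Compile(exclude); err != nil {
			return nil, fmt.Errorf("invalid exclude pattern: %w", err)
		}
	}
	return f, nil
}

// ignored reports whether a PF should be skipped: it matches the exclude
// pattern, or an include pattern is set and the name does not match it.
func (f *pfFilter) ignored(name string) bool {
	if f.exclude != nil && f.exclude.MatchString(name) {
		return true
	}
	if f.include != nil && !f.include.MatchString(name) {
		return true
	}
	return false
}

func main() {
	// Assumed pattern: keep only PFs whose name starts with "ens".
	f, err := newPFFilter("^ens", "")
	if err != nil {
		panic(err)
	}
	for _, name := range []string{"ens1f0", "eth0"} {
		fmt.Printf("%s ignored=%t\n", name, f.ignored(name))
	}
}
```

With no pattern set on either side, every PF is kept, which matches the usual expectation that filtering is opt-in.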
Are the MAC and PCI addresses somewhat stable? Otherwise I'd be worried about the cardinality.
@discordianfish PCI address cardinality is bounded and stable (hardware topology). MAC cardinality risk is real but contained to the info gauge, where stale series age out naturally; it was already there before this change.
// parseVFInfo extracts VF information from link messages for testing.
// sysClassPath is the path to the sysfs class directory used to resolve VF PCI addresses.
func parseVFInfo(links []rtnetlink.LinkMessage, filter *deviceFilter, logger *slog.Logger, sysClassPath string) []vfMetrics {
Is this used anywhere except the tests?
Also, all these functions should ideally go into procfs if needed.
Indeed, resolveVFPCIAddress is the sole function that reads sysfs — everything else goes through rtnetlink.
For parseVFInfo: I'll keep it as a test utility, since it allows unit testing the VF parsing logic without a live netlink socket. WDYT?
For resolveVFPCIAddress: agreed it would be a natural fit in prometheus/procfs. I can open a follow-up issue/PR there to contribute it upstream — would you prefer that happens before merging this, or as a separate follow-up?
Yeah, let's move it before merging.
If parseVFInfo is only used in tests, you can also define it there.
Bump github.com/jsimonetti/rtnetlink/v2 from 2.1.0 to 2.2.0; this adds the VF stats used by the next commit.
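Add a new netvf collector that exposes SR-IOV network VF statistics and configuration via rtnetlink. The collector queries netlink for interfaces with Virtual Functions and exposes per-VF metrics:
- node_net_vf_info: VF configuration (MAC, VLAN, link state, spoof check, trust, PCI address)
- node_net_vf_{receive,transmit}_{packets,bytes}_total: traffic counters
- node_net_vf_{broadcast,multicast}_packets_total: packet type counters
- node_net_vf_{receive,transmit}_dropped_total: drop counters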
All metrics include a pci_address label resolved from the sysfs virtfn symlink, enabling direct correlation with workloads that reference VFs by PCI BDF address (e.g. OpenStack Nova, libvirt, DPDK).
Tested on MT2894 Family [ConnectX-6 Lx] and Ethernet Controller E810-XXV with VFs bound to both the kernel driver and the vfio-pci driver (for direct assignment to Virtual Machines).