Problem
Retina's eBPF plugins (dropreason, packetforward, dns) are tightly coupled to kernel internals — function signatures, tracepoint layouts, and BTF availability can change across kernel releases. Issue #1906 demonstrated this: inet_csk_accept changed its signature in Linux 6.10-rc1, silently breaking the dropreason plugin on newer kernels.
Today there is no automated CI coverage for kernels beyond what AKS ships (currently 6.6 LTS on AzureLinux 3, 6.8 on Ubuntu 24.04). This means breakage on newer kernels is only discovered manually and after the fact.
Proposal
Add a CI job (or scheduled workflow) that validates eBPF plugin loading and basic metric collection across a matrix of kernel versions. Specifically:
Kernel matrix
At minimum, cover the following kernel families:
| Kernel | Source | Rationale |
| --- | --- | --- |
| 5.15 LTS | Ubuntu 22.04 | Oldest supported LTS baseline |
| 6.1 LTS | AzureLinux 2 / Debian 12 | Current AzureLinux 2 kernel |
| 6.6 LTS | AzureLinux 3 | Current AzureLinux 3 kernel |
| 6.8 | Ubuntu 24.04 | Current AKS Ubuntu kernel |
| 6.10+ | Ubuntu 24.04 HWE | First kernel with inet_csk_accept signature change |
| Latest stable | kernel.org | Catch upcoming breakage early |
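As a rough illustration, the matrix above could live in one place as data and be emitted as JSON for a workflow to consume (for example through the `fromJSON` expression in a GitHub Actions `strategy.matrix`). This is only a sketch: the `id` and `lts` fields and both function names are assumptions made for the example, not existing Retina configuration. The reduced subset corresponds to the "LTS kernels + latest stable" set this proposal suggests for PRs.

```python
import json

# Kernel families copied from the table above.
# The "id" and "lts" fields are hypothetical, added for this sketch.
KERNEL_MATRIX = [
    {"id": "5.15", "kernel": "5.15 LTS", "source": "Ubuntu 22.04", "lts": True},
    {"id": "6.1", "kernel": "6.1 LTS", "source": "AzureLinux 2 / Debian 12", "lts": True},
    {"id": "6.6", "kernel": "6.6 LTS", "source": "AzureLinux 3", "lts": True},
    {"id": "6.8", "kernel": "6.8", "source": "Ubuntu 24.04", "lts": False},
    {"id": "6.10", "kernel": "6.10+", "source": "Ubuntu 24.04 HWE", "lts": False},
    {"id": "latest", "kernel": "Latest stable", "source": "kernel.org", "lts": False},
]


def select_kernels(full: bool) -> list[dict]:
    """Return the full matrix, or only the LTS kernels plus latest stable."""
    if full:
        return list(KERNEL_MATRIX)
    return [k for k in KERNEL_MATRIX if k["lts"] or k["id"] == "latest"]


def to_actions_matrix(entries: list[dict]) -> str:
    """Serialize entries as a JSON object with an "include" list."""
    return json.dumps({"include": entries})


if __name__ == "__main__":
    # Scheduled runs would pass full=True; PR runs would pass full=False.
    print(to_actions_matrix(select_kernels(full=False)))
```

Keeping the list in one structure means the scheduled and PR-triggered runs cannot drift apart: both read the same table and differ only in the filter.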
Approach
We could use kind on a host with the target kernel, deploy Retina via Helm, and validate that metrics are collected. This pattern can be generalized:
- GitHub Actions matrix job using VMs or hosts with different kernels (containers share the host kernel, so the kernel has to come from the VM or runner). Options include:
  - cilium/little-vm-helper: lightweight QEMU-based kernel testing (used by Cilium for similar eBPF CI)
  - Self-hosted runners with specific OS images
  - Azure VMs with HWE kernels (as in validate-dropreason-azure-vm-6.10.sh)
- Validation checks per kernel (not just dropreason):
  - All eBPF programs load without verifier errors
  - Metrics endpoint exposes expected metric families (networkobservability_drop_count, networkobservability_forward_count, etc.)
  - No unexpected errors in agent logs
- Scheduled + PR-triggered:
  - Run the full kernel matrix on a schedule (e.g., nightly or weekly)
  - On PRs that touch pkg/plugin/*/_cprog/, run at least the LTS kernels + latest stable
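The metric-family check listed above could be sketched roughly as follows. It assumes the agent's metrics endpoint returns the Prometheus text exposition format and that the response body has already been fetched as a string; how it is fetched (port, path) is left out on purpose. The two expected family names come from this proposal; the function names and the label names in the usage example are hypothetical.

```python
import re

# Families named in this proposal; a real job would likely extend the set.
EXPECTED_FAMILIES = {
    "networkobservability_drop_count",
    "networkobservability_forward_count",
}

# One sample line in the text format: name, optional {labels}, then a value.
_SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{.*\})?\s+\S+")


def exposed_metric_names(exposition: str) -> set[str]:
    """Collect metric names that have at least one sample line."""
    names = set()
    for line in exposition.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match:
            names.add(match.group(1))
    return names


def missing_families(exposition: str, expected=EXPECTED_FAMILIES) -> set[str]:
    """Return the expected families that have no sample in the payload."""
    return set(expected) - exposed_metric_names(exposition)


if __name__ == "__main__":
    # Label names here are made up for illustration.
    body = 'networkobservability_drop_count{reason="EXAMPLE"} 3\n'
    print(sorted(missing_families(body)))
```

A labeled family may not appear until it has recorded at least one sample, so a real check would probably need to generate some traffic (including a dropped packet) before scraping.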
Stretch goals
- BTF compatibility checks — validate that CO-RE relocations succeed on each target kernel's BTF
- Kernel release tracking — automated alerts when a new stable kernel is tagged that hasn't been tested yet
- Performance regression — compare eBPF program instruction counts across kernels to catch verifier complexity regressions
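For the kernel release tracking idea, one possible shape is a small comparison between the kernel series already in the matrix and a list of stable tags. This is a sketch under assumptions: the tag list would come from some external source that is not specified here, and both function names are hypothetical. Versions are compared as integer tuples because a plain string comparison would sort 6.10 before 6.8.

```python
import re


def parse_kernel_version(tag: str) -> tuple[int, int]:
    """Extract (major, minor) from strings like 'v6.10', '6.6.32' or '6.8 LTS'."""
    match = re.search(r"(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"no kernel version found in {tag!r}")
    return int(match.group(1)), int(match.group(2))


def untested_series(tagged: list[str], tested: list[str]) -> list[tuple[int, int]]:
    """Return stable series newer than every tested series, oldest first."""
    covered = {parse_kernel_version(t) for t in tested}
    newest = max(covered, default=(0, 0))
    # Release candidates are not stable tags, so they are skipped.
    stable = {parse_kernel_version(t) for t in tagged if "-rc" not in t}
    return sorted(v for v in stable - covered if v > newest)


if __name__ == "__main__":
    tested = ["5.15", "6.1", "6.6", "6.8", "6.10"]
    tagged = ["v5.15.160", "v6.6.30", "v6.10", "v6.11", "v6.12-rc1"]
    print(untested_series(tagged, tested))
```

An alerting job could open an issue or post a notification whenever the returned list is non-empty.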
Context
inet_csk_accept verifier failure on 6.10+