Description
When running Retina with Hubble metrics enabled on GKE (without Cilium as CNI), the source and destination labels in Hubble metrics (e.g., hubble_tcp_flags_total, hubble_flows_processed_total,
hubble_dns_queries_total) are only populated for pods local to the Retina agent's node. Remote pods (on other nodes) get source="" or destination="", even though their CiliumIdentity and
CiliumEndpoint CRDs exist with correct labels.
The source_namespace / destination_namespace labels resolve correctly for all pods (local and remote) because they come from the cluster-wide CiliumIdentity CRDs. But the app-level identity resolution
(via sourceEgressContext=app or labelsContext=source_app) fails for remote endpoints.
This means no single Hubble metric series has both source and destination populated for cross-node traffic, making per-service dashboards incomplete.
Environment
- Retina version: v1.1.0
- Chart: oci://ghcr.io/microsoft/retina/charts/retina-hubble v1.1.0
- Kubernetes: GKE (Google Kubernetes Engine), europe-west3
- CNI: GKE default (not Cilium)
- Nodes: 35+ nodes (GKE NAP autoscaling)
Hubble metrics configuration
hubble:
  metrics:
    enabled:
      - "flow:sourceEgressContext=app|pod-name;destinationIngressContext=app|pod-name;labelsContext=source_namespace,destination_namespace,source_app,destination_app"
      - "tcp:sourceEgressContext=app|pod-name;destinationIngressContext=app|pod-name;labelsContext=source_namespace,destination_namespace,source_app,destination_app"
      - "drop:sourceEgressContext=app|pod-name;destinationIngressContext=app|pod-name;labelsContext=source_namespace,destination_namespace,source_app,destination_app"
      - "dns:query;sourceEgressContext=app|pod-name;destinationIngressContext=app|pod-name;labelsContext=source_namespace,destination_namespace,source_app,destination_app"
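For context on the configuration above: the `app|pod-name` syntax in Hubble context options is an ordered fallback list, where the first context source that yields a non-empty value is used. A minimal sketch of that resolution logic (types and function names here are illustrative, not Retina's actual code) shows why a remote endpoint with no resolvable labels and no known pod name ends up as `source=""`:

```go
package main

import "fmt"

// flowEndpoint models the fields a context label can be drawn from.
// This is an illustrative type, not Retina's actual flow schema.
type flowEndpoint struct {
	Labels  map[string]string // identity labels, e.g. "k8s:app.kubernetes.io/name"
	PodName string
}

// appLabelKeys mirrors the label keys the "app" context checks, in order.
var appLabelKeys = []string{
	"k8s:app.kubernetes.io/name", "k8s:app", "k8s:k8s-app",
}

// resolveContext applies an ordered fallback list such as "app|pod-name":
// the first source that yields a non-empty value wins.
func resolveContext(ep flowEndpoint, sources []string) string {
	for _, src := range sources {
		var v string
		switch src {
		case "app":
			for _, k := range appLabelKeys {
				if ep.Labels[k] != "" {
					v = ep.Labels[k]
					break
				}
			}
		case "pod-name":
			v = ep.PodName
		}
		if v != "" {
			return v
		}
	}
	return "" // nothing resolved: the metric label is emitted empty
}

func main() {
	// Remote endpoint as observed in this bug: no labels, no pod name known.
	remote := flowEndpoint{}
	fmt.Printf("%q\n", resolveContext(remote, []string{"app", "pod-name"}))
}
```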
Steps to reproduce
- Deploy Retina v1.1.0 with Hubble metrics on a multi-node GKE cluster (no Cilium CNI)
- Ensure pods have app.kubernetes.io/name labels
- Verify that the CiliumIdentity CRDs contain the correct label:
kubectl get ciliumidentity <identity-id> -o jsonpath='{.security-labels}' | tr ',' '\n' | grep app
# Output: "k8s:app.kubernetes.io/name":"my-service"
- Scrape Hubble metrics from a Retina agent:
curl -s http://<agent-ip>:9965/metrics | grep hubble_tcp_flags
- Observe that source is populated for local pods but empty for remote pods, and vice versa for destination.
Actual behavior
From retina-agent on Node A (where dispatching runs):
hubble_tcp_flags_total{source="dispatching", source_namespace="consumer-backend", destination="", destination_namespace="core-services", flag="SYN"} 42
From retina-agent on Node B (where pricing-web runs):
hubble_tcp_flags_total{source="", source_namespace="consumer-backend", destination="pricing-web", destination_namespace="core-services", flag="SYN"} 42
Labels that DO resolve for remote pods: source_namespace, destination_namespace
Labels that DON'T resolve for remote pods: source, destination, source_app, destination_app
source_workload / destination_workload (via labelsContext) are always empty for both local and remote pods — the workload-name context also never resolves.
Expected behavior
Both source and destination labels should resolve for all pods in the cluster, since their CiliumIdentity CRDs exist with the correct app.kubernetes.io/name labels. The Retina operator creates these CRDs correctly; the issue is that the Hubble observer's identity cache does not use them for remote endpoint resolution.
Root cause analysis
The Retina operator correctly creates CiliumEndpoint and CiliumIdentity CRDs for all pods. However, the Hubble observer inside each Retina agent appears to only maintain a complete IP → endpoint → identity
mapping for local pods. Remote pod IPs are matched to a CiliumIdentity (giving namespace), but the full identity label lookup (needed for app context resolution) fails.
In upstream Cilium, the Cilium agent maintains a complete cluster-wide identity cache via the kvstore or CRD-backed identity allocator. Retina's Hubble observer doesn't appear to build an equivalent
cluster-wide cache from the CiliumEndpoint/CiliumIdentity CRDs it has access to.
Impact
- Datadog dashboards filtering by service name (source/destination tags) show incomplete data
- Service dependency tables have empty service name columns for cross-node traffic
- Per-service TCP error rate calculations are inaccurate (some traffic attributed to source="")
- Engineers must rely on source_namespace/destination_namespace filtering which is less precise
Workaround
Filter dashboards by namespace instead of service name. This works but doesn't distinguish between multiple services in the same namespace.
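Until this is fixed, namespace-level aggregation stays reliable because those labels always resolve. A Prometheus-style query over the metric above might look like this (PromQL shown for illustration; the Datadog equivalent would group by the same tags):

```promql
sum by (source_namespace, destination_namespace) (
  rate(hubble_tcp_flags_total{flag="SYN"}[5m])
)
```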
Suggested fix
Have the Retina agent's Hubble observer build a cluster-wide identity cache from CiliumEndpoint and CiliumIdentity CRDs (which the Retina operator already creates), similar to how upstream Cilium's agent
populates its identity cache. This would allow sourceEgressContext=app and destinationIngressContext=app to resolve correctly for all pods regardless of which node they run on.
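A minimal sketch of such a cache, assuming hypothetical type and method names (a real implementation would populate the maps from CiliumEndpoint/CiliumIdentity informer events rather than direct calls), could look like:

```go
package main

import (
	"fmt"
	"sync"
)

// identityCache maps pod IP -> numeric identity -> identity labels, built
// cluster-wide from CiliumEndpoint and CiliumIdentity CRDs. Illustrative
// sketch only; names and structure are not Retina's actual code.
type identityCache struct {
	mu         sync.RWMutex
	ipToID     map[string]uint32            // from CiliumEndpoint status
	idToLabels map[uint32]map[string]string // from CiliumIdentity security-labels
}

func newIdentityCache() *identityCache {
	return &identityCache{
		ipToID:     make(map[string]uint32),
		idToLabels: make(map[uint32]map[string]string),
	}
}

// upsertEndpoint and upsertIdentity would be wired to CRD watch handlers.
func (c *identityCache) upsertEndpoint(ip string, id uint32) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.ipToID[ip] = id
}

func (c *identityCache) upsertIdentity(id uint32, labels map[string]string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.idToLabels[id] = labels
}

// resolveApp returns the app label for any pod IP, local or remote, so
// sourceEgressContext=app can resolve regardless of node placement.
func (c *identityCache) resolveApp(ip string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	labels, ok := c.idToLabels[c.ipToID[ip]]
	if !ok {
		return "", false
	}
	app, ok := labels["k8s:app.kubernetes.io/name"]
	return app, ok
}

func main() {
	c := newIdentityCache()
	// A remote pod on another node, known only through its CRDs:
	c.upsertIdentity(51234, map[string]string{"k8s:app.kubernetes.io/name": "pricing-web"})
	c.upsertEndpoint("10.64.3.17", 51234)
	app, _ := c.resolveApp("10.64.3.17")
	fmt.Println(app)
}
```

The key design point is that the lookup path depends only on cluster-wide CRD data, never on whether the endpoint is local to the agent.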