
Hubble metric identity resolution incomplete for remote (cross-node) pod endpoints #2182

@pesarkhobeee

Description

When running Retina with Hubble metrics enabled on GKE (without Cilium as the CNI), the source and destination labels in Hubble metrics (e.g., hubble_tcp_flags_total, hubble_flows_processed_total, hubble_dns_queries_total) are only populated for pods local to the Retina agent's node. Remote pods (on other nodes) get source="" or destination="", even though their CiliumIdentity and CiliumEndpoint CRDs exist with the correct labels.

The source_namespace / destination_namespace labels resolve correctly for all pods (local and remote) because they come from the cluster-wide CiliumIdentity CRDs. But app-level identity resolution (via sourceEgressContext=app or labelsContext=source_app) fails for remote endpoints.

This means no single Hubble metric series has both source and destination populated for cross-node traffic, making per-service dashboards incomplete.

Environment

  • Retina version: v1.1.0
  • Chart: oci://ghcr.io/microsoft/retina/charts/retina-hubble v1.1.0
  • Kubernetes: GKE (Google Kubernetes Engine), europe-west3
  • CNI: GKE default (not Cilium)
  • Nodes: 35+ nodes (GKE NAP autoscaling)

Hubble metrics configuration

hubble:
  metrics:
    enabled:
      - "flow:sourceEgressContext=app|pod-name;destinationIngressContext=app|pod-name;labelsContext=source_namespace,destination_namespace,source_app,destination_app"
      - "tcp:sourceEgressContext=app|pod-name;destinationIngressContext=app|pod-name;labelsContext=source_namespace,destination_namespace,source_app,destination_app"
      - "drop:sourceEgressContext=app|pod-name;destinationIngressContext=app|pod-name;labelsContext=source_namespace,destination_namespace,source_app,destination_app"
      - "dns:query;sourceEgressContext=app|pod-name;destinationIngressContext=app|pod-name;labelsContext=source_namespace,destination_namespace,source_app,destination_app"
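For context on the config above: the pipe in sourceEgressContext=app|pod-name tells Hubble to try each context source in order and use the first one that resolves. When the agent has neither labels nor a pod name cached for a remote endpoint, every fallback fails and the label is emitted empty. A minimal sketch of that fallback logic (the types and helper names are illustrative, not Retina's actual code):

```go
package main

import "fmt"

// endpoint mimics what the Hubble observer knows about a flow endpoint.
// For a remote pod with incomplete identity resolution, both fields may be empty.
type endpoint struct {
	labels  map[string]string
	podName string
}

// resolveContext applies the "app|pod-name" fallback: use the app label
// if present, otherwise fall back to the pod name, otherwise "".
func resolveContext(ep endpoint) string {
	if app, ok := ep.labels["app.kubernetes.io/name"]; ok {
		return app
	}
	if ep.podName != "" {
		return ep.podName
	}
	return "" // no source resolved: the empty label reported in this issue
}

func main() {
	local := endpoint{labels: map[string]string{"app.kubernetes.io/name": "dispatching"}, podName: "dispatching-abc"}
	remote := endpoint{} // remote pod: nothing in this agent's cache
	fmt.Printf("local=%q remote=%q\n", resolveContext(local), resolveContext(remote))
	// → local="dispatching" remote=""
}
```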

Steps to reproduce

  1. Deploy Retina v1.1.0 with Hubble metrics on a multi-node GKE cluster (no Cilium CNI)
  2. Ensure pods have app.kubernetes.io/name labels
  3. Verify that the CiliumIdentity CRDs contain the correct label:
    kubectl get ciliumidentity <identity-id> -o jsonpath='{.security-labels}' | tr ',' '\n' | grep app
    # Output: "k8s:app.kubernetes.io/name":"my-service"
  4. Scrape Hubble metrics from a Retina agent:
    curl -s http://<agent-pod-ip>:9965/metrics | grep hubble_tcp_flags
  5. Observe that source is populated for local pods but empty for remote pods, and vice versa for destination.

Actual behavior

From retina-agent on Node A (where dispatching runs):
hubble_tcp_flags_total{source="dispatching", source_namespace="consumer-backend", destination="", destination_namespace="core-services", flag="SYN"} 42

From retina-agent on Node B (where pricing-web runs):
hubble_tcp_flags_total{source="", source_namespace="consumer-backend", destination="pricing-web", destination_namespace="core-services", flag="SYN"} 42

Labels that DO resolve for remote pods: source_namespace, destination_namespace
Labels that DON'T resolve for remote pods: source, destination, source_app, destination_app

source_workload / destination_workload (via labelsContext) are always empty for both local and remote pods — the workload-name context also never resolves.

Expected behavior

Both source and destination labels should resolve for all pods in the cluster, since their CiliumIdentity CRDs exist with the correct app.kubernetes.io/name labels. The Retina operator creates these CRDs correctly; the issue is that the Hubble observer's identity cache does not use them for remote endpoint resolution.

Root cause analysis

The Retina operator correctly creates CiliumEndpoint and CiliumIdentity CRDs for all pods. However, the Hubble observer inside each Retina agent appears to only maintain a complete IP → endpoint → identity mapping for local pods. Remote pod IPs are matched to a CiliumIdentity (giving the namespace), but the full identity label lookup (needed for app context resolution) fails.

In upstream Cilium, the Cilium agent maintains a complete cluster-wide identity cache via the kvstore or the CRD-backed identity allocator. Retina's Hubble observer doesn't appear to build an equivalent cluster-wide cache from the CiliumEndpoint/CiliumIdentity CRDs it has access to.
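The asymmetry described above can be sketched as a two-tier lookup: a coarse namespace mapping that is populated cluster-wide, and a full label mapping that is populated for local pods only (field and type names here are illustrative, not Retina's internals):

```go
package main

import "fmt"

// ipCache models the per-agent state the root-cause analysis describes:
// full endpoint labels exist only for IPs local to this node, while a
// coarser namespace mapping is known cluster-wide via CiliumIdentity CRDs.
type ipCache struct {
	localLabels map[string]map[string]string // populated for local pods only
	namespaces  map[string]string            // populated cluster-wide
}

// lookup returns the namespace (always resolvable) and the app label
// (resolvable only when the IP belongs to a pod on this agent's node).
func (c *ipCache) lookup(ip string) (ns, app string) {
	ns = c.namespaces[ip]
	if labels, ok := c.localLabels[ip]; ok {
		app = labels["app.kubernetes.io/name"]
	}
	return ns, app
}

func main() {
	c := &ipCache{
		localLabels: map[string]map[string]string{
			"10.0.1.5": {"app.kubernetes.io/name": "dispatching"}, // local pod
		},
		namespaces: map[string]string{
			"10.0.1.5": "consumer-backend",
			"10.0.2.9": "core-services", // remote pod: namespace known, labels not
		},
	}
	ns, app := c.lookup("10.0.2.9")
	fmt.Printf("remote: namespace=%q app=%q\n", ns, app)
	// → remote: namespace="core-services" app=""
}
```

This reproduces the observed metric shape: namespace labels populated on both sides, app-level labels empty for the remote side.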

Impact

  • Datadog dashboards filtering by service name (source/destination tags) show incomplete data
  • Service dependency tables have empty service name columns for cross-node traffic
  • Per-service TCP error rate calculations are inaccurate (some traffic attributed to source="")
  • Engineers must rely on source_namespace/destination_namespace filtering which is less precise

Workaround

Filter dashboards by namespace instead of service name. This works but doesn't distinguish between multiple services in the same namespace.

Suggested fix

Have the Retina agent's Hubble observer build a cluster-wide identity cache from the CiliumEndpoint and CiliumIdentity CRDs (which the Retina operator already creates), similar to how upstream Cilium's agent populates its identity cache. This would allow sourceEgressContext=app and destinationIngressContext=app to resolve correctly for all pods regardless of which node they run on.
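The core of such a cache is a join between cluster-wide watches on the two CRDs: CiliumEndpoint gives IP → identity ID, CiliumIdentity gives ID → labels. The sketch below uses simplified stand-in structs rather than the actual Cilium API types, and the two handlers would be wired to informer add/update events in a real implementation:

```go
package main

import (
	"fmt"
	"sync"
)

// Simplified stand-ins for the CRDs the Retina operator already creates.
type ciliumEndpoint struct {
	ip         string
	identityID int64
}
type ciliumIdentity struct {
	id     int64
	labels map[string]string
}

// clusterIdentityCache joins the two CRD watches so that any pod IP in the
// cluster, local or remote, resolves to full identity labels.
type clusterIdentityCache struct {
	mu         sync.RWMutex
	ipToID     map[string]int64
	idToLabels map[int64]map[string]string
}

func newClusterIdentityCache() *clusterIdentityCache {
	return &clusterIdentityCache{
		ipToID:     map[string]int64{},
		idToLabels: map[int64]map[string]string{},
	}
}

// onEndpoint and onIdentity would be informer event handlers.
func (c *clusterIdentityCache) onEndpoint(ep ciliumEndpoint) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.ipToID[ep.ip] = ep.identityID
}

func (c *clusterIdentityCache) onIdentity(id ciliumIdentity) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.idToLabels[id.id] = id.labels
}

// labelsForIP resolves a pod IP to its identity labels, or nil if unknown.
func (c *clusterIdentityCache) labelsForIP(ip string) map[string]string {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.idToLabels[c.ipToID[ip]]
}

func main() {
	cache := newClusterIdentityCache()
	// A remote pod's CRDs, visible from every node in the cluster.
	cache.onIdentity(ciliumIdentity{id: 42, labels: map[string]string{"k8s:app.kubernetes.io/name": "pricing-web"}})
	cache.onEndpoint(ciliumEndpoint{ip: "10.0.2.9", identityID: 42})
	fmt.Println(cache.labelsForIP("10.0.2.9")["k8s:app.kubernetes.io/name"])
	// → pricing-web
}
```

A real implementation would also need delete handling and the genuine Cilium API types, but the join itself is the key piece: once an agent can map any remote IP to identity labels, the app context resolves regardless of which node the pod runs on.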
