Helm single-binary v2: default query-backend.address (dns:///_grpc._tcp...) is unresolvable by grpc-go → reads hang 30s / HTTP 499 #5229

@nissessenap

Describe the bug

Chart: pyroscope 2.0.2 · Image: grafana/pyroscope:2.0.2 · Mode: single-binary, architecture.storage.v2: true, architecture.microservices.enabled: false

In single-binary v2 mode, profiles ingest and compact fine, but every profile-data read query hangs ~30s and returns HTTP 499, so flame graphs render empty in Grafana.

Affected RPCs: /querier.v1.QuerierService/SelectMergeStacktraces and /querier.v1.QuerierService/SelectSeries.
Unaffected (return in ~1ms): ProfileTypes, Series, LabelNames, LabelValues, GetProfileStats.

The query-backend component logs nothing during the hang — the request never reaches it.

Root cause

For single-binary v2, operations/pyroscope/helm/pyroscope/templates/deployments-statefulsets.yaml renders:

-query-backend.address=dns:///_grpc._tcp.<fullname>-headless.$(NAMESPACE_FQDN):9095

The query-backend client (pkg/querybackend/client/client.go) hands this straight to grpc.NewClient(address, ...) with the stock grpc-go dns resolver and a service config of waitForReady: true.

grpc-go's dns resolver (v1.81.0) does an A/AAAA lookup on the literal host. It only does SRV for grpclb (_grpclb._tcp.<host>), and EnableSRVLookups is false by default:

// google.golang.org/grpc/internal/resolver/dns/dns_resolver.go
EnableSRVLookups = false
func (d *dnsResolver) lookupSRV(...)  { ... d.resolver.LookupSRV(ctx, "grpclb", "tcp", d.host) }
func (d *dnsResolver) lookupHost(...) { addrs, err := d.resolver.LookupHost(ctx, d.host) }

The host _grpc._tcp.<headless> has an SRV record but no A/AAAA record, so the resolver yields zero endpoints. With waitForReady: true, every call parks until the 30s call timeout → HTTP 499. The metadata RPCs are unaffected because the metastore client uses its own kubernetes:// discovery (pkg/metastore/client), not grpc-go's resolver — which is why -metastore.address=kubernetes:///<headless>:9095 works while the query-backend address does not.

To Reproduce

  1. Deploy the Helm chart in single-binary mode with architecture.storage.v2: true.
  2. Push any profile (ingestion + compaction succeed).
  3. Open a flame graph in Grafana Explore (or call SelectMergeStacktraces).
  4. It hangs ~30s, then returns 499; query-backend logs nothing.

Expected behavior

query-backend.address resolves to the in-process query-backend gRPC endpoint and reads return promptly.

Workaround

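Override to a plain headless A record (grpc-go-resolvable), mirroring the working metastore.address form. Replace pyroscope-headless with your release's <fullname>-headless if the release is named differently: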

pyroscope:
  extraArgs:
    query-backend.address: "dns:///pyroscope-headless.$(NAMESPACE_FQDN):9095"

(The chart skips its broken default when the key is present in extraArgs, so there is no duplicate flag.)
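The override behaviour can be pictured with a small Go model. This is only a sketch of the effect described above, not the chart's template logic: `renderArgs` and the shortened addresses are hypothetical, and it assumes the only rule is that a key in extraArgs replaces the chart default of the same name.

```go
package main

import (
	"fmt"
	"sort"
)

// renderArgs merges chart defaults with user-supplied extraArgs.
// A key present in extraArgs replaces the default, so each flag is
// emitted exactly once. Hypothetical model, not the Helm template.
func renderArgs(defaults, extraArgs map[string]string) []string {
	merged := make(map[string]string, len(defaults)+len(extraArgs))
	for k, v := range defaults {
		merged[k] = v
	}
	for k, v := range extraArgs {
		merged[k] = v
	}

	// Sort keys so the output order is deterministic.
	keys := make([]string, 0, len(merged))
	for k := range merged {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	args := make([]string, 0, len(keys))
	for _, k := range keys {
		args = append(args, "-"+k+"="+merged[k])
	}
	return args
}

func main() {
	defaults := map[string]string{
		"metastore.address":     "kubernetes:///pyroscope-headless:9095",
		"query-backend.address": "dns:///_grpc._tcp.pyroscope-headless:9095",
	}
	extraArgs := map[string]string{
		"query-backend.address": "dns:///pyroscope-headless:9095",
	}
	for _, arg := range renderArgs(defaults, extraArgs) {
		fmt.Println(arg)
	}
}
```

To confirm what a real release renders, inspect the container args in the output of helm template.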

Suggested fix

Drop the _grpc._tcp. SRV prefix from the chart default and use dns:///<fullname>-headless.$(NAMESPACE_FQDN):9095 (plain A records) instead. Note that kubernetes:// and dnssrvnoa+ are not options for this address, because it is handed directly to grpc-go's resolver, which only understands dns/passthrough unless a custom resolver is registered.
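The transformation is small enough to sketch in Go. `stripSRVPrefix` is a hypothetical helper, not something that exists in the chart or in Pyroscope. It assumes the target uses the dns:/// scheme and that an SRV-style name starts with two underscore-prefixed labels.

```go
package main

import (
	"fmt"
	"strings"
)

// stripSRVPrefix removes a leading "_service._proto." pair from the host
// part of a "dns:///host:port" target. Targets with another scheme, or
// without such a prefix, are returned unchanged.
func stripSRVPrefix(target string) string {
	const scheme = "dns:///"
	if !strings.HasPrefix(target, scheme) {
		return target
	}
	rest := strings.TrimPrefix(target, scheme)

	// Split off the first two labels; everything after them is kept.
	labels := strings.SplitN(rest, ".", 3)
	if len(labels) == 3 &&
		strings.HasPrefix(labels[0], "_") &&
		strings.HasPrefix(labels[1], "_") {
		return scheme + labels[2]
	}
	return target
}

func main() {
	broken := "dns:///_grpc._tcp.pyroscope-headless.$(NAMESPACE_FQDN):9095"
	fmt.Println(stripSRVPrefix(broken))
}
```

In the chart itself the equivalent change would simply be to stop emitting the prefix in the template.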

Environment

  • Pyroscope: 2.0.2 (single-binary, v2 storage)
  • Helm chart: pyroscope-2.0.2
  • grpc-go: v1.81.0
  • Kubernetes: GKE
