feat(ha): Add per-tenant configurable failover timeout #7481

Open
yeya24 wants to merge 1 commit into cortexproject:master from yeya24:ha-failover-timeout-per-tenant

@yeya24
Contributor

@yeya24 yeya24 commented May 6, 2026

Add a per-tenant runtime override for the HA tracker failover timeout via the ha_tracker_failover_timeout field in the limits config (flag: -distributor.ha-tracker.failover-timeout-override). When set to a non-zero value for a tenant, it overrides the global -distributor.ha-tracker.failover-timeout.

This allows operators to configure different failover timeouts for different tenants based on their HA setup requirements.

What this PR does:

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]
  • docs/configuration/v1-guarantees.md updated if this PR introduces experimental flags

@yeya24 yeya24 force-pushed the ha-failover-timeout-per-tenant branch 2 times, most recently from c202aa9 to 5ce8f5f Compare May 6, 2026 04:43
@SungJin1212
Member

Why don't you extend the existing distributor.ha-tracker.failover-timeout flag?

@yeya24
Contributor Author

yeya24 commented May 7, 2026

Let me rename the new flag to use the same name as distributor.ha-tracker.failover-timeout.

@yeya24 yeya24 force-pushed the ha-failover-timeout-per-tenant branch from 5ce8f5f to 7f44369 Compare May 7, 2026 04:36
@pull-request-size pull-request-size Bot added size/L and removed size/M labels May 7, 2026
@yeya24 yeya24 force-pushed the ha-failover-timeout-per-tenant branch 2 times, most recently from 067b6fb to af59017 Compare May 7, 2026 05:20
@yeya24
Contributor Author

yeya24 commented May 7, 2026

Updated to reuse the same config name, but it is now per-tenant.

Comment thread pkg/ha/ha_tracker.go
Move -distributor.ha-tracker.failover-timeout from HATrackerConfig (global)
to the per-tenant Limits struct. The flag name and default value (30s)
remain the same, but it can now be overridden per-tenant via runtime config:

  overrides:
    "tenant-1":
      ha_tracker_failover_timeout: 60s

Signed-off-by: Ben Ye <benye@amazon.com>
@yeya24 yeya24 force-pushed the ha-failover-timeout-per-tenant branch from af59017 to 5fafd82 Compare May 8, 2026 04:01
Member

@friedrichg friedrichg left a comment

Thanks!

@dosubot dosubot Bot added the lgtm This PR has been approved by a maintainer label May 8, 2026
Labels

component/ha-tracker, lgtm (This PR has been approved by a maintainer), size/L, type/feature

3 participants