Skip to content

feat: consecutive-window alerts#2472

Open
mrkaye97 wants to merge 12 commits into
hyperdxio:mainfrom
mrkaye97:mk/add-multi-window-lookback-alerts
Open

feat: consecutive-window alerts#2472
mrkaye97 wants to merge 12 commits into
hyperdxio:mainfrom
mrkaye97:mk/add-multi-window-lookback-alerts

Conversation

@mrkaye97

@mrkaye97 mrkaye97 commented Jun 16, 2026

Copy link
Copy Markdown

Summary

Fixes #2469 (comment)

Adding consecutive-window configuration to alerts, so that you can specify a condition like "only fire this alert after some condition is met for N consecutive windows." This helps prevent flaky alerts (and pages), and cut down on alert noise in many cases.

Not too sure on a few things here, so would love some feedback:

  1. My UX sense isn't amazing - I thought that doing this in-line made sense, but it may make sense to put the number of windows on its own line below the original alert config, e.g.
  2. One implementation detail I was wondering about is if we'd want to allow for alerting if e.g. three out of five windows alert, or similar. For instance "fire the alert if we see this log line at least once in three of five consecutive windows." I could see this being useful, but it also adds some overhead, and it feels like it'd be easy to add that feature in a non-breaking way in the future if someone would like to have it, so I figured I'd leave it off for now.
  3. In the current implementation, the alert in the UI shows as "firing" even if no notification has actually sent yet, because the condition has been met, but the threshold is not. Not sure if that's okay, or if I should figure out some way around it!

Screenshots or video

Screenshot from my local of the new alert modal:

Screenshot 2026-06-16 at 10 52 39 AM

Testing Locally

I tested by creating a trivial alert that'd fire on any error log that we see at least once per minute for three minutes, and then sent a curl once per minute to trigger it:

 curl -s -X POST http://localhost:30996/v1/logs \
    -H "Content-Type: application/json" \
    -H "Authorization: << ingestion api key >>" \
    -d '{
      "resourceLogs": [{
        "resource": {
          "attributes": [{"key": "service.name", "value": {"stringValue": "test-service"}}]
        },
        "scopeLogs": [{
          "logRecords": [{
            "timeUnixNano": "'$(date +%s)000000000'",
            "severityText": "ERROR",
            "severityNumber": 17,
            "body": {"stringValue": "something went wrong"}
          }]
        }]  
      }]
    }'

Sent webhooks to webhook.site, and saw logs matching what we'd expect:

  "text": "🚨 Alert for \"test-alert\" - 1 lines found | \n1 lines found, which meets or exceeds the threshold of 1 lines\nTime Range (UTC): [Jun 16 2:52:00 PM - Jun 16 2:53:00 PM)\n\n```\n\"2026-06-16T14:52:00.001000000Z\",\"hdx-oss-dev-api\",\"info\",\"check-alerts started at Tue Jun 16 2026 10:52:00 GMT-0400 (Eastern Daylight Time)\"\n\"2026-06-16T14:52:00.004000000Z\",\"hdx-oss-dev-api\",\"info\",\"Connection established to MongoDB\"\n\"2026-06-16T14:52:00.004000000Z\",\"hdx-oss-dev-api\",\"debug\",\"finished provider initialization\"\n\"2026-06-16T14:52:00.016000000Z\",\"hdx-oss-dev-api\",\"debug\",\"Fetched 1 alert tasks to process\"\n\"2026-06-16T14:52:00.017000000Z\",\"hdx-oss-dev-api\",\"debug\",\"Obtained teams and webhooks for all alertTasks\"\n\n``` | http://localhost:30296/search/6a3162175459c7092c078df3?from=1781621520000&to=1781621580000&isLive=false | ALERT | 1781621520000 | 1781621580000 | 7e645d5a3c56111a86f2cba9d8f46af046943a6f"

And then:

{"text": "✅ Alert for \"test-alert\" - 0 lines found | The alert has been resolved.\nTime Range (UTC): [Jun 16 2:53:00 PM - Jun 16 2:54:00 PM)\n | http://localhost:30296/search/6a3162175459c7092c078df3?from=1781621580000&to=1781621640000&isLive=false | OK | 1781621580000 | 1781621640000 | 7e645d5a3c56111a86f2cba9d8f46af046943a6f"}

How to test on Vercel preview

Preview routes:

Wasn't sure what to put here!

References

Source Issue: #2469 (comment)

@changeset-bot

changeset-bot Bot commented Jun 16, 2026

Copy link
Copy Markdown

⚠️ No Changeset found

Latest commit: a55f73e

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@vercel

vercel Bot commented Jun 16, 2026

Copy link
Copy Markdown

@mrkaye97 is attempting to deploy a commit to the HyperDX Team on Vercel.

A member of the Team first needs to authorize it.

@greptile-apps

greptile-apps Bot commented Jun 16, 2026

Copy link
Copy Markdown

Greptile Summary

This PR adds a "consecutive windows" feature to alerts, requiring a threshold condition to be met for N continuous evaluation windows before a notification fires. A new numConsecutiveWindows field is added across the Mongoose model, Zod schemas, API router, and UI components, alongside a fired boolean field on AlertHistory that tracks whether each window actually triggered a notification.

  • Backend: shouldFireBasedOnConsecutiveWindows queries the N-1 most recent AlertHistory records before the current window and fires only when all of them are in ALERT state. sendNotificationIfResolved is guarded by previousHistory?.fired !== false to suppress spurious resolution webhooks when no notification was ever sent.
  • Frontend: Both the saved-search alert modal and the tile alert editor expose a small NumberInput that saves undefined (rather than 1) when the value is 1, preserving backward-compatible default behavior.
  • Tests: Three new test cases cover the N=1 immediate-fire path, the N=3 suppression-until-third-window path, and the no-resolution-when-never-fired path.

Confidence Score: 5/5

The change is safe to merge. The consecutive-window counting logic, fired-field tracking, and resolution-notification guard all behave correctly under normal operating conditions.

All three code paths (single_value, zero-fill, and group-by time-series) correctly set fired=true/false in both branches of the shouldFireBasedOnConsecutiveWindows check, and the resolution guard properly uses fired !== false to suppress spurious OK webhooks when no notification was ever sent. No new defects were found in this review pass.

packages/api/src/tasks/checkAlerts/index.ts — the shouldFireBasedOnConsecutiveWindows function is the most complex new code path and benefits from additional review, particularly around edge cases that arise when the alert-check service experiences downtime.

Important Files Changed

Filename Overview
packages/api/src/tasks/checkAlerts/index.ts Core alert processing logic; adds shouldFireBasedOnConsecutiveWindows and the fired field tracking. The fired=false else-branches are correctly added at all three call sites, and the resolution guard is updated.
packages/api/src/tasks/checkAlerts/tests/checkAlerts.test.ts New multi-window alerting test suite covers N=1 immediate fire, N=3 suppression, and no-spurious-resolution scenarios; missing a streak-reset case.
packages/api/src/models/alertHistory.ts Adds optional fired boolean field to IAlertHistory interface and schema; change is minimal and correct.
packages/api/src/models/alert.ts Adds numConsecutiveWindows optional field with min:1 validation; correct and minimal.
packages/app/src/DBSearchPageAlertModal.tsx Adds numConsecutiveWindows NumberInput with correct onChange that stores undefined instead of 1 for the default case, and correct singular/plural label.
packages/app/src/components/DBEditTimeChartForm/TileAlertEditor.tsx Mirrors the DBSearchPageAlertModal UI change; variable is named alertnumConsecutiveWindows (inconsistent casing) but functional.
packages/common-utils/src/types.ts Adds numConsecutiveWindows to AlertBaseObjectSchema and AlertsPageItemSchema; consistent with the rest of the schema changes.
packages/api/src/utils/zod.ts Adds numConsecutiveWindows validation to the server-side alertSchema; matches model constraints.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Alert check fires] --> B{doesExceedThreshold?}
    B -- No --> C[history.state = OK\nfired = undefined]
    B -- Yes --> D[history.state = ALERT]
    D --> E{shouldFireBased\nOnConsecutiveWindows?}
    E -- numConsecutiveWindows ≤ 1 --> F[return true]
    E -- numConsecutiveWindows > 1 --> G[Query N-1 most recent\nAlertHistory records\ncreatedAt < nowRoundDown]
    G --> H{length == N-1 AND\nall state == ALERT?}
    H -- No --> I[history.fired = false\nNo notification sent]
    H -- Yes --> F
    F --> J[history.fired = true\nSend ALERT notification]
    C --> K{sendNotificationIfResolved}
    I --> K
    J --> K
    K --> L{previousHistory.state == ALERT\nAND fired !== false\nAND currentHistory.state == OK?}
    L -- Yes --> M[Send OK/resolution notification]
    L -- No --> N[Skip resolution]
Loading
%%{init: {'theme': 'base', 'themeVariables': {"darkMode": true, "background": "#0d1117", "primaryColor": "#21262d", "primaryTextColor": "#e6edf3", "primaryBorderColor": "#8b949e", "lineColor": "#8b949e", "textColor": "#e6edf3", "edgeLabelBackground": "#161b22", "actorBkg": "#21262d", "actorBorder": "#8b949e", "actorTextColor": "#e6edf3", "actorLineColor": "#8b949e", "signalColor": "#8b949e", "signalTextColor": "#e6edf3", "noteBkgColor": "#373320", "noteBorderColor": "#d4a72c", "noteTextColor": "#f0e6c0", "labelBoxBkgColor": "#21262d", "labelBoxBorderColor": "#8b949e", "labelTextColor": "#e6edf3", "loopTextColor": "#e6edf3", "activationBkgColor": "#30363d", "activationBorderColor": "#8b949e"}}}%%
flowchart TD
    A[Alert check fires] --> B{doesExceedThreshold?}
    B -- No --> C[history.state = OK\nfired = undefined]
    B -- Yes --> D[history.state = ALERT]
    D --> E{shouldFireBased\nOnConsecutiveWindows?}
    E -- numConsecutiveWindows ≤ 1 --> F[return true]
    E -- numConsecutiveWindows > 1 --> G[Query N-1 most recent\nAlertHistory records\ncreatedAt < nowRoundDown]
    G --> H{length == N-1 AND\nall state == ALERT?}
    H -- No --> I[history.fired = false\nNo notification sent]
    H -- Yes --> F
    F --> J[history.fired = true\nSend ALERT notification]
    C --> K{sendNotificationIfResolved}
    I --> K
    J --> K
    K --> L{previousHistory.state == ALERT\nAND fired !== false\nAND currentHistory.state == OK?}
    L -- Yes --> M[Send OK/resolution notification]
    L -- No --> N[Skip resolution]
Loading

Reviews (5): Last reviewed commit: "chore: lint" | Re-trigger Greptile

Comment thread packages/api/src/tasks/checkAlerts/index.ts
Comment thread packages/api/src/tasks/checkAlerts/index.ts Outdated
Comment on lines +1011 to +1022
const alertHistory = await AlertHistory.find({
alert: new mongoose.Types.ObjectId(alert.id),
...groupFilter,
createdAt: { $lt: nowInMinsRoundDown },
})
.sort({ createdAt: -1 })
.limit(numWindowsToLookBack - 1);

return (
alertHistory.length === numWindowsToLookBack - 1 &&
alertHistory.every(h => h.state === AlertState.ALERT)
);

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Consecutive-window check doesn't verify temporal proximity

shouldFireBasedOnConsecutiveWindows queries the N-1 most recent AlertHistory records without filtering by timestamp proximity. If the alert-check service experienced downtime (e.g., hours of outage), old ALERT records from before the gap would still be returned by the sort-and-limit query and would satisfy the every(h => h.state === ALERT) check, causing the current window to fire as if the intervening period had never broken the streak.

Adding a lower-bound timestamp filter — e.g. createdAt: { $gte: new Date(nowInMinsRoundDown.getTime() - (numWindowsToLookBack - 1) * windowSizeInMins * 60_000), $lt: nowInMinsRoundDown } — would ensure only truly consecutive windows are counted.

Fix in Claude Code Fix in Conductor Fix in Cursor Fix in Codex

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems reasonable, would love input from a maintainer on this one, as I'm not too familiar here!

@mrkaye97 mrkaye97 force-pushed the mk/add-multi-window-lookback-alerts branch from 0214704 to 897fa74 Compare June 16, 2026 15:01
@mrkaye97 mrkaye97 marked this pull request as ready for review June 16, 2026 15:03
Comment thread packages/api/src/tasks/checkAlerts/index.ts Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feature request: generalize alert configuration to allow for "fire alert after N occurrences in M consecutive windows"

2 participants