RepetitiveToolCallDetector is blind to tools whose result embeds GUIDs/timestamps (e.g. spawn_wisps) #464

@rockfordlhotka

Summary

AgentLoopRunner.RepetitiveToolCallDetector is meant to catch a model stuck calling the same tool with the same args and getting the same result, and to nudge it to try a different approach after Threshold (3) consecutive identical calls. It keys each call on the triple toolName | argsKey | result (AgentLoopRunner.cs, Track(...)). Because the raw result text is part of the key, the detector never fires for any tool whose result varies on every call, even when the call is semantically identical.

The clearest offender is spawn_wisps: its batch summary embeds a fresh batch-{Guid} and a wisp-{Guid} per definition (SpawnWispsExecutor.cs), plus durationMs/stepsCompleted. So two byte-identical wisp dispatches always produce different result strings → the counter resets to 1 every call → the loop-break nudge is never emitted. The same blind spot applies to any tool that returns generated IDs, timestamps, or other per-call entropy.
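To make the failure mode concrete, here is a minimal Python model of the keying described above. It is an illustration only, not the project's C# code: the class shape, the boolean return value of `track`, and the `fake_spawn_wisps_result` format are assumptions. Only the toolName | argsKey | result key and the threshold of 3 come from this issue.

```python
import uuid


class RepetitiveToolCallDetector:
    """Toy model: counts consecutive calls whose (tool, args, result) key is identical."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self._last_key = None
        self._count = 0

    def track(self, tool_name: str, args_key: str, result: str) -> bool:
        """Return True when the loop-break nudge should be emitted."""
        key = f"{tool_name}|{args_key}|{result}"
        if key == self._last_key:
            self._count += 1
        else:
            # Any difference in the key, including a fresh GUID, restarts the count.
            self._last_key = key
            self._count = 1
        return self._count >= self.threshold


def fake_spawn_wisps_result() -> str:
    # Hypothetical result shape; the real summary is built in SpawnWispsExecutor.cs.
    return f"batch-{uuid.uuid4().hex}: wisp-{uuid.uuid4().hex} completed"


detector = RepetitiveToolCallDetector()
stable = [detector.track("read_file", '{"path":"a.txt"}', "hello") for _ in range(3)]
volatile = [
    detector.track("spawn_wisps", '{"defs":["x"]}', fake_spawn_wisps_result())
    for _ in range(5)
]
print(stable)    # [False, False, True]
print(volatile)  # [False, False, False, False, False]
```

The stable tool trips the threshold on the third call, while five identical spawn_wisps dispatches never do, because each result string differs.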

Why it matters

This is one of the gaps surfaced by the 2026-06-05 runaway investigation (see #462 for the trim-loop fix and #463 for the dispatch-level circuit breaker). The detector is the in-loop guard; with this blind spot it silently no-ops for an important class of tools, leaving only the coarser cross-loop circuit breaker (#463) to catch repetition.

Repro / evidence

  • RepetitiveToolCallDetectorTests.Track_DifferentResult_ResetsCounter documents the intended reset-on-different-result behaviour — which is exactly what GUID-bearing results trigger every time.
  • spawn_wisps result construction: one batch-{Guid:N} for the batch plus one wisp-{Guid:N} per definition, built in SpawnWispsExecutor.

Proposed fix (for discussion)

Make the repetition key robust to per-call entropy. Options, roughly in order of preference:

  1. Normalize the result before hashing — strip/redact GUIDs, ISO timestamps, and \d+ms durations (regex) so cosmetically-different-but-equivalent results compare equal.
  2. Key spawn_wisps (and similar) on args only, not result — for tools where the result is unique by construction, identical args is the right signal.
  3. Stabilize the tool output — have spawn_wisps keep volatile IDs out of the portion the detector sees (less general; doesn't help other tools).

Option 1 is the most general and fixes the whole class. Whichever is chosen, add detector tests covering a GUID/timestamp-bearing result that should still trip the threshold.
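As a starting point for discussion, Option 1 (with Option 2 as a per-tool opt-in) could look roughly like the Python sketch below. Everything here is an assumption for illustration: the function names `normalize_result` and `repetition_key`, the `args_only_tools` parameter, the placeholder tokens, and the three regex patterns. The sample result format is also made up, so the patterns would need checking against real tool output.

```python
import re
import uuid

# All three patterns are illustrative assumptions; tune them to the real result formats.
_TIMESTAMP = re.compile(
    r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:\d{2})?"
)
_GUID = re.compile(
    r"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b"  # dashed "D" format
    r"|\b[0-9a-fA-F]{32}\b"                                       # undashed "N" format
)
_DURATION = re.compile(r"\b\d+ms\b")


def normalize_result(result: str) -> str:
    """Replace per-call entropy with fixed placeholders before keying."""
    result = _TIMESTAMP.sub("<ts>", result)
    result = _GUID.sub("<guid>", result)
    result = _DURATION.sub("<ms>", result)
    return result


def repetition_key(tool_name, args_key, result, args_only_tools=frozenset()):
    """Option 2 for opted-in tools (args only), Option 1 for everything else."""
    if tool_name in args_only_tools:
        return f"{tool_name}|{args_key}"
    return f"{tool_name}|{args_key}|{normalize_result(result)}"


def sample(duration_ms: int, stamp: str) -> str:
    # Hypothetical spawn_wisps-style summary with fresh IDs on every call.
    return (
        f"batch-{uuid.uuid4().hex}: wisp-{uuid.uuid4().hex} "
        f"done in {duration_ms}ms at {stamp}"
    )


k1 = repetition_key("spawn_wisps", "{}", sample(412, "2026-06-05T10:15:30Z"))
k2 = repetition_key("spawn_wisps", "{}", sample(97, "2026-06-05T10:15:31.250Z"))
print(k1 == k2)  # True
print(normalize_result(sample(5, "2026-06-05T10:15:30Z")))
# batch-<guid>: wisp-<guid> done in <ms> at <ts>
```

Two caveats worth testing for. If the summary serializes durations as a field such as durationMs followed by a number rather than as `412ms`, the duration pattern above will not match and needs adjusting. And over-eager normalization can hide real progress: a 32-character hex content hash, for example, would be redacted like a GUID, so results that genuinely differ only in such a value would compare equal.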

Out of scope

The cross-loop runaway guard (process-wide, survives reprompts/scheduled re-fires) is handled separately by the wisp dispatch circuit breaker in #463. This issue is specifically about the in-loop detector's keying.
