Skip to content

fix(self-healing): prevent rule patch oscillation#241

Open
Gradata wants to merge 2 commits into
mainfrom
gra-1325-oscillation-guard
Open

fix(self-healing): prevent rule patch oscillation#241
Gradata wants to merge 2 commits into
mainfrom
gra-1325-oscillation-guard

Conversation

@Gradata
Copy link
Copy Markdown
Owner

@Gradata Gradata commented Jun 2, 2026

Summary

Fixes GRA-1325 / Paperclip issue 1983a5c6-b216-4650-aff1-11e135a586d8.

This isolates the self-healing oscillation fix from the broad/failing research PR #222:

  • Adds _oscillation_guard.detect_cycle() for direct A→B then B→A rule-patch cycles within a 30-day / 5-patch lookback.
  • observe_patch() now checks the guard before recording another patch observation; on cycle, it emits rule_patch_cycle_detected instead of creating another fake "improvement" row.
  • Wires Brain.patch_rule() to call observe_patch() after successful rewrites so the guard is active.
  • Adds regression coverage for direct cycles, category isolation, lookback limits, identity patches, env-configured lock sessions, and resolve/acceptance telemetry.

Verification

  • python3 -m pytest Gradata/tests/test_oscillation_guard.py -q12 passed in 0.10s
  • python3 -m pytest Gradata/tests/test_self_healing.py Gradata/tests/test_self_healing_fix.py Gradata/tests/test_bug_fixes.py -q119 passed in 5.16s

Notes

This PR is intentionally scoped to the SDK-side oscillation guard. The cloud/dashboard-side misleading percentage display for insufficient post-patch observations remains separate follow-up work.

data-engineer and others added 2 commits June 2, 2026 09:07
Real-world bug observed 2026-05-21 on production brain: lesson 911130b3
oscillated between two rule phrasings A and B for 5 consecutive rollbacks
spanning 20 days, each marked '100% reduction' in the dashboard.

Root cause: `observe_patch()` had no cycle detection — every time the
compliance scorer flagged the current text as 'failing,' the patcher
rewrote it back to the previous text without checking it had just patched
away from that text. The 'reduction' metric games itself: the new text
trivially shows zero failures because it has zero observations yet.

Fix: new module `_oscillation_guard.py` is consulted before each
`observe_patch()` call. Detects direct A→B then B→A cycles within a
30-day / 5-patch lookback window. On detection, emits a
`rule_patch_cycle_detected` event and aborts the patch instead of
recording another fake-reduction row. Conservative scope (only direct
cycles, only within a single category, whitespace-normalized comparison)
to minimize false-positive risk.

The next sibling fix (`recurrence_change → insufficient_data when <3
sessions`) is filed as a separate cloud-side issue [040a09dd].

Tests: 12 new (oscillation_guard) + 166 existing self-healing/patch
tests still green.

Refs: paperclip issue 1983a5c6
Activates the oscillation guard restored in 41390bf — every successful
Brain.patch_rule call now consults observe_patch, which detects A→B→A→B
cycles via _oscillation_guard.detect_cycle and aborts the patch if a cycle
is in flight. Without this hook the cherry-picked guard files were dead code.

Also includes per-rule injection telemetry (second hunk, line 1200+) that
was already in flight in the working tree; preserved so cloud sync /
dashboard surfaces can attribute applied rules to real sessions.
Copy link
Copy Markdown

@greptile-apps greptile-apps Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Jun 2, 2026

Review Change Stack

📝 Walkthrough

Fixes self-healing rule-patch oscillation: Prevents A→B→A cycles in rule rewrites within a 30-day/5-patch lookback window (GRA-1325/Paperclip issue 1983a5c6)

New oscillation guard module (_oscillation_guard.py): Exports detect_cycle() and emit_cycle_detected() functions with whitespace-normalized, case-insensitive text comparison and configurable lock sessions

New patch observation system (_patches.py): Exports observe_patch(), resolve_patch_compliance(), and patch_acceptance_rate() to track rule-patch outcomes and compliance improvements

Integration with Brain.patch_rule(): Now calls observe_patch() after successful rewrites to actively prevent oscillating patches

Per-rule injection telemetry: Brain.apply_brain_rules() emits rule_injected events for each applied rule with metadata (id, category, confidence, state, task)

Comprehensive test coverage: 12 new regression tests covering cycles, category isolation, text normalization, lookback limits, and acceptance telemetry

No breaking changes: All modifications are additive; public API expanded with new export functions

Walkthrough

This PR adds telemetry and safety mechanisms for tracking self-healing rule patches. It introduces oscillation-cycle detection to prevent A→B→A patch loops, patch observation and compliance tracking to measure rule improvement, and per-rule injection telemetry in the Brain's rule application flow.

Changes

Self-Healing Rule Patch Telemetry and Oscillation Guard

Layer / File(s) Summary
Oscillation Guard: cycle detection and emission
Gradata/src/gradata/enhancements/self_improvement/_oscillation_guard.py
Normalizes rule text (whitespace-collapsed, lowercased) and detects direct A→B→A oscillations by querying past rule_patch_observed events within a lookback window (30 days or 5 patches by default). detect_cycle returns cycle metadata on match or None if no cycle or query fails; emit_cycle_detected emits a rule_patch_cycle_detected event with truncated texts and cycle details, suppressing all exceptions.
Patch observation, compliance resolution, and acceptance metrics
Gradata/src/gradata/enhancements/self_improvement/_patches.py
observe_patch() checks for oscillation cycles and emits rule_patch_observed with pre-patch compliance, or emits rule_patch_cycle_detected if a cycle is found. resolve_patch_compliance() resolves pending observations by measuring post-patch compliance after a session gap and emitting updated events. patch_acceptance_rate() aggregates resolved observations to compute acceptance totals and rate (rounded to 3 decimals).
Brain patch and rule-injection telemetry hooks
Gradata/src/gradata/brain.py
patch_rule() calls observe_patch() after emitting RULE_PATCHED, suppressing failures. apply_brain_rules() emits a rule_injected event for each applied rule with metadata (id, category, confidence, state, task, scope), catching and logging emission errors.
Oscillation guard and patch observation test suite
Gradata/tests/test_oscillation_guard.py
Comprehensive tests for detect_cycle (empty history, unrelated patches, direct cycles, self-identity, category isolation, text normalization, lookback window), emit_cycle_detected event emission and persistence, and observe_patch integration (normal patch emission, cycle blocking, unrelated patch sequences). Includes _FakeBrain test double and _patch_event helper.

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly Related PRs

  • Gradata/gradata#86: Both PRs modify Brain.apply_brain_rules() to emit rule-related telemetry events.
  • Gradata/gradata#77: Auto-heal flow triggers brain.patch_rule() for RULE_FAILURE events, which now feeds the new patch-observation and cycle-detection pipeline.
  • Gradata/gradata#42: Both PRs modify Brain.patch_rule() implementation in Gradata/src/gradata/brain.py.

Suggested Labels

bug

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: preventing rule patch oscillation by implementing a guard mechanism to detect and block A→B→A cycles.
Description check ✅ Passed The description is directly related to the changeset, explaining the oscillation fix, key additions (oscillation guard, observe_patch integration), testing coverage, and scope limitations.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch gra-1325-oscillation-guard

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 OpenGrep (1.22.0)

OpenGrep fatal error (exit code 2):
┌──────────────┐
│ Opengrep CLI │
└──────────────┘

�[32m✔�[39m �[1mOpengrep OSS�[0m
�[32m✔�[39m Basic security coverage for first-party code vulnerabilities.

�[1m Loading rules from local config...�[0m
[00.42][ERROR]: Error: exception Glob.Lexer.Syntax_error("malformed glob pattern: missing ']'")
Raised at Glob__Lexer.syntax_error in file "libs/glob/Lexer.mll", line 8, characters 2-26
Called from Glob__Lexer.__ocaml_lex_token_rec in file "libs/glob/Lexer.mll", line 29, characters 26-53
Cal


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added the bug Something isn't working label Jun 2, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Gradata/src/gradata/brain.py`:
- Around line 1203-1230: The except block that catches emission failures in the
loop over applied rules (inside brain.apply_brain_rules where self.emit is
called for each applied_rule) currently uses logger.debug; change it to catch
Exception as e and call logger.warning with a descriptive message and
exc_info=True so failures are visible (e.g., "rule_injected emit failed for
rule_id=%s" with applied_rule.rule_id and include exc_info=True) to surface
telemetry emission errors instead of silently debugging them.
- Around line 763-768: The current bare except around observe_patch(self,
category, old_description, new_description) swallows errors; change it to catch
Exception as e and log the failure (e.g., logger.debug or logger.warning) with
exc_info=True so exceptions are recorded for telemetry; update the except block
within the same try/except surrounding observe_patch in gradata.brain to call
logger.debug("observe_patch failed", exc_info=True) or similar instead of pass.

In `@Gradata/src/gradata/enhancements/self_improvement/_oscillation_guard.py`:
- Around line 171-190: Replace the bare except that swallows errors after
calling brain.emit with a logged exception: catch Exception as e and call
logger.warning (or the module logger) including a descriptive message, the
category and matched_event_id from cycle for context, and exc_info=True, then
return {} as before; specifically update the try/except around the brain.emit
call in _oscillation_guard.detect_cycle so failures in brain.emit are not
silently dropped.
- Around line 114-120: The except block around brain.query_events currently
swallows errors silently; change it to catch Exception as e and call
logger.warning with a descriptive message and exc_info=True (e.g.,
logger.warning("Failed querying rule_patch_observed events, allowing patch: %s",
e, exc_info=True)) before returning None so failures are recorded; update the
try/except that wraps brain.query_events in _oscillation_guard.py accordingly
(retain the fail-open return None behavior).

In `@Gradata/src/gradata/enhancements/self_improvement/_patches.py`:
- Around line 113-129: The except block currently swallows exceptions from
brain.emit("rule_patch_observed", "_patches.observe_patch", ...) and loses
diagnostics; modify it to catch Exception as e and call logger.warning with a
clear message including identifying context (e.g., category, _PATCH_TAG or
applied_at) and exc_info=True so the traceback is recorded, then return {} as
before; update the except clause in the block that wraps brain.emit in
_patches.observe_patch to reference the caught exception variable and call
logger.warning(..., exc_info=True).
- Around line 170-193: The loop currently swallows all errors from brain.emit
with "except Exception: continue", so change that to catch the Exception as a
variable and log it before continuing: in the block around
brain.emit("_patches.resolve_patch_compliance", ...) (and where updated and
updates are appended), replace the bare except with "except Exception as e" and
call logger.warning with a clear message referencing the emission (e.g., "Failed
to emit rule_patch_observed for _patches.resolve_patch_compliance") and pass
exc_info=True, then continue; this preserves behavior but records the exception
for debugging.
- Around line 146-149: The current try/except around
brain.query_events(event_type="rule_patch_observed", limit=200) swallows errors
and returns [] silently; update the except block to log the failure using the
module logger (logger.warning(..., exc_info=True)) before returning an empty
list so failures are visible (keep returning [] to preserve behavior) and ensure
you catch Exception (or a more specific exception type if appropriate) around
the brain.query_events call and reference pending_events in the try block.
- Around line 213-216: The try/except around
brain.query_events(event_type="rule_patch_observed", limit=500) currently
swallows all errors and sets events = [], hiding failures; change the except to
catch Exception as e and call logger.warning with a descriptive message
including the exception (use exc_info=True) before assigning events = [] so
failures are logged for debugging (refer to brain.query_events and the events
variable and use logger.warning(..., exc_info=True)).
- Around line 37-44: Replace the bare except that swallows query errors around
the brain.query_events call with a typed except that logs the failure: change
"except Exception: return 0" to "except Exception as e: logger.warning('Failed
to query RULE_FAILURE events (lookback_sessions=%s): %s', lookback_sessions, e,
exc_info=True); return 0" (or obtain a module logger via logging.getLogger if
none exists). This preserves the return behavior but records the exception with
exc_info=True for debugging.

In `@Gradata/tests/test_oscillation_guard.py`:
- Around line 200-216: Add an assertion in
test_observe_patch_blocks_direct_cycle that verifies the blocked cycle does not
also record a fake "improvement": inspect the _FakeBrain instance to find where
emitted events are recorded (e.g. brain.events or brain.recorded_events) and
assert that the number of events with event_type == "rule_patch_observed" is
exactly 1 after calling observe_patch twice (first A→B, second blocked B→A);
keep this check right after the existing assertions that validate
second["event_type"] and data fields so the test guarantees no extra
"rule_patch_observed" was produced by the cycle detection logic.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6176f09d-0949-4c7f-bab0-e62b1a47e8c4

📥 Commits

Reviewing files that changed from the base of the PR and between ffcba5b and 4cf785a.

📒 Files selected for processing (4)
  • Gradata/src/gradata/brain.py
  • Gradata/src/gradata/enhancements/self_improvement/_oscillation_guard.py
  • Gradata/src/gradata/enhancements/self_improvement/_patches.py
  • Gradata/tests/test_oscillation_guard.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
Gradata/src/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/src/**/*.py: Prefer sentence-transformers for local embeddings, google-genai for Gemini embeddings, cryptography for AES-GCM encrypted system.db, bm25s for BM25 rule ranking, and mem0ai for external memory adapters — guard all optional dependency imports with try / except ImportError at the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
Never import from out-of-scope sibling directories ../Sprites/ or ../Hausgem/ within gradata/* code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to ../Sprites/, ../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from inside gradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes

Files:

  • Gradata/src/gradata/enhancements/self_improvement/_patches.py
  • Gradata/src/gradata/brain.py
  • Gradata/src/gradata/enhancements/self_improvement/_oscillation_guard.py
Gradata/tests/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/tests/**/*.py: Set BRAIN_DIR environment variable via tmp_path in conftest.py for test isolation — ensure _paths.py module cache refreshes when calling Brain.init() directly inside tests
Add unit tests in tests/test_*.py for every CI push without LLM calls (deterministic); mark integration tests with @pytest.mark.integration and skip them by default (they hit real LLM APIs)

Files:

  • Gradata/tests/test_oscillation_guard.py
🔇 Additional comments (4)
Gradata/tests/test_oscillation_guard.py (4)

30-84: LGTM!


92-159: LGTM!


167-186: LGTM!


219-231: LGTM!

Comment on lines +763 to +768
try:
from gradata.enhancements.self_improvement._patches import observe_patch

observe_patch(self, category, old_description, new_description)
except Exception: # pragma: no cover — defensive
pass
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent failure hides patch observation errors.

The except Exception: pass swallows observe_patch() failures without logging, making it impossible to diagnose telemetry issues. While the pragma: no cover indicates defensive intent, as per coding guidelines, use at minimum logger.debug(...) with exc_info=True to aid debugging without raising.

🔍 Proposed fix to add logging
         try:
             from gradata.enhancements.self_improvement._patches import observe_patch

             observe_patch(self, category, old_description, new_description)
-        except Exception:  # pragma: no cover — defensive
-            pass
+        except Exception as exc:  # pragma: no cover — defensive
+            logger.debug("observe_patch failed: %s", exc, exc_info=True)

As per coding guidelines: Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/src/gradata/brain.py` around lines 763 - 768, The current bare except
around observe_patch(self, category, old_description, new_description) swallows
errors; change it to catch Exception as e and log the failure (e.g.,
logger.debug or logger.warning) with exc_info=True so exceptions are recorded
for telemetry; update the except block within the same try/except surrounding
observe_patch in gradata.brain to call logger.debug("observe_patch failed",
exc_info=True) or similar instead of pass.

Comment on lines +1203 to +1230
# Persist per-rule injection telemetry so cloud sync / dashboard views
# can show which specific rules were applied in real sessions.
if applied:
scope_payload = {
"task_type": scope.task_type,
"domain": scope.domain,
"audience": scope.audience,
}
for applied_rule in applied:
try:
self.emit(
"rule_injected",
"brain.apply_brain_rules",
{
"rule_id": applied_rule.rule_id,
"category": applied_rule.lesson.category,
"confidence": applied_rule.lesson.confidence,
"state": applied_rule.lesson.state.value,
"task": task,
"scope": scope_payload,
},
[
f"rule:{applied_rule.rule_id}",
f"category:{applied_rule.lesson.category}",
],
)
except Exception as e:
logger.debug("rule_injected emit failed: %s", e)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | 💤 Low value

Consider upgrading debug logging to warning for emission failures.

Line 1230 uses logger.debug for emission failures, but as per coding guidelines, telemetry failures should use logger.warning(...) with exc_info=True to avoid silent failure in a memory product. Emission failures are unusual and warrant investigation, making warning more appropriate than debug.

♻️ Proposed change to use warning level
                 except Exception as e:
-                    logger.debug("rule_injected emit failed: %s", e)
+                    logger.warning("rule_injected emit failed: %s", e, exc_info=True)

As per coding guidelines: Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/src/gradata/brain.py` around lines 1203 - 1230, The except block that
catches emission failures in the loop over applied rules (inside
brain.apply_brain_rules where self.emit is called for each applied_rule)
currently uses logger.debug; change it to catch Exception as e and call
logger.warning with a descriptive message and exc_info=True so failures are
visible (e.g., "rule_injected emit failed for rule_id=%s" with
applied_rule.rule_id and include exc_info=True) to surface telemetry emission
errors instead of silently debugging them.

Comment on lines +114 to +120
try:
events = brain.query_events(
event_type="rule_patch_observed",
limit=200,
)
except Exception:
return None # Fail open — never block patches on a query failure.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent failure risks losing cycle-detection visibility.

The except Exception: return None swallows query failures without logging, making debugging difficult if event queries start failing in production. As per coding guidelines, avoid silent failures in a memory product — use at minimum logger.warning(...) with exc_info=True.

🔍 Proposed fix to add logging
+import logging
+
+_logger = logging.getLogger("gradata.oscillation_guard")
+
+
 def detect_cycle(
     brain: Brain,
     category: str,
     old_description: str,
     new_description: str,
     lookback_days: int = _DEFAULT_LOOKBACK_DAYS,
     lookback_patches: int = _DEFAULT_LOOKBACK_PATCHES,
 ) -> dict | None:
     ...
     try:
         events = brain.query_events(
             event_type="rule_patch_observed",
             limit=200,
         )
     except Exception:
+        _logger.warning("Cycle detection query failed; failing open", exc_info=True)
         return None  # Fail open — never block patches on a query failure.

As per coding guidelines: Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
events = brain.query_events(
event_type="rule_patch_observed",
limit=200,
)
except Exception:
return None # Fail open — never block patches on a query failure.
import logging
_logger = logging.getLogger("gradata.oscillation_guard")
def detect_cycle(
brain: Brain,
category: str,
old_description: str,
new_description: str,
lookback_days: int = _DEFAULT_LOOKBACK_DAYS,
lookback_patches: int = _DEFAULT_LOOKBACK_PATCHES,
) -> dict | None:
...
try:
events = brain.query_events(
event_type="rule_patch_observed",
limit=200,
)
except Exception:
_logger.warning("Cycle detection query failed; failing open", exc_info=True)
return None # Fail open — never block patches on a query failure.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/src/gradata/enhancements/self_improvement/_oscillation_guard.py`
around lines 114 - 120, The except block around brain.query_events currently
swallows errors silently; change it to catch Exception as e and call
logger.warning with a descriptive message and exc_info=True (e.g.,
logger.warning("Failed querying rule_patch_observed events, allowing patch: %s",
e, exc_info=True)) before returning None so failures are recorded; update the
try/except that wraps brain.query_events in _oscillation_guard.py accordingly
(retain the fail-open return None behavior).

Comment on lines +171 to +190
try:
event = brain.emit(
"rule_patch_cycle_detected",
"_oscillation_guard.detect_cycle",
{
"category": (category or "").upper(),
"old_rule_text": (cycle.get("old_text") or "")[:500],
"new_rule_text": (cycle.get("new_text") or "")[:500],
"matched_event_id": cycle.get("matched_event_id"),
"matched_applied_at": cycle.get("matched_applied_at"),
"cycle_length": cycle.get("cycle_length", 0),
"lock_sessions": cycle.get("lock_sessions", _DEFAULT_LOCK_SESSIONS),
"reason": cycle.get("reason", "direct_cycle"),
"detected_at": datetime.now(UTC).isoformat(),
},
[f"category:{category}", "self_healing", "cycle_detected", "patch_blocked"],
)
return event if isinstance(event, dict) else {}
except Exception:
return {}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent failure hides event emission problems.

The bare except Exception: return {} swallows emission errors without logging, making it hard to diagnose if events stop being recorded. As per coding guidelines, use at minimum logger.warning(...) with exc_info=True to avoid silent failure.

🔍 Proposed fix to add logging
 def emit_cycle_detected(brain: Brain, category: str, cycle: dict) -> dict:
     """Emit a `rule_patch_cycle_detected` event so the cycle is auditable.

     Returns the emitted event dict, or {} on failure (never raises).
     """
     try:
         event = brain.emit(
             "rule_patch_cycle_detected",
             "_oscillation_guard.detect_cycle",
             {
                 "category": (category or "").upper(),
                 "old_rule_text": (cycle.get("old_text") or "")[:500],
                 "new_rule_text": (cycle.get("new_text") or "")[:500],
                 "matched_event_id": cycle.get("matched_event_id"),
                 "matched_applied_at": cycle.get("matched_applied_at"),
                 "cycle_length": cycle.get("cycle_length", 0),
                 "lock_sessions": cycle.get("lock_sessions", _DEFAULT_LOCK_SESSIONS),
                 "reason": cycle.get("reason", "direct_cycle"),
                 "detected_at": datetime.now(UTC).isoformat(),
             },
             [f"category:{category}", "self_healing", "cycle_detected", "patch_blocked"],
         )
         return event if isinstance(event, dict) else {}
     except Exception:
+        _logger.warning("Cycle-detected event emission failed", exc_info=True)
         return {}

As per coding guidelines: Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/src/gradata/enhancements/self_improvement/_oscillation_guard.py`
around lines 171 - 190, Replace the bare except that swallows errors after
calling brain.emit with a logged exception: catch Exception as e and call
logger.warning (or the module logger) including a descriptive message, the
category and matched_event_id from cycle for context, and exc_info=True, then
return {} as before; specifically update the try/except around the brain.emit
call in _oscillation_guard.detect_cycle so failures in brain.emit are not
silently dropped.

Comment on lines +37 to +44
try:
events = brain.query_events(
event_type="RULE_FAILURE",
last_n_sessions=lookback_sessions,
limit=500,
)
except Exception:
return 0
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent query failure returns zero, masking compliance metrics.

The except Exception: return 0 swallows query errors without logging, causing compliance counts to silently default to zero when the event query fails. This can skew acceptance metrics. As per coding guidelines, use logger.warning(...) with exc_info=True.

🔍 Proposed fix to add logging
+import logging
+
+_logger = logging.getLogger("gradata.patch_telemetry")
+
+
 def _count_failures_for_rule(
     brain: Brain,
     category: str,
     rule_description: str,
     lookback_sessions: int = _LOOKBACK_SESSIONS,
 ) -> int:
     """Count RULE_FAILURE events matching this specific rule in the last N sessions."""
     try:
         events = brain.query_events(
             event_type="RULE_FAILURE",
             last_n_sessions=lookback_sessions,
             limit=500,
         )
     except Exception:
+        _logger.warning("RULE_FAILURE query failed in _count_failures_for_rule", exc_info=True)
         return 0

As per coding guidelines: Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
events = brain.query_events(
event_type="RULE_FAILURE",
last_n_sessions=lookback_sessions,
limit=500,
)
except Exception:
return 0
try:
events = brain.query_events(
event_type="RULE_FAILURE",
last_n_sessions=lookback_sessions,
limit=500,
)
except Exception:
_logger.warning("RULE_FAILURE query failed in _count_failures_for_rule", exc_info=True)
return 0
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/src/gradata/enhancements/self_improvement/_patches.py` around lines
37 - 44, Replace the bare except that swallows query errors around the
brain.query_events call with a typed except that logs the failure: change
"except Exception: return 0" to "except Exception as e: logger.warning('Failed
to query RULE_FAILURE events (lookback_sessions=%s): %s', lookback_sessions, e,
exc_info=True); return 0" (or obtain a module logger via logging.getLogger if
none exists). This preserves the return behavior but records the exception with
exc_info=True for debugging.

Comment on lines +113 to +129
try:
event = brain.emit(
"rule_patch_observed",
"_patches.observe_patch",
{
"category": category.upper(),
"old_rule_text": old_description[:_MAX_RULE_TEXT],
"new_rule_text": new_description[:_MAX_RULE_TEXT],
"applied_at": datetime.now(UTC).isoformat(),
"observed_compliance_before": compliance_before,
"observed_compliance_after_3_sessions": None,
},
[f"category:{category}", "self_healing", _PATCH_TAG],
)
return event if isinstance(event, dict) else {}
except Exception:
return {}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent emission failure loses patch observations.

The except Exception: return {} swallows event emission errors without logging, making it impossible to diagnose why patch observations aren't being recorded. As per coding guidelines, add logger.warning(...) with exc_info=True.

🔍 Proposed fix to add logging
     compliance_before = _count_failures_for_rule(brain, category, old_description)

     try:
         event = brain.emit(
             "rule_patch_observed",
             "_patches.observe_patch",
             {
                 "category": category.upper(),
                 "old_rule_text": old_description[:_MAX_RULE_TEXT],
                 "new_rule_text": new_description[:_MAX_RULE_TEXT],
                 "applied_at": datetime.now(UTC).isoformat(),
                 "observed_compliance_before": compliance_before,
                 "observed_compliance_after_3_sessions": None,
             },
             [f"category:{category}", "self_healing", _PATCH_TAG],
         )
         return event if isinstance(event, dict) else {}
     except Exception:
+        _logger.warning("rule_patch_observed emission failed", exc_info=True)
         return {}

As per coding guidelines: Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
event = brain.emit(
"rule_patch_observed",
"_patches.observe_patch",
{
"category": category.upper(),
"old_rule_text": old_description[:_MAX_RULE_TEXT],
"new_rule_text": new_description[:_MAX_RULE_TEXT],
"applied_at": datetime.now(UTC).isoformat(),
"observed_compliance_before": compliance_before,
"observed_compliance_after_3_sessions": None,
},
[f"category:{category}", "self_healing", _PATCH_TAG],
)
return event if isinstance(event, dict) else {}
except Exception:
return {}
try:
event = brain.emit(
"rule_patch_observed",
"_patches.observe_patch",
{
"category": category.upper(),
"old_rule_text": old_description[:_MAX_RULE_TEXT],
"new_rule_text": new_description[:_MAX_RULE_TEXT],
"applied_at": datetime.now(UTC).isoformat(),
"observed_compliance_before": compliance_before,
"observed_compliance_after_3_sessions": None,
},
[f"category:{category}", "self_healing", _PATCH_TAG],
)
return event if isinstance(event, dict) else {}
except Exception:
_logger.warning("rule_patch_observed emission failed", exc_info=True)
return {}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/src/gradata/enhancements/self_improvement/_patches.py` around lines
113 - 129, The except block currently swallows exceptions from
brain.emit("rule_patch_observed", "_patches.observe_patch", ...) and loses
diagnostics; modify it to catch Exception as e and call logger.warning with a
clear message including identifying context (e.g., category, _PATCH_TAG or
applied_at) and exc_info=True so the traceback is recorded, then return {} as
before; update the except clause in the block that wraps brain.emit in
_patches.observe_patch to reference the caught exception variable and call
logger.warning(..., exc_info=True).

Comment on lines +146 to +149
try:
pending_events = brain.query_events(event_type="rule_patch_observed", limit=200)
except Exception:
return []
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent query failure prevents compliance resolution.

The except Exception: return [] swallows query errors without logging, causing resolution to silently skip all pending observations when the query fails. As per coding guidelines, add logger.warning(...) with exc_info=True.

🔍 Proposed fix to add logging
     try:
         pending_events = brain.query_events(event_type="rule_patch_observed", limit=200)
     except Exception:
+        _logger.warning("rule_patch_observed query failed in resolve_patch_compliance", exc_info=True)
         return []

As per coding guidelines: Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
pending_events = brain.query_events(event_type="rule_patch_observed", limit=200)
except Exception:
return []
try:
pending_events = brain.query_events(event_type="rule_patch_observed", limit=200)
except Exception:
_logger.warning("rule_patch_observed query failed in resolve_patch_compliance", exc_info=True)
return []
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/src/gradata/enhancements/self_improvement/_patches.py` around lines
146 - 149, The current try/except around
brain.query_events(event_type="rule_patch_observed", limit=200) swallows errors
and returns [] silently; update the except block to log the failure using the
module logger (logger.warning(..., exc_info=True)) before returning an empty
list so failures are visible (keep returning [] to preserve behavior) and ensure
you catch Exception (or a more specific exception type if appropriate) around
the brain.query_events call and reference pending_events in the try block.

Comment on lines +170 to +193
try:
updated = brain.emit(
"rule_patch_observed",
"_patches.resolve_patch_compliance",
{
**data,
"observed_compliance_after_3_sessions": compliance_after,
"compliance_improved": improved,
"resolution_session": current_session,
"original_event_id": ev.get("id"),
},
[f"category:{category}", "self_healing", _PATCH_TAG, "resolved"],
)
updates.append(
{
"category": category,
"compliance_before": compliance_before,
"compliance_after": compliance_after,
"improved": improved,
"event": updated if isinstance(updated, dict) else {},
}
)
except Exception:
continue
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent emission failures in loop hide resolution errors.

The except Exception: continue on line 192-193 swallows emission errors without logging, making it impossible to diagnose why some patch compliance updates aren't being recorded. As per coding guidelines, add logger.warning(...) with exc_info=True.

🔍 Proposed fix to add logging
         try:
             updated = brain.emit(
                 "rule_patch_observed",
                 "_patches.resolve_patch_compliance",
                 {
                     **data,
                     "observed_compliance_after_3_sessions": compliance_after,
                     "compliance_improved": improved,
                     "resolution_session": current_session,
                     "original_event_id": ev.get("id"),
                 },
                 [f"category:{category}", "self_healing", _PATCH_TAG, "resolved"],
             )
             updates.append(
                 {
                     "category": category,
                     "compliance_before": compliance_before,
                     "compliance_after": compliance_after,
                     "improved": improved,
                     "event": updated if isinstance(updated, dict) else {},
                 }
             )
         except Exception:
+            _logger.warning("Resolution emission failed for event %s", ev.get("id"), exc_info=True)
             continue

As per coding guidelines: Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/src/gradata/enhancements/self_improvement/_patches.py` around lines
170 - 193, The loop currently swallows all errors from brain.emit with "except
Exception: continue", so change that to catch the Exception as a variable and
log it before continuing: in the block around
brain.emit("_patches.resolve_patch_compliance", ...) (and where updated and
updates are appended), replace the bare except with "except Exception as e" and
call logger.warning with a clear message referencing the emission (e.g., "Failed
to emit rule_patch_observed for _patches.resolve_patch_compliance") and pass
exc_info=True, then continue; this preserves behavior but records the exception
for debugging.

Comment on lines +213 to +216
try:
events = brain.query_events(event_type="rule_patch_observed", limit=500)
except Exception:
events = []
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent query failure returns zero metrics, hiding acceptance rate issues.

The except Exception: events = [] swallows query errors without logging, causing acceptance metrics to silently return all zeros when the query fails. This masks real problems with patch tracking. As per coding guidelines, add logger.warning(...) with exc_info=True.

🔍 Proposed fix to add logging
     try:
         events = brain.query_events(event_type="rule_patch_observed", limit=500)
     except Exception:
+        _logger.warning("rule_patch_observed query failed in patch_acceptance_rate", exc_info=True)
         events = []

As per coding guidelines: Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
events = brain.query_events(event_type="rule_patch_observed", limit=500)
except Exception:
events = []
try:
events = brain.query_events(event_type="rule_patch_observed", limit=500)
except Exception:
_logger.warning("rule_patch_observed query failed in patch_acceptance_rate", exc_info=True)
events = []
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/src/gradata/enhancements/self_improvement/_patches.py` around lines
213 - 216, The try/except around
brain.query_events(event_type="rule_patch_observed", limit=500) currently
swallows all errors and sets events = [], hiding failures; change the except to
catch Exception as e and call logger.warning with a descriptive message
including the exception (use exc_info=True) before assigning events = [] so
failures are logged for debugging (refer to brain.query_events and the events
variable and use logger.warning(..., exc_info=True)).

Comment on lines +200 to +216
def test_observe_patch_blocks_direct_cycle() -> None:
"""Real-world bug: lesson 911130b3 oscillated A→B→A→B 5×.

After this fix, the 2nd patch back to A must emit
`rule_patch_cycle_detected` instead of `rule_patch_observed`, so no
fake "100% reduction" gets recorded and the dashboard shows a lock
badge instead of another rollback entry.
"""
brain = _FakeBrain()
# First patch A → B succeeds normally.
first = observe_patch(brain, "TONE", "A", "B")
assert first["event_type"] == "rule_patch_observed"
# Second patch attempts B → A — must be blocked as a direct cycle.
second = observe_patch(brain, "TONE", "B", "A")
assert second["event_type"] == "rule_patch_cycle_detected"
assert second["data"]["old_rule_text"] == "B"
assert second["data"]["new_rule_text"] == "A"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial | ⚡ Quick win

Assert the blocked cycle does not also record a fake "improvement" observation.

This test is the regression guard for the core bug (oscillation recording a fake "100% reduction"). It only checks the returned event type, so a regression where observe_patch emits both rule_patch_cycle_detected and rule_patch_observed on the cycle step would still pass. Add an assertion that exactly one rule_patch_observed exists after the blocked patch.

💚 Proposed assertion to lock in the fix's guarantee
     second = observe_patch(brain, "TONE", "B", "A")
     assert second["event_type"] == "rule_patch_cycle_detected"
     assert second["data"]["old_rule_text"] == "B"
     assert second["data"]["new_rule_text"] == "A"
+    # The blocked cycle must NOT record a second "improvement" observation.
+    observed = brain.query_events(event_type="rule_patch_observed")
+    assert len(observed) == 1
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Gradata/tests/test_oscillation_guard.py` around lines 200 - 216, Add an
assertion in test_observe_patch_blocks_direct_cycle that verifies the blocked
cycle does not also record a fake "improvement": inspect the _FakeBrain instance
to find where emitted events are recorded (e.g. brain.events or
brain.recorded_events) and assert that the number of events with event_type ==
"rule_patch_observed" is exactly 1 after calling observe_patch twice (first A→B,
second blocked B→A); keep this check right after the existing assertions that
validate second["event_type"] and data fields so the test guarantees no extra
"rule_patch_observed" was produced by the cycle detection logic.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants