fix(self-healing): prevent rule patch oscillation#241
Conversation
Real-world bug observed 2026-05-21 on production brain: lesson 911130b3 oscillated between two rule phrasings A and B for 5 consecutive rollbacks spanning 20 days, each marked '100% reduction' in the dashboard. Root cause: `observe_patch()` had no cycle detection — every time the compliance scorer flagged the current text as 'failing,' the patcher rewrote it back to the previous text without checking it had just patched away from that text. The 'reduction' metric games itself: the new text trivially shows zero failures because it has zero observations yet. Fix: new module `_oscillation_guard.py` is consulted before each `observe_patch()` call. Detects direct A→B then B→A cycles within a 30-day / 5-patch lookback window. On detection, emits a `rule_patch_cycle_detected` event and aborts the patch instead of recording another fake-reduction row. Conservative scope (only direct cycles, only within a single category, whitespace-normalized comparison) to minimize false-positive risk. The next sibling fix (`recurrence_change → insufficient_data when <3 sessions`) is filed as a separate cloud-side issue [040a09dd]. Tests: 12 new (oscillation_guard) + 166 existing self-healing/patch tests still green. Refs: paperclip issue 1983a5c6
Activates the oscillation guard restored in 41390bf — every successful Brain.patch_rule call now consults observe_patch, which detects A→B→A→B cycles via _oscillation_guard.detect_cycle and aborts the patch if a cycle is in flight. Without this hook the cherry-picked guard files were dead code. Also includes per-rule injection telemetry (second hunk, line 1200+) that was already in flight in the working tree; preserved so cloud sync / dashboard surfaces can attribute applied rules to real sessions.
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
📝 Walkthrough• Fixes self-healing rule-patch oscillation: Prevents A→B→A cycles in rule rewrites within a 30-day/5-patch lookback window (GRA-1325/Paperclip issue 1983a5c6) • New oscillation guard module ( • New patch observation system ( • Integration with • Per-rule injection telemetry: • Comprehensive test coverage: 12 new regression tests covering cycles, category isolation, text normalization, lookback limits, and acceptance telemetry • No breaking changes: All modifications are additive; public API expanded with new export functions WalkthroughThis PR adds telemetry and safety mechanisms for tracking self-healing rule patches. It introduces oscillation-cycle detection to prevent A→B→A patch loops, patch observation and compliance tracking to measure rule improvement, and per-rule injection telemetry in the Brain's rule application flow. ChangesSelf-Healing Rule Patch Telemetry and Oscillation Guard
🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly Related PRs
Suggested Labels
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 OpenGrep (1.22.0)OpenGrep fatal error (exit code 2): �[32m✔�[39m �[1mOpengrep OSS�[0m �[1m Loading rules from local config...�[0m Comment |
There was a problem hiding this comment.
Actionable comments posted: 10
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Gradata/src/gradata/brain.py`:
- Around line 1203-1230: The except block that catches emission failures in the
loop over applied rules (inside brain.apply_brain_rules where self.emit is
called for each applied_rule) currently uses logger.debug; change it to catch
Exception as e and call logger.warning with a descriptive message and
exc_info=True so failures are visible (e.g., "rule_injected emit failed for
rule_id=%s" with applied_rule.rule_id and include exc_info=True) to surface
telemetry emission errors instead of silently debugging them.
- Around line 763-768: The current bare except around observe_patch(self,
category, old_description, new_description) swallows errors; change it to catch
Exception as e and log the failure (e.g., logger.debug or logger.warning) with
exc_info=True so exceptions are recorded for telemetry; update the except block
within the same try/except surrounding observe_patch in gradata.brain to call
logger.debug("observe_patch failed", exc_info=True) or similar instead of pass.
In `@Gradata/src/gradata/enhancements/self_improvement/_oscillation_guard.py`:
- Around line 171-190: Replace the bare except that swallows errors after
calling brain.emit with a logged exception: catch Exception as e and call
logger.warning (or the module logger) including a descriptive message, the
category and matched_event_id from cycle for context, and exc_info=True, then
return {} as before; specifically update the try/except around the brain.emit
call in _oscillation_guard.detect_cycle so failures in brain.emit are not
silently dropped.
- Around line 114-120: The except block around brain.query_events currently
swallows errors silently; change it to catch Exception as e and call
logger.warning with a descriptive message and exc_info=True (e.g.,
logger.warning("Failed querying rule_patch_observed events, allowing patch: %s",
e, exc_info=True)) before returning None so failures are recorded; update the
try/except that wraps brain.query_events in _oscillation_guard.py accordingly
(retain the fail-open return None behavior).
In `@Gradata/src/gradata/enhancements/self_improvement/_patches.py`:
- Around line 113-129: The except block currently swallows exceptions from
brain.emit("rule_patch_observed", "_patches.observe_patch", ...) and loses
diagnostics; modify it to catch Exception as e and call logger.warning with a
clear message including identifying context (e.g., category, _PATCH_TAG or
applied_at) and exc_info=True so the traceback is recorded, then return {} as
before; update the except clause in the block that wraps brain.emit in
_patches.observe_patch to reference the caught exception variable and call
logger.warning(..., exc_info=True).
- Around line 170-193: The loop currently swallows all errors from brain.emit
with "except Exception: continue", so change that to catch the Exception as a
variable and log it before continuing: in the block around
brain.emit("_patches.resolve_patch_compliance", ...) (and where updated and
updates are appended), replace the bare except with "except Exception as e" and
call logger.warning with a clear message referencing the emission (e.g., "Failed
to emit rule_patch_observed for _patches.resolve_patch_compliance") and pass
exc_info=True, then continue; this preserves behavior but records the exception
for debugging.
- Around line 146-149: The current try/except around
brain.query_events(event_type="rule_patch_observed", limit=200) swallows errors
and returns [] silently; update the except block to log the failure using the
module logger (logger.warning(..., exc_info=True)) before returning an empty
list so failures are visible (keep returning [] to preserve behavior) and ensure
you catch Exception (or a more specific exception type if appropriate) around
the brain.query_events call and reference pending_events in the try block.
- Around line 213-216: The try/except around
brain.query_events(event_type="rule_patch_observed", limit=500) currently
swallows all errors and sets events = [], hiding failures; change the except to
catch Exception as e and call logger.warning with a descriptive message
including the exception (use exc_info=True) before assigning events = [] so
failures are logged for debugging (refer to brain.query_events and the events
variable and use logger.warning(..., exc_info=True)).
- Around line 37-44: Replace the bare except that swallows query errors around
the brain.query_events call with a typed except that logs the failure: change
"except Exception: return 0" to "except Exception as e: logger.warning('Failed
to query RULE_FAILURE events (lookback_sessions=%s): %s', lookback_sessions, e,
exc_info=True); return 0" (or obtain a module logger via logging.getLogger if
none exists). This preserves the return behavior but records the exception with
exc_info=True for debugging.
In `@Gradata/tests/test_oscillation_guard.py`:
- Around line 200-216: Add an assertion in
test_observe_patch_blocks_direct_cycle that verifies the blocked cycle does not
also record a fake "improvement": inspect the _FakeBrain instance to find where
emitted events are recorded (e.g. brain.events or brain.recorded_events) and
assert that the number of events with event_type == "rule_patch_observed" is
exactly 1 after calling observe_patch twice (first A→B, second blocked B→A);
keep this check right after the existing assertions that validate
second["event_type"] and data fields so the test guarantees no extra
"rule_patch_observed" was produced by the cycle detection logic.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6176f09d-0949-4c7f-bab0-e62b1a47e8c4
📒 Files selected for processing (4)
Gradata/src/gradata/brain.pyGradata/src/gradata/enhancements/self_improvement/_oscillation_guard.pyGradata/src/gradata/enhancements/self_improvement/_patches.pyGradata/tests/test_oscillation_guard.py
📜 Review details
🧰 Additional context used
📓 Path-based instructions (2)
Gradata/src/**/*.py
📄 CodeRabbit inference engine (Gradata/AGENTS.md)
Gradata/src/**/*.py: Prefersentence-transformersfor local embeddings,google-genaifor Gemini embeddings,cryptographyfor AES-GCM encrypted system.db,bm25sfor BM25 rule ranking, andmem0aifor external memory adapters — guard all optional dependency imports withtry / except ImportErrorat the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bareexcept: pass— use typed exceptions or at minimumlogger.warning(...)withexc_info=Trueto avoid silent failure in a memory product
Never import from out-of-scope sibling directories../Sprites/or../Hausgem/withingradata/*code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to../Sprites/,../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from insidegradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes
Files:
Gradata/src/gradata/enhancements/self_improvement/_patches.pyGradata/src/gradata/brain.pyGradata/src/gradata/enhancements/self_improvement/_oscillation_guard.py
Gradata/tests/**/*.py
📄 CodeRabbit inference engine (Gradata/AGENTS.md)
Gradata/tests/**/*.py: SetBRAIN_DIRenvironment variable viatmp_pathin conftest.py for test isolation — ensure_paths.pymodule cache refreshes when callingBrain.init()directly inside tests
Add unit tests intests/test_*.pyfor every CI push without LLM calls (deterministic); mark integration tests with@pytest.mark.integrationand skip them by default (they hit real LLM APIs)
Files:
Gradata/tests/test_oscillation_guard.py
🔇 Additional comments (4)
Gradata/tests/test_oscillation_guard.py (4)
30-84: LGTM!
92-159: LGTM!
167-186: LGTM!
219-231: LGTM!
| try: | ||
| from gradata.enhancements.self_improvement._patches import observe_patch | ||
|
|
||
| observe_patch(self, category, old_description, new_description) | ||
| except Exception: # pragma: no cover — defensive | ||
| pass |
There was a problem hiding this comment.
Silent failure hides patch observation errors.
The except Exception: pass swallows observe_patch() failures without logging, making it impossible to diagnose telemetry issues. While the pragma: no cover indicates defensive intent, as per coding guidelines, use at minimum logger.debug(...) with exc_info=True to aid debugging without raising.
🔍 Proposed fix to add logging
try:
from gradata.enhancements.self_improvement._patches import observe_patch
observe_patch(self, category, old_description, new_description)
- except Exception: # pragma: no cover — defensive
- pass
+ except Exception as exc: # pragma: no cover — defensive
+ logger.debug("observe_patch failed: %s", exc, exc_info=True)As per coding guidelines: Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/brain.py` around lines 763 - 768, The current bare except
around observe_patch(self, category, old_description, new_description) swallows
errors; change it to catch Exception as e and log the failure (e.g.,
logger.debug or logger.warning) with exc_info=True so exceptions are recorded
for telemetry; update the except block within the same try/except surrounding
observe_patch in gradata.brain to call logger.debug("observe_patch failed",
exc_info=True) or similar instead of pass.
| # Persist per-rule injection telemetry so cloud sync / dashboard views | ||
| # can show which specific rules were applied in real sessions. | ||
| if applied: | ||
| scope_payload = { | ||
| "task_type": scope.task_type, | ||
| "domain": scope.domain, | ||
| "audience": scope.audience, | ||
| } | ||
| for applied_rule in applied: | ||
| try: | ||
| self.emit( | ||
| "rule_injected", | ||
| "brain.apply_brain_rules", | ||
| { | ||
| "rule_id": applied_rule.rule_id, | ||
| "category": applied_rule.lesson.category, | ||
| "confidence": applied_rule.lesson.confidence, | ||
| "state": applied_rule.lesson.state.value, | ||
| "task": task, | ||
| "scope": scope_payload, | ||
| }, | ||
| [ | ||
| f"rule:{applied_rule.rule_id}", | ||
| f"category:{applied_rule.lesson.category}", | ||
| ], | ||
| ) | ||
| except Exception as e: | ||
| logger.debug("rule_injected emit failed: %s", e) |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | 💤 Low value
Consider upgrading debug logging to warning for emission failures.
Line 1230 uses logger.debug for emission failures, but as per coding guidelines, telemetry failures should use logger.warning(...) with exc_info=True to avoid silent failure in a memory product. Emission failures are unusual and warrant investigation, making warning more appropriate than debug.
♻️ Proposed change to use warning level
except Exception as e:
- logger.debug("rule_injected emit failed: %s", e)
+ logger.warning("rule_injected emit failed: %s", e, exc_info=True)As per coding guidelines: Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/brain.py` around lines 1203 - 1230, The except block that
catches emission failures in the loop over applied rules (inside
brain.apply_brain_rules where self.emit is called for each applied_rule)
currently uses logger.debug; change it to catch Exception as e and call
logger.warning with a descriptive message and exc_info=True so failures are
visible (e.g., "rule_injected emit failed for rule_id=%s" with
applied_rule.rule_id and include exc_info=True) to surface telemetry emission
errors instead of silently debugging them.
| try: | ||
| events = brain.query_events( | ||
| event_type="rule_patch_observed", | ||
| limit=200, | ||
| ) | ||
| except Exception: | ||
| return None # Fail open — never block patches on a query failure. |
There was a problem hiding this comment.
Silent failure risks losing cycle-detection visibility.
The except Exception: return None swallows query failures without logging, making debugging difficult if event queries start failing in production. As per coding guidelines, avoid silent failures in a memory product — use at minimum logger.warning(...) with exc_info=True.
🔍 Proposed fix to add logging
+import logging
+
+_logger = logging.getLogger("gradata.oscillation_guard")
+
+
def detect_cycle(
brain: Brain,
category: str,
old_description: str,
new_description: str,
lookback_days: int = _DEFAULT_LOOKBACK_DAYS,
lookback_patches: int = _DEFAULT_LOOKBACK_PATCHES,
) -> dict | None:
...
try:
events = brain.query_events(
event_type="rule_patch_observed",
limit=200,
)
except Exception:
+ _logger.warning("Cycle detection query failed; failing open", exc_info=True)
return None # Fail open — never block patches on a query failure.As per coding guidelines: Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| events = brain.query_events( | |
| event_type="rule_patch_observed", | |
| limit=200, | |
| ) | |
| except Exception: | |
| return None # Fail open — never block patches on a query failure. | |
| import logging | |
| _logger = logging.getLogger("gradata.oscillation_guard") | |
| def detect_cycle( | |
| brain: Brain, | |
| category: str, | |
| old_description: str, | |
| new_description: str, | |
| lookback_days: int = _DEFAULT_LOOKBACK_DAYS, | |
| lookback_patches: int = _DEFAULT_LOOKBACK_PATCHES, | |
| ) -> dict | None: | |
| ... | |
| try: | |
| events = brain.query_events( | |
| event_type="rule_patch_observed", | |
| limit=200, | |
| ) | |
| except Exception: | |
| _logger.warning("Cycle detection query failed; failing open", exc_info=True) | |
| return None # Fail open — never block patches on a query failure. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/enhancements/self_improvement/_oscillation_guard.py`
around lines 114 - 120, The except block around brain.query_events currently
swallows errors silently; change it to catch Exception as e and call
logger.warning with a descriptive message and exc_info=True (e.g.,
logger.warning("Failed querying rule_patch_observed events, allowing patch: %s",
e, exc_info=True)) before returning None so failures are recorded; update the
try/except that wraps brain.query_events in _oscillation_guard.py accordingly
(retain the fail-open return None behavior).
| try: | ||
| event = brain.emit( | ||
| "rule_patch_cycle_detected", | ||
| "_oscillation_guard.detect_cycle", | ||
| { | ||
| "category": (category or "").upper(), | ||
| "old_rule_text": (cycle.get("old_text") or "")[:500], | ||
| "new_rule_text": (cycle.get("new_text") or "")[:500], | ||
| "matched_event_id": cycle.get("matched_event_id"), | ||
| "matched_applied_at": cycle.get("matched_applied_at"), | ||
| "cycle_length": cycle.get("cycle_length", 0), | ||
| "lock_sessions": cycle.get("lock_sessions", _DEFAULT_LOCK_SESSIONS), | ||
| "reason": cycle.get("reason", "direct_cycle"), | ||
| "detected_at": datetime.now(UTC).isoformat(), | ||
| }, | ||
| [f"category:{category}", "self_healing", "cycle_detected", "patch_blocked"], | ||
| ) | ||
| return event if isinstance(event, dict) else {} | ||
| except Exception: | ||
| return {} |
There was a problem hiding this comment.
Silent failure hides event emission problems.
The bare except Exception: return {} swallows emission errors without logging, making it hard to diagnose if events stop being recorded. As per coding guidelines, use at minimum logger.warning(...) with exc_info=True to avoid silent failure.
🔍 Proposed fix to add logging
def emit_cycle_detected(brain: Brain, category: str, cycle: dict) -> dict:
"""Emit a `rule_patch_cycle_detected` event so the cycle is auditable.
Returns the emitted event dict, or {} on failure (never raises).
"""
try:
event = brain.emit(
"rule_patch_cycle_detected",
"_oscillation_guard.detect_cycle",
{
"category": (category or "").upper(),
"old_rule_text": (cycle.get("old_text") or "")[:500],
"new_rule_text": (cycle.get("new_text") or "")[:500],
"matched_event_id": cycle.get("matched_event_id"),
"matched_applied_at": cycle.get("matched_applied_at"),
"cycle_length": cycle.get("cycle_length", 0),
"lock_sessions": cycle.get("lock_sessions", _DEFAULT_LOCK_SESSIONS),
"reason": cycle.get("reason", "direct_cycle"),
"detected_at": datetime.now(UTC).isoformat(),
},
[f"category:{category}", "self_healing", "cycle_detected", "patch_blocked"],
)
return event if isinstance(event, dict) else {}
except Exception:
+ _logger.warning("Cycle-detected event emission failed", exc_info=True)
return {}As per coding guidelines: Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/enhancements/self_improvement/_oscillation_guard.py`
around lines 171 - 190, Replace the bare except that swallows errors after
calling brain.emit with a logged exception: catch Exception as e and call
logger.warning (or the module logger) including a descriptive message, the
category and matched_event_id from cycle for context, and exc_info=True, then
return {} as before; specifically update the try/except around the brain.emit
call in _oscillation_guard.detect_cycle so failures in brain.emit are not
silently dropped.
| try: | ||
| events = brain.query_events( | ||
| event_type="RULE_FAILURE", | ||
| last_n_sessions=lookback_sessions, | ||
| limit=500, | ||
| ) | ||
| except Exception: | ||
| return 0 |
There was a problem hiding this comment.
Silent query failure returns zero, masking compliance metrics.
The except Exception: return 0 swallows query errors without logging, causing compliance counts to silently default to zero when the event query fails. This can skew acceptance metrics. As per coding guidelines, use logger.warning(...) with exc_info=True.
🔍 Proposed fix to add logging
+import logging
+
+_logger = logging.getLogger("gradata.patch_telemetry")
+
+
def _count_failures_for_rule(
brain: Brain,
category: str,
rule_description: str,
lookback_sessions: int = _LOOKBACK_SESSIONS,
) -> int:
"""Count RULE_FAILURE events matching this specific rule in the last N sessions."""
try:
events = brain.query_events(
event_type="RULE_FAILURE",
last_n_sessions=lookback_sessions,
limit=500,
)
except Exception:
+ _logger.warning("RULE_FAILURE query failed in _count_failures_for_rule", exc_info=True)
return 0As per coding guidelines: Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| events = brain.query_events( | |
| event_type="RULE_FAILURE", | |
| last_n_sessions=lookback_sessions, | |
| limit=500, | |
| ) | |
| except Exception: | |
| return 0 | |
| try: | |
| events = brain.query_events( | |
| event_type="RULE_FAILURE", | |
| last_n_sessions=lookback_sessions, | |
| limit=500, | |
| ) | |
| except Exception: | |
| _logger.warning("RULE_FAILURE query failed in _count_failures_for_rule", exc_info=True) | |
| return 0 |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/enhancements/self_improvement/_patches.py` around lines
37 - 44, Replace the bare except that swallows query errors around the
brain.query_events call with a typed except that logs the failure: change
"except Exception: return 0" to "except Exception as e: logger.warning('Failed
to query RULE_FAILURE events (lookback_sessions=%s): %s', lookback_sessions, e,
exc_info=True); return 0" (or obtain a module logger via logging.getLogger if
none exists). This preserves the return behavior but records the exception with
exc_info=True for debugging.
| try: | ||
| event = brain.emit( | ||
| "rule_patch_observed", | ||
| "_patches.observe_patch", | ||
| { | ||
| "category": category.upper(), | ||
| "old_rule_text": old_description[:_MAX_RULE_TEXT], | ||
| "new_rule_text": new_description[:_MAX_RULE_TEXT], | ||
| "applied_at": datetime.now(UTC).isoformat(), | ||
| "observed_compliance_before": compliance_before, | ||
| "observed_compliance_after_3_sessions": None, | ||
| }, | ||
| [f"category:{category}", "self_healing", _PATCH_TAG], | ||
| ) | ||
| return event if isinstance(event, dict) else {} | ||
| except Exception: | ||
| return {} |
There was a problem hiding this comment.
Silent emission failure loses patch observations.
The except Exception: return {} swallows event emission errors without logging, making it impossible to diagnose why patch observations aren't being recorded. As per coding guidelines, add logger.warning(...) with exc_info=True.
🔍 Proposed fix to add logging
compliance_before = _count_failures_for_rule(brain, category, old_description)
try:
event = brain.emit(
"rule_patch_observed",
"_patches.observe_patch",
{
"category": category.upper(),
"old_rule_text": old_description[:_MAX_RULE_TEXT],
"new_rule_text": new_description[:_MAX_RULE_TEXT],
"applied_at": datetime.now(UTC).isoformat(),
"observed_compliance_before": compliance_before,
"observed_compliance_after_3_sessions": None,
},
[f"category:{category}", "self_healing", _PATCH_TAG],
)
return event if isinstance(event, dict) else {}
except Exception:
+ _logger.warning("rule_patch_observed emission failed", exc_info=True)
return {}As per coding guidelines: Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| event = brain.emit( | |
| "rule_patch_observed", | |
| "_patches.observe_patch", | |
| { | |
| "category": category.upper(), | |
| "old_rule_text": old_description[:_MAX_RULE_TEXT], | |
| "new_rule_text": new_description[:_MAX_RULE_TEXT], | |
| "applied_at": datetime.now(UTC).isoformat(), | |
| "observed_compliance_before": compliance_before, | |
| "observed_compliance_after_3_sessions": None, | |
| }, | |
| [f"category:{category}", "self_healing", _PATCH_TAG], | |
| ) | |
| return event if isinstance(event, dict) else {} | |
| except Exception: | |
| return {} | |
| try: | |
| event = brain.emit( | |
| "rule_patch_observed", | |
| "_patches.observe_patch", | |
| { | |
| "category": category.upper(), | |
| "old_rule_text": old_description[:_MAX_RULE_TEXT], | |
| "new_rule_text": new_description[:_MAX_RULE_TEXT], | |
| "applied_at": datetime.now(UTC).isoformat(), | |
| "observed_compliance_before": compliance_before, | |
| "observed_compliance_after_3_sessions": None, | |
| }, | |
| [f"category:{category}", "self_healing", _PATCH_TAG], | |
| ) | |
| return event if isinstance(event, dict) else {} | |
| except Exception: | |
| _logger.warning("rule_patch_observed emission failed", exc_info=True) | |
| return {} |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/enhancements/self_improvement/_patches.py` around lines
113 - 129, The except block currently swallows exceptions from
brain.emit("rule_patch_observed", "_patches.observe_patch", ...) and loses
diagnostics; modify it to catch Exception as e and call logger.warning with a
clear message including identifying context (e.g., category, _PATCH_TAG or
applied_at) and exc_info=True so the traceback is recorded, then return {} as
before; update the except clause in the block that wraps brain.emit in
_patches.observe_patch to reference the caught exception variable and call
logger.warning(..., exc_info=True).
| try: | ||
| pending_events = brain.query_events(event_type="rule_patch_observed", limit=200) | ||
| except Exception: | ||
| return [] |
There was a problem hiding this comment.
Silent query failure prevents compliance resolution.
The except Exception: return [] swallows query errors without logging, causing resolution to silently skip all pending observations when the query fails. As per coding guidelines, add logger.warning(...) with exc_info=True.
🔍 Proposed fix to add logging
try:
pending_events = brain.query_events(event_type="rule_patch_observed", limit=200)
except Exception:
+ _logger.warning("rule_patch_observed query failed in resolve_patch_compliance", exc_info=True)
return []As per coding guidelines: Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| pending_events = brain.query_events(event_type="rule_patch_observed", limit=200) | |
| except Exception: | |
| return [] | |
| try: | |
| pending_events = brain.query_events(event_type="rule_patch_observed", limit=200) | |
| except Exception: | |
| _logger.warning("rule_patch_observed query failed in resolve_patch_compliance", exc_info=True) | |
| return [] |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/enhancements/self_improvement/_patches.py` around lines
146 - 149, The current try/except around
brain.query_events(event_type="rule_patch_observed", limit=200) swallows errors
and returns [] silently; update the except block to log the failure using the
module logger (logger.warning(..., exc_info=True)) before returning an empty
list so failures are visible (keep returning [] to preserve behavior) and ensure
you catch Exception (or a more specific exception type if appropriate) around
the brain.query_events call and reference pending_events in the try block.
| try: | ||
| updated = brain.emit( | ||
| "rule_patch_observed", | ||
| "_patches.resolve_patch_compliance", | ||
| { | ||
| **data, | ||
| "observed_compliance_after_3_sessions": compliance_after, | ||
| "compliance_improved": improved, | ||
| "resolution_session": current_session, | ||
| "original_event_id": ev.get("id"), | ||
| }, | ||
| [f"category:{category}", "self_healing", _PATCH_TAG, "resolved"], | ||
| ) | ||
| updates.append( | ||
| { | ||
| "category": category, | ||
| "compliance_before": compliance_before, | ||
| "compliance_after": compliance_after, | ||
| "improved": improved, | ||
| "event": updated if isinstance(updated, dict) else {}, | ||
| } | ||
| ) | ||
| except Exception: | ||
| continue |
There was a problem hiding this comment.
Silent emission failures in loop hide resolution errors.
The except Exception: continue on line 192-193 swallows emission errors without logging, making it impossible to diagnose why some patch compliance updates aren't being recorded. As per coding guidelines, add logger.warning(...) with exc_info=True.
🔍 Proposed fix to add logging
try:
updated = brain.emit(
"rule_patch_observed",
"_patches.resolve_patch_compliance",
{
**data,
"observed_compliance_after_3_sessions": compliance_after,
"compliance_improved": improved,
"resolution_session": current_session,
"original_event_id": ev.get("id"),
},
[f"category:{category}", "self_healing", _PATCH_TAG, "resolved"],
)
updates.append(
{
"category": category,
"compliance_before": compliance_before,
"compliance_after": compliance_after,
"improved": improved,
"event": updated if isinstance(updated, dict) else {},
}
)
except Exception:
+ _logger.warning("Resolution emission failed for event %s", ev.get("id"), exc_info=True)
continueAs per coding guidelines: Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/enhancements/self_improvement/_patches.py` around lines
170 - 193, The loop currently swallows all errors from brain.emit with "except
Exception: continue", so change that to catch the Exception as a variable and
log it before continuing: in the block around
brain.emit("_patches.resolve_patch_compliance", ...) (and where updated and
updates are appended), replace the bare except with "except Exception as e" and
call logger.warning with a clear message referencing the emission (e.g., "Failed
to emit rule_patch_observed for _patches.resolve_patch_compliance") and pass
exc_info=True, then continue; this preserves behavior but records the exception
for debugging.
| try: | ||
| events = brain.query_events(event_type="rule_patch_observed", limit=500) | ||
| except Exception: | ||
| events = [] |
There was a problem hiding this comment.
Silent query failure returns zero metrics, hiding acceptance rate issues.
The except Exception: events = [] swallows query errors without logging, causing acceptance metrics to silently return all zeros when the query fails. This masks real problems with patch tracking. As per coding guidelines, add logger.warning(...) with exc_info=True.
🔍 Proposed fix to add logging
try:
events = brain.query_events(event_type="rule_patch_observed", limit=500)
except Exception:
+ _logger.warning("rule_patch_observed query failed in patch_acceptance_rate", exc_info=True)
events = []As per coding guidelines: Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| try: | |
| events = brain.query_events(event_type="rule_patch_observed", limit=500) | |
| except Exception: | |
| events = [] | |
| try: | |
| events = brain.query_events(event_type="rule_patch_observed", limit=500) | |
| except Exception: | |
| _logger.warning("rule_patch_observed query failed in patch_acceptance_rate", exc_info=True) | |
| events = [] |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/enhancements/self_improvement/_patches.py` around lines
213 - 216, The try/except around
brain.query_events(event_type="rule_patch_observed", limit=500) currently
swallows all errors and sets events = [], hiding failures; change the except to
catch Exception as e and call logger.warning with a descriptive message
including the exception (use exc_info=True) before assigning events = [] so
failures are logged for debugging (refer to brain.query_events and the events
variable and use logger.warning(..., exc_info=True)).
| def test_observe_patch_blocks_direct_cycle() -> None: | ||
| """Real-world bug: lesson 911130b3 oscillated A→B→A→B 5×. | ||
|
|
||
| After this fix, the 2nd patch back to A must emit | ||
| `rule_patch_cycle_detected` instead of `rule_patch_observed`, so no | ||
| fake "100% reduction" gets recorded and the dashboard shows a lock | ||
| badge instead of another rollback entry. | ||
| """ | ||
| brain = _FakeBrain() | ||
| # First patch A → B succeeds normally. | ||
| first = observe_patch(brain, "TONE", "A", "B") | ||
| assert first["event_type"] == "rule_patch_observed" | ||
| # Second patch attempts B → A — must be blocked as a direct cycle. | ||
| second = observe_patch(brain, "TONE", "B", "A") | ||
| assert second["event_type"] == "rule_patch_cycle_detected" | ||
| assert second["data"]["old_rule_text"] == "B" | ||
| assert second["data"]["new_rule_text"] == "A" |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial | ⚡ Quick win
Assert the blocked cycle does not also record a fake "improvement" observation.
This test is the regression guard for the core bug (oscillation recording a fake "100% reduction"). It only checks the returned event type, so a regression where observe_patch emits both rule_patch_cycle_detected and rule_patch_observed on the cycle step would still pass. Add an assertion that exactly one rule_patch_observed exists after the blocked patch.
💚 Proposed assertion to lock in the fix's guarantee
second = observe_patch(brain, "TONE", "B", "A")
assert second["event_type"] == "rule_patch_cycle_detected"
assert second["data"]["old_rule_text"] == "B"
assert second["data"]["new_rule_text"] == "A"
+ # The blocked cycle must NOT record a second "improvement" observation.
+ observed = brain.query_events(event_type="rule_patch_observed")
+ assert len(observed) == 1🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/tests/test_oscillation_guard.py` around lines 200 - 216, Add an
assertion in test_observe_patch_blocks_direct_cycle that verifies the blocked
cycle does not also record a fake "improvement": inspect the _FakeBrain instance
to find where emitted events are recorded (e.g. brain.events or
brain.recorded_events) and assert that the number of events with event_type ==
"rule_patch_observed" is exactly 1 after calling observe_patch twice (first A→B,
second blocked B→A); keep this check right after the existing assertions that
validate second["event_type"] and data fields so the test guarantees no extra
"rule_patch_observed" was produced by the cycle detection logic.
Summary
Fixes GRA-1325 / Paperclip issue
1983a5c6-b216-4650-aff1-11e135a586d8.This isolates the self-healing oscillation fix from the broad/failing research PR #222:
_oscillation_guard.detect_cycle()for direct A→B then B→A rule-patch cycles within a 30-day / 5-patch lookback.observe_patch()now checks the guard before recording another patch observation; on cycle, it emitsrule_patch_cycle_detectedinstead of creating another fake "improvement" row.Brain.patch_rule()to callobserve_patch()after successful rewrites so the guard is active.Verification
python3 -m pytest Gradata/tests/test_oscillation_guard.py -q→12 passed in 0.10spython3 -m pytest Gradata/tests/test_self_healing.py Gradata/tests/test_self_healing_fix.py Gradata/tests/test_bug_fixes.py -q→119 passed in 5.16sNotes
This PR is intentionally scoped to the SDK-side oscillation guard. The cloud/dashboard-side misleading percentage display for insufficient post-patch observations remains separate follow-up work.