feat(cli): human-in-the-loop guard for dangerous operations (POC)#159
feat(cli): human-in-the-loop guard for dangerous operations (POC)#159tonychang04 wants to merge 14 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds a human-in-the-loop Commander preAction guard: rule-based risk assessment, optional live SQL inspection, brief construction (rule + agent text), local approval UI, append-only audit logging, CLI flags for agent brief and destructive-flag escalation, and async parsing integration. ChangesHuman-in-the-Loop CLI Guard
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
Adds a Commander preAction stage that stops destructive operations for
human approval before they run. Because it lives in the CLI itself, it
protects every caller automatically — Claude Code, Cursor, scripts, CI,
and humans all hit the same gate.
- risk-registry: declarative risk per command path + real SQL inspection
(DROP/TRUNCATE/unfiltered DELETE-UPDATE/ALTER DROP/RLS) for `db query`
- brief: claude -p generated explanation with deterministic fallback
- approval-server: localhost page (what happens/blast radius/risks/intent/
recommendation) that blocks until Approve/Deny; fail-closed
- audit: append-only ~/.insforge/guard-audit.jsonl of every decision
- wired via program.hook('preAction') + parseAsync in src/index.ts
Env: INSFORGE_GUARD_BYPASS, INSFORGE_GUARD_NO_LLM, INSFORGE_GUARD_OPEN
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The hard rules (in the CLI) remain the authoritative stop/allow verdict — an agent can never downgrade a DROP. But the human-readable summary and implications now come from the calling agent itself via `--reason` (the agent is already an LLM with the most context about its intent). This removes the CLI-side LLM classifier/enrichment entirely: no API keys to configure, works for any agent (Cursor, custom, Claude Code), and the agent can explain but cannot change the verdict. The approval page shows the rule facts and the agent's explanation in clearly separated sections, and flags when the agent provided no rationale. - remove llm-classifier.ts and claude -p enrichment - brief.ts: combine rule facts + agent --reason (synchronous, no LLM) - approval page: "Verified by InsForge" vs "From the agent" sections - add global --reason option + INSFORGE_GUARD_SUMMARY env Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…brief The guard now collects a structured agent brief (--reason / --impact / --recommendation) and renders it on the approval page under a 'From the agent' group, separate from the authoritative 'Verified by InsForge' hard-rule facts. When a destructive op has no brief and the caller is non-interactive (agent/CI), the guard prints a copy-paste-ready instruction to re-run WITH a brief and exits 2 (needs_brief) instead of popping the page — pushing the local LLM to reason about implications first. Humans at a TTY (or INSFORGE_GUARD_REQUIRE_BRIEF=0) go straight to the page. The hard-rule STOP verdict is unchanged and cannot be downgraded by the agent. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ae1df05 to
8d3d600
Compare
Greptile SummaryThis PR introduces a human-in-the-loop safety guard built directly into the
Confidence Score: 4/5Safe to merge with one fix: the inspect.ts live-introspection path can present incomplete blast-radius facts for semicolon-separated multi-DROP SQL while labelling the result as fully measured. All previously flagged issues are correctly fixed. One remaining defect in inspect.ts: parseTarget only catches comma-syntax multi-table DROP/TRUNCATE, not semicolon-separated multi-statement sequences, causing the approval page to show partial live facts with a misleading measured label. The operation is still blocked at the correct critical severity, but the blast radius shown to the human approver only covers the first table. src/lib/guard/inspect.ts — specifically the parseTarget multi-table detection logic Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[CLI invoked] --> B{guardEnabled?}
B -- No --> Z[Run command]
B -- Yes --> C[assess ctx via risk-registry]
C --> D{severity == safe AND no agent flag?}
D -- Yes --> Z
D -- No --> E{INSFORGE_GUARD_BYPASS=1?}
E -- Yes --> F[audit: bypassed / Run command]
E -- No --> G{hasReason OR TTY caller?}
G -- No reason + non-TTY --> H[renderNudge to stderr / audit: needs_brief / exit 2]
G -- Has reason OR TTY --> I{path == db query?}
I -- Yes --> J[inspectSqlTarget fail-open 5s timeout]
I -- No --> K[buildBrief generic]
J --> K
K --> L[requestApproval localhost page + browser open / 120s fail-closed timeout]
L --> M{Decision}
M -- approved --> N[audit: approved / Run command]
M -- denied or timeout --> O[audit: denied/timeout / exit 1]
Reviews (6): Last reviewed commit: "fix(cli): address PR review — guard bypa..." | Re-trigger Greptile |
The approval page's authoritative facts were generic boilerplate (same text for
any DROP). Add read-only live introspection (src/lib/guard/inspect.ts) that
measures the actual target against the linked project and renders concrete facts:
real row count, table size, and the real dependents that will break (incoming
foreign keys, dependent views, RLS policies).
- Measured by InsForge via the same runRawSql path db query uses, so the agent
cannot fake it. Page header reads 'measured live from your project'.
- Fail-open: 5s timeout + try/catch; any failure falls back to the generic rule
text and the stop/allow verdict never depends on it.
- Only runs for db query SQL targeting a table (DROP/TRUNCATE/DELETE/UPDATE);
a missing table is reported as such ('not found ... nothing is dropped').
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
1 issue found across 5 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/lib/guard/inspect.ts">
<violation number="1" location="src/lib/guard/inspect.ts:49">
P2: Multi-table DROP/TRUNCATE statements are misparsed as single-table targets, which can show incorrect live blast-radius facts.</violation>
</file>
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
There was a problem hiding this comment.
6 issues found across 8 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/lib/guard/approval-server.ts">
<violation number="1" location="src/lib/guard/approval-server.ts:143">
P2: Decision parsing is too permissive: substring matching can incorrectly mark malformed query strings as approved.</violation>
<violation number="2" location="src/lib/guard/approval-server.ts:165">
P1: The approval timeout is left referenced after early resolve, which can keep the CLI process alive for up to 120s after approve/deny.</violation>
</file>
<file name="src/lib/guard/inspect.ts">
<violation number="1" location="src/lib/guard/inspect.ts:70">
P2: Query errors are converted to empty results, which can display false "no dependents" facts instead of failing open to generic text.</violation>
</file>
<file name="src/lib/guard/index.ts">
<violation number="1" location="src/lib/guard/index.ts:93">
P1: Guard command reconstruction drops CLI options, so the nudge/audit command string can misrepresent and rerun a different operation.</violation>
</file>
<file name="src/lib/guard/risk-registry.ts">
<violation number="1" location="src/lib/guard/risk-registry.ts:77">
P1: DELETE/UPDATE detection is anchored to the beginning of the SQL string, so destructive statements in later statements can bypass the guard as `safe`.</violation>
<violation number="2" location="src/lib/guard/risk-registry.ts:97">
P1: `ALTER TABLE ... DROP` detection misses multiline SQL, which can misclassify destructive DDL as safe.</violation>
</file>
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
| risk: 'Almost always unintended. Add a WHERE clause to scope the change.', | ||
| }; | ||
| } | ||
| if (/\bALTER\s+TABLE\b.*\bDROP\b/i.test(s)) { |
There was a problem hiding this comment.
P1: ALTER TABLE ... DROP detection misses multiline SQL, which can misclassify destructive DDL as safe.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/lib/guard/risk-registry.ts, line 97:
<comment>`ALTER TABLE ... DROP` detection misses multiline SQL, which can misclassify destructive DDL as safe.</comment>
<file context>
@@ -0,0 +1,195 @@
+ risk: 'Almost always unintended. Add a WHERE clause to scope the change.',
+ };
+ }
+ if (/\bALTER\s+TABLE\b.*\bDROP\b/i.test(s)) {
+ return {
+ severity: 'high',
</file context>
| const { rows } = await runRawSql(sql); | ||
| return rows ?? []; | ||
| } catch { | ||
| return []; |
There was a problem hiding this comment.
P2: Query errors are converted to empty results, which can display false "no dependents" facts instead of failing open to generic text.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/lib/guard/inspect.ts, line 70:
<comment>Query errors are converted to empty results, which can display false "no dependents" facts instead of failing open to generic text.</comment>
<file context>
@@ -0,0 +1,172 @@
+ const { rows } = await runRawSql(sql);
+ return rows ?? [];
+ } catch {
+ return [];
+ }
+}
</file context>
Beyond raw schema mechanics, surface the product-side / human-observability read
of a destructive op: what it means for the people using the app. Grounded in
measured signals so the agent cannot under-state it:
- live read/write activity from pg_stat_user_tables ('actively used' vs 'dormant')
- record count + RLS-implies-per-user-data
- a clearly-hedged data-category guess (auth/account, billing/commerce, content)
Rendered as a distinct 'What this means for your users' section on the measured
(InsForge) side of the page. Fail-open like the rest of inspect.ts.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 6
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/guard/approval-server.ts`:
- Around line 143-165: The local approval handler created in createServer
accepts any POST /decision request and must be protected by a generated one-time
token: generate a cryptographically random nonce when starting the server (in
the scope where server, renderPage and finish are created), embed that token
into the served HTML/link returned by renderPage and into the printed link, and
require the token be present and match on incoming POST requests (e.g.,
/decision?d=approve&token=XYZ or in POST body) before calling finish; if the
token is missing or mismatched, respond with 401/403 and do not call finish.
Ensure the token is single-use or tied to the server lifetime (clear or
invalidate it after a decision) and keep validation logic in the request handler
that currently checks url.startsWith('/decision') and url.includes('d=approve').
In `@src/lib/guard/index.ts`:
- Around line 91-93: The current construction of the rerun string only quotes
args with whitespace (vars args, quoted, command) and fails to shell-escape
special characters; replace that logic by creating a shellEscape helper and
apply it to every argv element (map over args and escape each one, not just
those matching /\s/), e.g. implement a shell-escaping routine that wraps each
arg in single quotes and safely encodes embedded single quotes (replace ' with
'\'' or equivalent) or use a vetted library, then build command using the
escaped args so the printed "insforge ${path} ..." is safe to re-run in a POSIX
shell.
- Around line 94-95: The audit is currently storing the full CLI invocation in
base.command which can leak sensitive literals; change the construction of the
base object so it does not persist raw payloads: create/ call a helper (e.g.,
redactCommand or hashCommand) that takes the full reconstructed command and
returns either a redacted summary (masking string/SQL literals, emails, tokens)
or a stable hash, store that result in base.command for the audit record, and
keep the full command only in the approval UI/state (not in the audit file);
update the code paths around the base = { ts..., path, command, kind, severity }
assignment to use this helper and ensure any persisted JSONL uses the
redacted/hash value.
In `@src/lib/guard/inspect.ts`:
- Around line 117-125: The q function currently hides all introspection errors
by returning [] which falsely implies "no references"; change q to return null
on failure so the enrichment path can fall back to generic text: update q's
signature (e.g., Promise<Record<string, unknown>[] | null>), return rows ?? []
on success but return null inside the catch, and update any callers of q (and
the similar query helpers in the other block referenced at 177-205) to treat
null as "measurement failed" and abort/propagate the null so the rendering uses
the generic rule text instead of empty results.
In `@src/lib/guard/README.md`:
- Around line 35-43: The fenced diagram block in the README is missing a
language tag which triggers markdownlint MD040; update the fenced code block in
src/lib/guard/README.md by adding an explicit language tag (e.g., change ``` to
```text) for the diagram block so the block is recognized as plain text—locate
the block shown with the insforge → parse → [preAction: guardHook] → action flow
and add the language tag to the opening fence.
In `@src/lib/guard/risk-registry.ts`:
- Around line 52-128: classifySql currently only checks the raw input start,
letting wrapped statements (comments, leading WITH/CTE, multiple statements)
bypass guards; change classifySql to operate statement-level: split the input
into statements (e.g., by semicolon), for each statement strip leading block (/*
*/) and line (--) comments, then remove a leading WITH ... AS (...) CTE prefix
by consuming the balanced parenthesis expression(s) after a leading WITH token,
trim and then apply the existing regex checks (e.g., the
DROP/TRUNCATE/^\s*DELETE\b/^\s*UPDATE\b tests and hasWhere logic) against the
normalized first token of each statement; update references inside classifySql
(including hasWhere usage and the DELETE/UPDATE/ALTER regex checks) to iterate
statements and return the appropriate RiskAssessment as soon as any statement is
flagged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1d5b96e6-aaf8-4b40-b2ce-0587feeeac06
📒 Files selected for processing (8)
src/index.tssrc/lib/guard/README.mdsrc/lib/guard/approval-server.tssrc/lib/guard/audit.tssrc/lib/guard/brief.tssrc/lib/guard/index.tssrc/lib/guard/inspect.tssrc/lib/guard/risk-registry.ts
| // Quote args containing whitespace so the displayed/echoed command is paste-ready. | ||
| const quoted = args.map((a) => (/\s/.test(a) ? `"${a}"` : a)); | ||
| const command = `insforge ${path} ${quoted.join(' ')}`.trim(); |
There was a problem hiding this comment.
Make the rerun command actually shell-safe.
Quoting only args that contain whitespace is not enough for db query payloads: embedded ", $, backticks, or newlines will still be reinterpreted by the shell. The nudge says “re-run the SAME command,” but this rendering can fail or execute something different. Use real shell escaping for every argv element before printing it.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/lib/guard/index.ts` around lines 91 - 93, The current construction of the
rerun string only quotes args with whitespace (vars args, quoted, command) and
fails to shell-escape special characters; replace that logic by creating a
shellEscape helper and apply it to every argv element (map over args and escape
each one, not just those matching /\s/), e.g. implement a shell-escaping routine
that wraps each arg in single quotes and safely encodes embedded single quotes
(replace ' with '\'' or equivalent) or use a vetted library, then build command
using the escaped args so the printed "insforge ${path} ..." is safe to re-run
in a POSIX shell.
| const base = { ts: new Date().toISOString(), path, command, kind: risk.kind, severity: risk.severity }; | ||
|
|
There was a problem hiding this comment.
Redact raw payloads before auditing.
base.command is the full reconstructed CLI invocation, and every decision persists it to ~/.insforge/guard-audit.jsonl. For db query, that stores the entire SQL text—including literals like emails, tokens, or customer data—as long-lived local telemetry. Keep the full command for the approval UI if needed, but audit a redacted summary or hash instead.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/lib/guard/index.ts` around lines 94 - 95, The audit is currently storing
the full CLI invocation in base.command which can leak sensitive literals;
change the construction of the base object so it does not persist raw payloads:
create/ call a helper (e.g., redactCommand or hashCommand) that takes the full
reconstructed command and returns either a redacted summary (masking string/SQL
literals, emails, tokens) or a stable hash, store that result in base.command
for the audit record, and keep the full command only in the approval UI/state
(not in the audit file); update the code paths around the base = { ts..., path,
command, kind, severity } assignment to use this helper and ensure any persisted
JSONL uses the redacted/hash value.
| /** Run one introspection query; resolve [] on any error (fail-open). */ | ||
| async function q(sql: string): Promise<Record<string, unknown>[]> { | ||
| try { | ||
| const { rows } = await runRawSql(sql); | ||
| return rows ?? []; | ||
| } catch { | ||
| return []; | ||
| } | ||
| } |
There was a problem hiding this comment.
Treat introspection query failures as unknown, not zero.
q() converts any query error into [], and the later rendering path turns that into statements like “Nothing else references this table” or “No reads or writes are recorded.” That understates risk while still marking the brief as live-tailored. If any required measurement query fails, return null for the whole enrichment path and fall back to the generic rule text instead.
Suggested fix
-async function q(sql: string): Promise<Record<string, unknown>[]> {
+async function q(sql: string): Promise<Record<string, unknown>[] | null> {
try {
const { rows } = await runRawSql(sql);
return rows ?? [];
} catch {
- return [];
+ return null;
}
}
...
- const [countRows, fkRows, polRows, viewRows, statRows] = await Promise.all([
+ const [countRows, fkRows, polRows, viewRows, statRows] = await Promise.all([
q(`SELECT count(*)::bigint AS n FROM "${schema}"."${table}"`),
q(`SELECT conrelid::regclass::text AS t FROM pg_constraint
WHERE confrelid = '"${schema}"."${table}"'::regclass AND contype = 'f'`),
q(`SELECT count(*)::int AS n FROM pg_policies WHERE schemaname = ${lit} AND tablename = ${tlit}`),
q(`SELECT DISTINCT view_name FROM information_schema.view_table_usage
WHERE table_schema = ${lit} AND table_name = ${tlit}`),
q(`SELECT coalesce(seq_scan,0)+coalesce(idx_scan,0) AS reads,
coalesce(n_tup_ins,0)+coalesce(n_tup_upd,0)+coalesce(n_tup_del,0) AS writes
FROM pg_stat_user_tables WHERE schemaname = ${lit} AND relname = ${tlit}`),
]);
+ if ([countRows, fkRows, polRows, viewRows, statRows].some((rows) => rows === null)) {
+ return null;
+ }Also applies to: 177-205
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/lib/guard/inspect.ts` around lines 117 - 125, The q function currently
hides all introspection errors by returning [] which falsely implies "no
references"; change q to return null on failure so the enrichment path can fall
back to generic text: update q's signature (e.g., Promise<Record<string,
unknown>[] | null>), return rows ?? [] on success but return null inside the
catch, and update any callers of q (and the similar query helpers in the other
block referenced at 177-205) to treat null as "measurement failed" and
abort/propagate the null so the rendering uses the generic rule text instead of
empty results.
| ``` | ||
| insforge --reason "<why>" <cmd> → parse → [preAction: guardHook] → action | ||
| │ | ||
| assess() classifies the real operation (hard rules) | ||
| │ | ||
| safe? → run. dangerous? → approval page (rule facts + agent's --reason) → BLOCK | ||
| │ | ||
| approve → run · deny / timeout → exit 1 | ||
| ``` |
There was a problem hiding this comment.
Add a language tag to the fenced flow block to satisfy markdownlint.
Line 35 opens a fenced block without a language, which triggers MD040 and can fail lint-gated docs checks.
Proposed fix
-```
+```text
insforge --reason "<why>" <cmd> → parse → [preAction: guardHook] → action
│
assess() classifies the real operation (hard rules)
│
safe? → run. dangerous? → approval page (rule facts + agent's --reason) → BLOCK
│
approve → run · deny / timeout → exit 1</details>
<!-- suggestion_start -->
<details>
<summary>📝 Committable suggestion</summary>
> ‼️ **IMPORTANT**
> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```suggestion
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 35-35: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/lib/guard/README.md` around lines 35 - 43, The fenced diagram block in
the README is missing a language tag which triggers markdownlint MD040; update
the fenced code block in src/lib/guard/README.md by adding an explicit language
tag (e.g., change ``` to ```text) for the diagram block so the block is
recognized as plain text—locate the block shown with the insforge → parse →
[preAction: guardHook] → action flow and add the language tag to the opening
fence.
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (5)
src/lib/guard/inspect.ts (1)
48-60: ⚡ Quick winPre-compile regex patterns for efficiency and to silence linter warnings.
The patterns are constructed from the constant
QUALIFIEDon every call toparseTarget. SinceQUALIFIEDis fixed, these regex objects can be pre-compiled at module level.♻️ Proposed refactor
const IDENT = '"?([A-Za-z_][A-Za-z0-9_]*)"?'; const QUALIFIED = `(?:${IDENT}\\.)?${IDENT}`; + +const DROP_TABLE_RE = new RegExp(`^DROP\\s+TABLE\\s+(?:IF\\s+EXISTS\\s+)?${QUALIFIED}`, 'i'); +const TRUNCATE_RE = new RegExp(`^TRUNCATE\\s+(?:TABLE\\s+)?${QUALIFIED}`, 'i'); +const DELETE_FROM_RE = new RegExp(`^DELETE\\s+FROM\\s+${QUALIFIED}`, 'i'); +const UPDATE_RE = new RegExp(`^UPDATE\\s+${QUALIFIED}`, 'i'); function parseTarget(sql: string): Target | null { const s = sql.trim(); const grab = (m: RegExpMatchArray | null): Omit<Target, 'op'> | null => { if (!m) return null; const schema = (m[1] ?? 'public').toLowerCase(); const table = (m[2] ?? '').toLowerCase(); return table ? { schema, table } : null; }; let m: RegExpMatchArray | null; - if ((m = s.match(new RegExp(`^DROP\\s+TABLE\\s+(?:IF\\s+EXISTS\\s+)?${QUALIFIED}`, 'i')))) { + if ((m = s.match(DROP_TABLE_RE))) { const t = grab(m); return t && { ...t, op: 'drop' }; } - if ((m = s.match(new RegExp(`^TRUNCATE\\s+(?:TABLE\\s+)?${QUALIFIED}`, 'i')))) { + if ((m = s.match(TRUNCATE_RE))) { const t = grab(m); return t && { ...t, op: 'truncate' }; } - if ((m = s.match(new RegExp(`^DELETE\\s+FROM\\s+${QUALIFIED}`, 'i')))) { + if ((m = s.match(DELETE_FROM_RE))) { const t = grab(m); return t && { ...t, op: 'delete' }; } - if ((m = s.match(new RegExp(`^UPDATE\\s+${QUALIFIED}`, 'i')))) { + if ((m = s.match(UPDATE_RE))) { const t = grab(m); return t && { ...t, op: 'update' }; } return null; }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/guard/inspect.ts` around lines 48 - 60, parseTarget currently recreates RegExp objects on every call using new RegExp(`...${QUALIFIED}`, 'i'); extract and precompile those patterns at module scope (e.g., const RE_DROP_TABLE = new RegExp(`^DROP\\s+TABLE\\s+(?:IF\\s+EXISTS\\s+)?${QUALIFIED}`, 'i'), RE_TRUNCATE = ..., RE_DELETE = ..., RE_UPDATE = ...) and replace the inline new RegExp(...) uses in inspect.ts with the corresponding RE_* constants so parseTarget/inspect uses the precompiled regexes and silences linter/efficiency warnings.src/lib/guard/risk-registry.ts (1)
52-129: 💤 Low valueSQL classification only examines the first statement.
The function comment claims to "classify the most dangerous statement" in the SQL, but the implementation only inspects the first statement (all patterns are anchored with
^). ThehasWhereflag is computed once for the entire string (line 55), so multi-statement SQL like:DELETE FROM users WHERE id = 1; UPDATE posts SET active = truewould detect
hasWhere = truefrom the DELETE and never flag the unfiltered UPDATE assql.update_all.For a CLI guard where most commands contain single statements, this is likely acceptable. However, if multi-statement SQL is a realistic scenario, consider either:
- Updating the comment to reflect "classifies the first statement"
- Or splitting on
;and classifying each statement independently, returning the highest severity🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/guard/risk-registry.ts` around lines 52 - 129, classifySql currently inspects only the whole input and computes hasWhere once, so multi-statement SQL can hide more dangerous later statements; update classifySql to split the input on semicolons into individual statements, trim each statement, compute hasWhere per statement, run the same regex checks for each statement (using the same patterns in classifySql) and pick/return the single RiskAssessment with the highest severity (severity ordering: critical > high > others) or SAFE if none; ensure you reference and return the existing SAFE constant and preserve the kind/title/whatHappens/blastRadius/risk from the chosen most-severe statement.src/lib/guard/index.ts (1)
129-130: 💤 Low valueConsider checking for non-empty SQL string.
The
typeof args[0] === 'string'check is redundant since line 84 already converts all args to strings. The real edge case is an empty string: ifargs[0]is"", the check passes andinspectSqlTargetis called with an empty query. WhileinspectSqlTargetlikely handles this gracefully (fail-open returningnull), an explicit non-empty check is more defensive.♻️ Proposed refinement
- if (path === 'db query' && typeof args[0] === 'string') { + if (path === 'db query' && args[0] && args[0].trim()) { live = await inspectSqlTarget(args[0]); }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/guard/index.ts` around lines 129 - 130, The guard currently calls inspectSqlTarget when path === 'db query' and typeof args[0] === 'string'; change this to require a non-empty SQL string (e.g., path === 'db query' && typeof args[0] === 'string' && args[0].trim().length > 0) before invoking inspectSqlTarget(args[0]) so empty or whitespace-only queries are skipped; reference the variables path, args and the function inspectSqlTarget when making the check.src/lib/guard/audit.ts (1)
25-33: Consider log rotation or size limits for the audit file.The audit log at
~/.insforge/guard-audit.jsonlgrows unbounded with every guard evaluation. For CLI tools used frequently, this could accumulate substantial data over time. Consider implementing log rotation (e.g., date-based files) or size limits with archival to prevent disk space issues.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/guard/audit.ts` around lines 25 - 33, The audit() function currently appends to AUDIT_FILE unbounded; add rotation/size-limiting logic so the file cannot grow indefinitely: before appendFileSync in audit(AuditEntry) check the current AUDIT_FILE size (statSync) and if it exceeds a configured MAX_AUDIT_BYTES rotate the file (e.g., rename AUDIT_FILE to AUDIT_FILE + '.' + timestamp or push older versions to an archive directory) or implement date-based filenames (use AUDIT_DIR + timestamped filename) and then write the new entry to a fresh file; ensure the rotation is best-effort and still wrapped in the try/catch so auditing never throws, and expose MAX_AUDIT_BYTES and rotation strategy near the AUDIT_DIR/AUDIT_FILE constants for configurability.src/lib/guard/approval-server.ts (1)
142-147: ⚡ Quick winConsider parsing the query string properly for robustness.
The current implementation uses
url.includes('d=approve')which would match unintended URLs like/decision?d=approvedor/decision?xd=approve. While this isn't a practical security issue (localhost-only, user-controlled), proper query parsing would be more robust.♻️ Proposed fix using URLSearchParams
if (req.method === 'POST' && url.startsWith('/decision')) { - const approved = url.includes('d=approve'); + const searchParams = new URL(url, 'http://localhost').searchParams; + const approved = searchParams.get('d') === 'approve'; res.writeHead(200, { 'content-type': 'text/plain' }).end('ok');🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/guard/approval-server.ts` around lines 142 - 147, Replace the brittle substring check url.includes('d=approve') with proper query parsing: construct a URL object from req.url (e.g., new URL(req.url, 'http://localhost')) and use its searchParams to check whether the d parameter equals 'approve'; then call finish(approved ? 'approved' : 'denied', server) as before. Locate the POST /decision branch around the req.method check and variable url and update the approved computation to use URL.searchParams.get('d') === 'approve' (or equivalent) so only an exact d=approve query matches.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/lib/guard/index.ts`:
- Around line 92-93: The current quoting logic in the quoted variable (args.map)
only wraps args with whitespace but fails to escape existing quotes, producing
malformed paste-ready commands; update the args mapping to first escape
backslashes and double quotes inside each argument (e.g., replace occurrences of
" and \ with escaped versions) and then wrap arguments that contain whitespace
in double quotes before building the command string in command, so inputs like
say "hello" become safely escaped and shell-safe.
In `@src/lib/guard/README.md`:
- Around line 35-43: The fenced diagram block in src/lib/guard/README.md is
missing a language identifier causing markdownlint MD040; update the
triple-backtick fence that contains the insforge diagram (the block starting
with "insforge --reason") to use a neutral language identifier such as text
(i.e., replace ``` with ```text) so the lint rule is satisfied.
---
Nitpick comments:
In `@src/lib/guard/approval-server.ts`:
- Around line 142-147: Replace the brittle substring check
url.includes('d=approve') with proper query parsing: construct a URL object from
req.url (e.g., new URL(req.url, 'http://localhost')) and use its searchParams to
check whether the d parameter equals 'approve'; then call finish(approved ?
'approved' : 'denied', server) as before. Locate the POST /decision branch
around the req.method check and variable url and update the approved computation
to use URL.searchParams.get('d') === 'approve' (or equivalent) so only an exact
d=approve query matches.
In `@src/lib/guard/audit.ts`:
- Around line 25-33: The audit() function currently appends to AUDIT_FILE
unbounded; add rotation/size-limiting logic so the file cannot grow
indefinitely: before appendFileSync in audit(AuditEntry) check the current
AUDIT_FILE size (statSync) and if it exceeds a configured MAX_AUDIT_BYTES rotate
the file (e.g., rename AUDIT_FILE to AUDIT_FILE + '.' + timestamp or push older
versions to an archive directory) or implement date-based filenames (use
AUDIT_DIR + timestamped filename) and then write the new entry to a fresh file;
ensure the rotation is best-effort and still wrapped in the try/catch so
auditing never throws, and expose MAX_AUDIT_BYTES and rotation strategy near the
AUDIT_DIR/AUDIT_FILE constants for configurability.
In `@src/lib/guard/index.ts`:
- Around line 129-130: The guard currently calls inspectSqlTarget when path ===
'db query' and typeof args[0] === 'string'; change this to require a non-empty
SQL string (e.g., path === 'db query' && typeof args[0] === 'string' &&
args[0].trim().length > 0) before invoking inspectSqlTarget(args[0]) so empty or
whitespace-only queries are skipped; reference the variables path, args and the
function inspectSqlTarget when making the check.
In `@src/lib/guard/inspect.ts`:
- Around line 48-60: parseTarget currently recreates RegExp objects on every
call using new RegExp(`...${QUALIFIED}`, 'i'); extract and precompile those
patterns at module scope (e.g., const RE_DROP_TABLE = new
RegExp(`^DROP\\s+TABLE\\s+(?:IF\\s+EXISTS\\s+)?${QUALIFIED}`, 'i'), RE_TRUNCATE
= ..., RE_DELETE = ..., RE_UPDATE = ...) and replace the inline new RegExp(...)
uses in inspect.ts with the corresponding RE_* constants so parseTarget/inspect
uses the precompiled regexes and silences linter/efficiency warnings.
In `@src/lib/guard/risk-registry.ts`:
- Around line 52-129: classifySql currently inspects only the whole input and
computes hasWhere once, so multi-statement SQL can hide more dangerous later
statements; update classifySql to split the input on semicolons into individual
statements, trim each statement, compute hasWhere per statement, run the same
regex checks for each statement (using the same patterns in classifySql) and
pick/return the single RiskAssessment with the highest severity (severity
ordering: critical > high > others) or SAFE if none; ensure you reference and
return the existing SAFE constant and preserve the
kind/title/whatHappens/blastRadius/risk from the chosen most-severe statement.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 03dc5f4d-7f49-4e3e-b66b-f48387be5824
📒 Files selected for processing (8)
src/index.tssrc/lib/guard/README.mdsrc/lib/guard/approval-server.tssrc/lib/guard/audit.tssrc/lib/guard/brief.tssrc/lib/guard/index.tssrc/lib/guard/inspect.tssrc/lib/guard/risk-registry.ts
| ``` | ||
| insforge --reason "<why>" <cmd> → parse → [preAction: guardHook] → action | ||
| │ | ||
| assess() classifies the real operation (hard rules) | ||
| │ | ||
| safe? → run. dangerous? → approval page (rule facts + agent's --reason) → BLOCK | ||
| │ | ||
| approve → run · deny / timeout → exit 1 | ||
| ``` |
There was a problem hiding this comment.
Add a language identifier to the fenced code block.
This fence triggers markdownlint MD040 (missing fenced-code language). Use a neutral language like text to satisfy linting.
Suggested fix
-```
+```text
insforge --reason "<why>" <cmd> → parse → [preAction: guardHook] → action
│
assess() classifies the real operation (hard rules)
│
safe? → run. dangerous? → approval page (rule facts + agent's --reason) → BLOCK
│
approve → run · deny / timeout → exit 1</details>
<!-- suggestion_start -->
<details>
<summary>📝 Committable suggestion</summary>
> ‼️ **IMPORTANT**
> Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```suggestion
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 35-35: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/lib/guard/README.md` around lines 35 - 43, The fenced diagram block in
src/lib/guard/README.md is missing a language identifier causing markdownlint
MD040; update the triple-backtick fence that contains the insforge diagram (the
block starting with "insforge --reason") to use a neutral language identifier
such as text (i.e., replace ``` with ```text) so the lint rule is satisfied.
There was a problem hiding this comment.
1 issue found across 3 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/lib/guard/approval-server.ts">
<violation number="1" location="src/lib/guard/approval-server.ts:143">
P2: Decision parsing is too permissive: substring matching can incorrectly mark malformed query strings as approved.</violation>
<violation number="2" location="src/lib/guard/approval-server.ts:165">
P1: The approval timeout is left referenced after early resolve, which can keep the CLI process alive for up to 120s after approve/deny.</violation>
</file>
<file name="src/lib/guard/inspect.ts">
<violation number="1" location="src/lib/guard/inspect.ts:49">
P2: Multi-table DROP/TRUNCATE statements are misparsed as single-table targets, which can show incorrect live blast-radius facts.</violation>
<violation number="2" location="src/lib/guard/inspect.ts:70">
P2: Query errors are converted to empty results, which can display false "no dependents" facts instead of failing open to generic text.</violation>
<violation number="3" location="src/lib/guard/inspect.ts:195">
P2: Introspection query failures are silently treated as empty results, which can lead `buildUserImpact` to produce statements that understate risk. Since `q()` returns `[]` on error, a failed `pg_stat_user_tables` query results in `null` reads/writes (hiding the 'active use' signal), and a failed `pg_policies` query results in `policies = 0` (hiding the RLS hint). Consider returning `null` from `q()` on failure and short-circuiting the enrichment path when any critical measurement query fails, rather than presenting incomplete data as if it were a measured 'safe' state.</violation>
</file>
<file name="src/lib/guard/risk-registry.ts">
<violation number="1" location="src/lib/guard/risk-registry.ts:77">
P1: DELETE/UPDATE detection is anchored to the beginning of the SQL string, so destructive statements in later statements can bypass the guard as `safe`.</violation>
<violation number="2" location="src/lib/guard/risk-registry.ts:97">
P1: `ALTER TABLE ... DROP` detection misses multiline SQL, which can misclassify destructive DDL as safe.</violation>
</file>
Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic
| const views = viewRows.map((r) => String(r.view_name)).filter(Boolean); | ||
| const reads = statRows.length ? num(statRows[0].reads) : null; | ||
| const writes = statRows.length ? num(statRows[0].writes) : null; | ||
| const userImpact = buildUserImpact(schema, table, rows, policies, reads, writes) || null; |
There was a problem hiding this comment.
P2: Introspection query failures are silently treated as empty results, which can lead buildUserImpact to produce statements that understate risk. Since q() returns [] on error, a failed pg_stat_user_tables query results in null reads/writes (hiding the 'active use' signal), and a failed pg_policies query results in policies = 0 (hiding the RLS hint). Consider returning null from q() on failure and short-circuiting the enrichment path when any critical measurement query fails, rather than presenting incomplete data as if it were a measured 'safe' state.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/lib/guard/inspect.ts, line 195:
<comment>Introspection query failures are silently treated as empty results, which can lead `buildUserImpact` to produce statements that understate risk. Since `q()` returns `[]` on error, a failed `pg_stat_user_tables` query results in `null` reads/writes (hiding the 'active use' signal), and a failed `pg_policies` query results in `policies = 0` (hiding the RLS hint). Consider returning `null` from `q()` on failure and short-circuiting the enrichment path when any critical measurement query fails, rather than presenting incomplete data as if it were a measured 'safe' state.</comment>
<file context>
@@ -112,26 +165,34 @@ export async function inspectSqlTarget(sql: string): Promise<LiveFacts | null> {
const views = viewRows.map((r) => String(r.view_name)).filter(Boolean);
+ const reads = statRows.length ? num(statRows[0].reads) : null;
+ const writes = statRows.length ? num(statRows[0].writes) : null;
+ const userImpact = buildUserImpact(schema, table, rows, policies, reads, writes) || null;
const rowsTxt = rows === null ? 'an unknown number of' : rows.toLocaleString();
</file context>
Reskin the localhost approval page to InsForge's theme using the site's own tokens: near-black #000 background with an emerald radial glow, #141414 card, emerald #6ee7b7 accent (the 'Verified by InsForge' header + user-impact box + logo mark), Manrope/Inter type, and an InsForge wordmark. Severity red is kept for the danger pill and Approve button so risk stays legible. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…e' dividers Cleaner, more on-brand layout: faint grid + mint glow on pure black, hairline label/value facts (no shouty uppercase group headers), a quiet mint 'measured from your project' caption instead of the messy 'Verified by InsForge' divider, an emerald 'Impact on your users' callout, and a subtle 'Agent's reasoning' card. Deny is the white primary (safe default); Approve & run is a red ghost button. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
♻️ Duplicate comments (1)
src/lib/guard/approval-server.ts (1)
151-153:⚠️ Potential issue | 🔴 Critical | ⚡ Quick winBind the approval POST to a one-time session token.
This is still accepting any localhost
POST /decision, and Line 182 treats any URL containingd=approveas approval. Another local process—or a webpage running in the approver’s browser—can approve the destructive command without the human using this page.Suggested minimal fix
+import { randomBytes } from 'node:crypto'; import { createServer } from 'node:http'; import type { AddressInfo } from 'node:net'; import open from 'open'; import type { Brief } from './brief.js'; -function renderPage(brief: Brief): string { +function renderPage(brief: Brief, token: string): string { return `<!doctype html><html><head><meta charset="utf-8"> ... <script> function decide(d) { - fetch('/decision?d=' + d, { method: 'POST' }).then(function () { + fetch('/decision?d=' + encodeURIComponent(d) + '&token=${token}', { method: 'POST' }).then(function () { document.getElementById('card').innerHTML = '<div class="bar"></div><div class="done"><h1 class="' + (d === 'approve' ? 'ok' : '') + '">' + (d === 'approve' ? 'Approved — running now.' : 'Denied — nothing ran.') + '</h1><div class="sub">You can close this window.</div></div>'; }); } </script> </body></html>`; } export function requestApproval(brief: Brief): Promise<ApprovalResult> { return new Promise((resolve) => { let settled = false; + const token = randomBytes(32).toString('hex'); const finish = (r: ApprovalResult, server?: ReturnType<typeof createServer>) => { if (settled) return; settled = true; try { server?.close(); } catch { /* ignore */ } resolve(r); @@ const server = createServer((req, res) => { const url = req.url ?? '/'; if (req.method === 'POST' && url.startsWith('/decision')) { - const approved = url.includes('d=approve'); + const parsed = new URL(url, 'http://127.0.0.1'); + if (parsed.searchParams.get('token') !== token) { + res.writeHead(403, { 'content-type': 'text/plain' }).end('forbidden'); + return; + } + const approved = parsed.searchParams.get('d') === 'approve'; res.writeHead(200, { 'content-type': 'text/plain' }).end('ok'); finish(approved ? 'approved' : 'denied', server); return; } - res.writeHead(200, { 'content-type': 'text/html; charset=utf-8' }).end(renderPage(brief)); + res.writeHead(200, { 'content-type': 'text/html; charset=utf-8' }).end(renderPage(brief, token)); });Also applies to: 179-184
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/guard/approval-server.ts` around lines 151 - 153, The client-side decide(d) fetch call is vulnerable because POST /decision accepts any request and the server treats any URL containing d=approve as approval; fix by tying approval to a one-time session token: have the server generate a cryptographically random token when rendering the approval page, embed that token in the page (next to the decide(d) call) and include it in the POST (e.g., in JSON body or Authorization header) from decide(d), then update the server-side POST /decision handler to strictly validate that d === 'approve' (exact match, not substring) and verify the token against a server-side store and immediately invalidate it after first use so it cannot be replayed; ensure the token is single-use and has a short TTL and return 403 for invalid/missing tokens.
🧹 Nitpick comments (1)
src/lib/guard/approval-server.ts (1)
144-146: ⚡ Quick winPut Deny first in the tab order.
The UI makes Deny the safe primary action, but keyboard focus reaches Approve first because it comes first in the markup. For this guard, the safer choice should be the first tabbable control.
Suggested change
<div class="row"> - <button class="approve" onclick="decide('approve')">Approve & run</button> <button class="deny" onclick="decide('deny')">Deny</button> + <button class="approve" onclick="decide('approve')">Approve & run</button> </div>🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@src/lib/guard/approval-server.ts` around lines 144 - 146, Swap the tabbable order so the safer "Deny" button receives focus first: reorder the two button elements so the button with class "deny" and onclick="decide('deny')" appears before the button with class "approve" and onclick="decide('approve')", or alternatively set explicit tabindex values with the "deny" button having a lower tabindex than the "approve" button; update the markup around the buttons in the approval UI (the elements referencing decide('deny') and decide('approve')) accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Duplicate comments:
In `@src/lib/guard/approval-server.ts`:
- Around line 151-153: The client-side decide(d) fetch call is vulnerable
because POST /decision accepts any request and the server treats any URL
containing d=approve as approval; fix by tying approval to a one-time session
token: have the server generate a cryptographically random token when rendering
the approval page, embed that token in the page (next to the decide(d) call) and
include it in the POST (e.g., in JSON body or Authorization header) from
decide(d), then update the server-side POST /decision handler to strictly
validate that d === 'approve' (exact match, not substring) and verify the token
against a server-side store and immediately invalidate it after first use so it
cannot be replayed; ensure the token is single-use and has a short TTL and
return 403 for invalid/missing tokens.
---
Nitpick comments:
In `@src/lib/guard/approval-server.ts`:
- Around line 144-146: Swap the tabbable order so the safer "Deny" button
receives focus first: reorder the two button elements so the button with class
"deny" and onclick="decide('deny')" appears before the button with class
"approve" and onclick="decide('approve')", or alternatively set explicit
tabindex values with the "deny" button having a lower tabindex than the
"approve" button; update the markup around the buttons in the approval UI (the
elements referencing decide('deny') and decide('approve')) accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 67703677-470b-4d50-b856-1fa743e22c93
📒 Files selected for processing (1)
src/lib/guard/approval-server.ts
…on colors - Echo the canonical public invocation (npx @insforge/cli ...) in the approval page and the nudge, matching how agents and the CLI's own messages call it. - Approval buttons follow the standard destructive pattern: Deny = neutral (cancel, left), Approve & run = solid severity-red (confirm, right). Replaces the confusing white-Deny / red-ghost-Approve treatment. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add INSFORGE_GUARD_TIMEOUT_MS (default 120000) to set how long the approval page waits before the fail-closed deny — useful when a human isn't watching the terminal. Documented in the guard README. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
risk-registry.test.ts — SQL classification (DROP/TRUNCATE/unfiltered DELETE-UPDATE/ ALTER..DROP/RLS vs safe reads), command-path classification (storage/compute/ functions/secrets delete + destructive-verb catch-all), and a trust-boundary test that caller-supplied opts can't downgrade the verdict. brief.test.ts — rule facts pass through, live facts override generic text when present, agent fields surface but cannot change severity/title/whatHappens, blank agent fields are trimmed to null, and guidance scales with severity. 25 tests. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…cases Layer the agent's dynamic judgment on top of the static rules: --flag-destructive [reason] raises a 'safe' verdict to 'high' so the agent can stop itself on an edge case the rules don't recognize (e.g. a 'safe'-looking UPDATE that wipes prod config). Strictly escalate-only via applyAgentFlag(): effective severity is max(hard-rule, agent-flag). A flag can ADD a gate but can NEVER lower a verdict the rules produced, so a buggy/injected agent can't use it to bypass. The flag reason renders on the approval page; it also satisfies the nudge. Tests: applyAgentFlag escalates safe->high, leaves dangerous verdicts untouched, and no flag value ever yields 'safe' from a dangerous op; brief carries the flag. 31 guard tests. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
1 issue found across 8 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/lib/guard/README.md">
<violation number="1" location="src/lib/guard/README.md:94">
P2: Documentation example shows `--flag-destructive` with an unfiltered UPDATE SQL (`UPDATE tenant_config SET plan = 'free'` — no WHERE clause), but the hard rules already classify unfiltered UPDATEs as `critical`. The example therefore fails to demonstrate the escalate-only mechanism: the flag is redundant here, making the documentation misleading about when `--flag-destructive` actually matters.</violation>
</file>
Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic
|
|
||
| ## Agent escalation (escalate-only) | ||
|
|
||
| The static rules can't know every edge case. The calling agent — which has app |
There was a problem hiding this comment.
P2: Documentation example shows --flag-destructive with an unfiltered UPDATE SQL (UPDATE tenant_config SET plan = 'free' — no WHERE clause), but the hard rules already classify unfiltered UPDATEs as critical. The example therefore fails to demonstrate the escalate-only mechanism: the flag is redundant here, making the documentation misleading about when --flag-destructive actually matters.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/lib/guard/README.md, line 94:
<comment>Documentation example shows `--flag-destructive` with an unfiltered UPDATE SQL (`UPDATE tenant_config SET plan = 'free'` — no WHERE clause), but the hard rules already classify unfiltered UPDATEs as `critical`. The example therefore fails to demonstrate the escalate-only mechanism: the flag is redundant here, making the documentation misleading about when `--flag-destructive` actually matters.</comment>
<file context>
@@ -89,6 +89,21 @@ insforge \
+## Agent escalation (escalate-only)
+
+The static rules can't know every edge case. The calling agent — which has app
+context the rules don't — can flag an operation the rules consider safe:
+
</file context>
…e rollout Add a master on/off switch (src/lib/guard/enabled.ts) checked first in the hook, so merging/publishing the guard is a no-op in production until opted in. INSFORGE_GUARD is the source of truth (1/true/on vs 0/false/off); default OFF during rollout via GUARD_DEFAULT_ENABLED, flip to true (or wire to a remote flag) for GA. 4 tests for the switch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add '--guard [state]' to the link command so a project can opt into the HITL guard persistently (stored in .insforge/project.json as guard: true/false), with '--guard off' to disable. The hook reads it with precedence INSFORGE_GUARD env > persisted setting > default(off), so the env var stays the override / kill switch. Tests cover the precedence. Verified e2e: link --guard gates with no env var; --guard off no-ops; env overrides the persisted value. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
There was a problem hiding this comment.
1 issue found across 6 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/lib/guard/inspect.ts">
<violation number="1" location="src/lib/guard/inspect.ts:70">
P2: Query errors are converted to empty results, which can display false "no dependents" facts instead of failing open to generic text.</violation>
<violation number="2" location="src/lib/guard/inspect.ts:195">
P2: Introspection query failures are silently treated as empty results, which can lead `buildUserImpact` to produce statements that understate risk. Since `q()` returns `[]` on error, a failed `pg_stat_user_tables` query results in `null` reads/writes (hiding the 'active use' signal), and a failed `pg_policies` query results in `policies = 0` (hiding the RLS hint). Consider returning `null` from `q()` on failure and short-circuiting the enrichment path when any critical measurement query fails, rather than presenting incomplete data as if it were a measured 'safe' state.</violation>
</file>
<file name="src/lib/guard/risk-registry.ts">
<violation number="1" location="src/lib/guard/risk-registry.ts:97">
P1: `ALTER TABLE ... DROP` detection misses multiline SQL, which can misclassify destructive DDL as safe.</violation>
</file>
<file name="src/lib/guard/README.md">
<violation number="1" location="src/lib/guard/README.md:94">
P2: Documentation example shows `--flag-destructive` with an unfiltered UPDATE SQL (`UPDATE tenant_config SET plan = 'free'` — no WHERE clause), but the hard rules already classify unfiltered UPDATEs as `critical`. The example therefore fails to demonstrate the escalate-only mechanism: the flag is redundant here, making the documentation misleading about when `--flag-destructive` actually matters.</violation>
</file>
<file name="src/lib/guard/index.ts">
<violation number="1" location="src/lib/guard/index.ts:88">
P1: Swallowing project-config read/parse errors silently disables the safety guard (fail-open) when env override is unset.</violation>
</file>
Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic
| try { storedGuard = getProjectConfig()?.guard ?? null; } catch { /* fail to default */ } | ||
| if (!guardEnabled(process.env, storedGuard)) return; |
There was a problem hiding this comment.
P1: Swallowing project-config read/parse errors silently disables the safety guard (fail-open) when env override is unset.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/lib/guard/index.ts, line 88:
<comment>Swallowing project-config read/parse errors silently disables the safety guard (fail-open) when env override is unset.</comment>
<file context>
@@ -82,7 +83,10 @@ function renderNudge(command: string, risk: RiskAssessment): string {
- if (!guardEnabled()) return;
+ // Precedence: INSFORGE_GUARD env > persisted `link --guard` setting > default.
+ let storedGuard: boolean | null = null;
+ try { storedGuard = getProjectConfig()?.guard ?? null; } catch { /* fail to default */ }
+ if (!guardEnabled(process.env, storedGuard)) return;
</file context>
| try { storedGuard = getProjectConfig()?.guard ?? null; } catch { /* fail to default */ } | |
| if (!guardEnabled(process.env, storedGuard)) return; | |
| try { | |
| storedGuard = getProjectConfig()?.guard ?? null; | |
| } catch { | |
| storedGuard = true; | |
| process.stderr.write(' ⚠️ Failed to read .insforge/project.json; enabling guard fail-closed.\n'); | |
| } | |
| if (!guardEnabled(process.env, storedGuard)) return; |
- [P1] Multi-statement SQL bypass: classify each ;-separated statement and take the max severity, so 'SELECT 1; DROP TABLE users' no longer reads as safe and a sibling WHERE can't mask an unfiltered DELETE/UPDATE. (greptile) - [P1] CSRF on the approval endpoint: per-request single-use token embedded in the page and required on POST /decision; cross-origin/blind POSTs get 403. Also parse the decision with URL/searchParams instead of substring matching. (greptile + cubic) - [P1] Clear the approval timeout on decide so the process doesn't linger up to 120s after approve/deny. (cubic) - [P1] Reconstruct the echoed/nudge/audit command from real argv so CLI options (e.g. --unrestricted) aren't dropped. (cubic) - [P2] Nudge interactivity keys on process.stdin.isTTY, not stdout (piped stdout no longer trips it). (greptile) - [P2] Multi-table DROP/TRUNCATE bails to generic text instead of showing facts for one of several tables. (cubic) Tests for the multi-statement bypass + WHERE-masking regressions. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
You're iterating quickly on this pull request. To help protect your rate limits, cubic has paused automatic reviews on new pushes for now—when you're ready for another review, comment |
|
Thanks for the thorough review — addressed in Fixed
Noted, not blocking
Suite is at 460 passing. This is still a POC, gated behind |
What
A human-in-the-loop safety guard built into the
insforgeCLI itself (a CommanderpreActionstage), so it protects every caller automatically — Claude Code, Cursor, custom agents, scripts, CI, and humans. When a destructive operation is about to run, the CLI stops, opens a localhost approval page explaining the implications, and blocks until a human clicks Approve or Deny.Two trust levels
DROP/TRUNCATE/unfilteredDELETE-UPDATE/ALTER…DROP/RLS changes, pluscompute delete,storage delete-bucket,functions delete,secrets delete, and a destructive-verb catch-all). An agent can never downgrade this verdict.--reason(intent),--impact(implications),--recommendation. The CLI makes no LLM call of its own — no keys to configure, works for any agent. The agent explains; it cannot change the verdict.Encouraging the agent to articulate intent (the nudge)
When a destructive op has no brief and the caller is non-interactive (agent/CI, detected via no TTY), the guard does not pop the page — it prints a copy-paste-ready instruction to re-run with a brief and exits
2(needs_brief). The agent reasons about the implications, re-runs, and then the human sees a readable page. A human at a TTY goes straight to the page.Guarantees
SELECT,insert,list, etc. pass straight through.~/.insforge/guard-audit.jsonl.Files
src/lib/guard/{index,risk-registry,brief,approval-server,audit}.ts+README.md, wired insrc/index.ts(global--reason/--impact/--recommendationoptions +program.hook('preAction', …)+parseAsync).Env knobs
INSFORGE_GUARD_REQUIRE_BRIEF(1/0),INSFORGE_GUARD_BYPASS=1(audited),INSFORGE_GUARD_OPEN=0(headless, print link only), plus*_SUMMARY/*_IMPACT/*_RECOMMENDATIONenv fallbacks.Testing
DROP TABLEactually dropped the table; denying left the table + row intact (exit 1).Co-Authored-By: Claude Opus 4.8 noreply@anthropic.com
Summary by cubic
Adds a human-in-the-loop safety guard to the
insforgeCLI that pauses destructive operations for local approval, with livedb queryfacts and agent-supplied intent. It’s off by default; enable per project withinsforge link --guardor globally withINSFORGE_GUARD=1.Bug Fixes
;-separated statement and taking the highest severity.stdin.isTTY, and fall back to generic text for multi-table DROP/TRUNCATE.Migration
insforge link --guard(persisted in.insforge/project.json) orINSFORGE_GUARD=1; for headless use setINSFORGE_GUARD_OPEN=0and adjust the window withINSFORGE_GUARD_TIMEOUT_MS.--reason(and optionally--impact,--recommendation), or--flag-destructive [reason]; handle exit codes: 1 = denied/timeout, 2 = needs_brief, 0 = approved/safe.Written for commit ca699ae. Summary will update on new commits.
Note
Add human-in-the-loop guard to block dangerous CLI operations pending approval
guardHookpreAction hook in src/index.ts that intercepts all CLI commands, assesses risk, and blocks execution pending human approval via a localhost approval page when the guard is enabled.db queryoperations, blocks until approved/denied, and times out fail-closed.~/.insforge/guard-audit.jsonlvia src/lib/guard/audit.ts. Enablement is controlled byINSFORGE_GUARDenv var or a per-projectguardfield set viainsforge link --guard.GUARD_DEFAULT_ENABLED=false); when enabled, missing agent briefs cause non-interactive callers to exit with code 2, and denied/timeout decisions exit with code 1.Macroscope summarized ca699ae.
Summary by CodeRabbit
New Features
Documentation
Tests
Chores