Skip to content

Make the TS regex lanes comment/string aware (#28)#29

Merged
yavorpanayotov merged 1 commit into
mainfrom
fix/regex-lane-comment-string-awareness
Jun 11, 2026
Merged

Make the TS regex lanes comment/string aware (#28)#29
yavorpanayotov merged 1 commit into
mainfrom
fix/regex-lane-comment-string-awareness

Conversation

@yavorpanayotov

Copy link
Copy Markdown
Collaborator

Closes #28. Follow-up to #25 — the complementary "Rust silent, TypeScript false-positive" direction of the malformed-input divergence.

Problem

The analyzer's checks (extensions/allium/src/language-tools/analyzer.ts) run regexes over raw source text with no lexer context, so they matched keyword-shaped text inside comments and string literals that the Rust front end reads as content, not code. Example: a deferred-shaped token inside a -- comment produced a spurious allium.deferred.missingLocationHint that allium check never emits.

Fix

analyzeAllium now computes a masked view of the source via maskCommentsAndStrings, which blanks comment/string/backtick content to spaces while preserving length, byte offsets, and line breaks — so every finding offset still maps back to the original source.

The masker mirrors the Rust lexer (crates/allium-parser/src/lexer.rs):

Block bodies are re-sliced from the masked text, so every body-based lane inherits the awareness, and the detection lanes receive the masked text in place of raw text.

Raw text is deliberately kept for the two consumers that read comment/string content on purpose:

  • findDeferredLocationHints — now detects the deferred keyword on the masked text (so comment/string-embedded tokens vanish) but reads the -- see: / quoted-path / URL hint from the raw text.
  • applySuppressions — the -- allium-ignore directive.

Delimiters (", `) and the -- of a comment are preserved, so lanes that only need to detect a string/comment's presence (e.g. the type-mismatch operand lanes) still see one.

Tests

New unit tests for maskCommentsAndStrings (comment/string/backtick blanking, \ escapes, lone-CR-in-comment) and analyzer integration tests for the #28 reproducers:

  • no warning on deferred-shaped text inside a comment, inside a string, or after a lone \r in a comment;
  • still warns on a real deferred without a hint;
  • still accepts a real deferred with a -- see: hint.

All workspaces green (npm run test: 306 extension + 8 LSP), npm run lint clean.

Scope note

This covers the diagnostic lanes in analyzeAllium (the lanes #28 is about). The separate process-level lanes behind allium analyse (findProcessCompletenessIssues et al.) are unchanged; if they show the same class of false positive they can reuse maskCommentsAndStrings in a follow-up.

🤖 Generated with Claude Code

The analyzer's checks run regexes over raw source text with no lexer
context, so they matched keyword-shaped text inside comments and string
literals that the Rust front end reads as content, not code — e.g. a
`deferred`-shaped token inside a `--` comment produced a spurious
`allium.deferred.missingLocationHint` that `allium check` never emits.
This is the "Rust silent, TypeScript false-positive" direction left open
by #25.

`analyzeAllium` now computes a masked view of the source via
`maskCommentsAndStrings`, which blanks comment/string/backtick content to
spaces while preserving length, offsets, and line breaks. The masker
mirrors the Rust lexer (comments run `--`..`\n`, a lone `\r` does not end
them; strings honour `\` escapes and end at `"`/`\n`; backticks end at
`` ` ``/`\n`/`\r`). Block bodies are re-sliced from the masked text and the
detection lanes receive it in place of raw text.

Two consumers keep raw text because they read comment/string content on
purpose: `findDeferredLocationHints` (detects the keyword on masked text,
reads the `-- see:`/quoted-path hint from raw text) and `applySuppressions`
(the `-- allium-ignore` directive). Delimiters and the `--` of a comment
are preserved, so lanes that only detect a string/comment's presence still
see one.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@yavorpanayotov yavorpanayotov merged commit bd22350 into main Jun 11, 2026
2 checks passed
yavorpanayotov added a commit that referenced this pull request Jun 11, 2026
The #28 masking fix (bd22350) blanked string content to spaces in the
text the regex lanes match on. Two lanes compare string-literal values
textually, and masking collapses distinct same-length literals into
identical text ("a" and "b" both become " "), breaking value-equality
reasoning both ways: a spurious rule.neverFires on satisfiable
requires pairs ("a" vs != "b"), and missed genuine contradictions
("aa" = "bb" no longer flagged by neverFires or impossibleWhen).

Detection stays on masked text; when a captured operand starts with a
quote, findNeverFireRuleIssues and findSurfaceImpossibleWhenIssues now
re-read the operand from the raw source at the same offsets via
rawStringOperand — the mask preserves length and offsets, and the value
group is anchored at the end of each match.

Found by post-merge adversarial review of #29.

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TypeScript regex lanes lack comment/string awareness (false positives on keyword-shaped text)

1 participant