Make the TS regex lanes comment/string aware (#28)#29
Merged
Conversation
The analyzer's checks run regexes over raw source text with no lexer context, so they matched keyword-shaped text inside comments and string literals that the Rust front end reads as content, not code — e.g. a `deferred`-shaped token inside a `--` comment produced a spurious `allium.deferred.missingLocationHint` that `allium check` never emits. This is the "Rust silent, TypeScript false-positive" direction left open by #25. `analyzeAllium` now computes a masked view of the source via `maskCommentsAndStrings`, which blanks comment/string/backtick content to spaces while preserving length, offsets, and line breaks. The masker mirrors the Rust lexer (comments run `--`..`\n`, a lone `\r` does not end them; strings honour `\` escapes and end at `"`/`\n`; backticks end at `` ` ``/`\n`/`\r`). Block bodies are re-sliced from the masked text and the detection lanes receive it in place of raw text. Two consumers keep raw text because they read comment/string content on purpose: `findDeferredLocationHints` (detects the keyword on masked text, reads the `-- see:`/quoted-path hint from raw text) and `applySuppressions` (the `-- allium-ignore` directive). Delimiters and the `--` of a comment are preserved, so lanes that only detect a string/comment's presence still see one. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
yavorpanayotov
added a commit
that referenced
this pull request
Jun 11, 2026
The #28 masking fix (bd22350) blanked string content to spaces in the text the regex lanes match on. Two lanes compare string-literal values textually, and masking collapses distinct same-length literals into identical text ("a" and "b" both become " "), breaking value-equality reasoning both ways: a spurious rule.neverFires on satisfiable requires pairs ("a" vs != "b"), and missed genuine contradictions ("aa" = "bb" no longer flagged by neverFires or impossibleWhen). Detection stays on masked text; when a captured operand starts with a quote, findNeverFireRuleIssues and findSurfaceImpossibleWhenIssues now re-read the operand from the raw source at the same offsets via rawStringOperand — the mask preserves length and offsets, and the value group is anchored at the end of each match. Found by post-merge adversarial review of #29. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #28. Follow-up to #25 — the complementary "Rust silent, TypeScript false-positive" direction of the malformed-input divergence.
Problem
The analyzer's checks (
extensions/allium/src/language-tools/analyzer.ts) run regexes over raw source text with no lexer context, so they matched keyword-shaped text inside comments and string literals that the Rust front end reads as content, not code. Example: adeferred-shaped token inside a--comment produced a spuriousallium.deferred.missingLocationHintthatallium checknever emits.Fix
analyzeAlliumnow computes a masked view of the source viamaskCommentsAndStrings, which blanks comment/string/backtick content to spaces while preserving length, byte offsets, and line breaks — so every finding offset still maps back to the original source.The masker mirrors the Rust lexer (
crates/allium-parser/src/lexer.rs):--to the next\n— a lone\rdoes not end them (the exact case from TypeScript analyzer discards WASM parse diagnostics #25);\escapes and terminate at"or\n;`,\n, or\r.Block bodies are re-sliced from the masked text, so every body-based lane inherits the awareness, and the detection lanes receive the masked text in place of raw text.
Raw text is deliberately kept for the two consumers that read comment/string content on purpose:
findDeferredLocationHints— now detects thedeferredkeyword on the masked text (so comment/string-embedded tokens vanish) but reads the-- see:/ quoted-path / URL hint from the raw text.applySuppressions— the-- allium-ignoredirective.Delimiters (
",`) and the--of a comment are preserved, so lanes that only need to detect a string/comment's presence (e.g. the type-mismatch operand lanes) still see one.Tests
New unit tests for
maskCommentsAndStrings(comment/string/backtick blanking,\escapes, lone-CR-in-comment) and analyzer integration tests for the #28 reproducers:deferred-shaped text inside a comment, inside a string, or after a lone\rin a comment;deferredwithout a hint;deferredwith a-- see:hint.All workspaces green (
npm run test: 306 extension + 8 LSP),npm run lintclean.Scope note
This covers the diagnostic lanes in
analyzeAllium(the lanes #28 is about). The separate process-level lanes behindallium analyse(findProcessCompletenessIssueset al.) are unchanged; if they show the same class of false positive they can reusemaskCommentsAndStringsin a follow-up.🤖 Generated with Claude Code