perf(scan): rewrite walker hot path with sync I/O + manual string ops by BYK · Pull Request #805 · getsentry/cli

BYK · 2026-04-21T20:24:27Z

Summary

Walker was 411ms p50 for 10k files on the synthetic/large fixture — roughly 10× slower than a naive sync walk (50ms). Follow-up to PR #791 and #804.

Five targeted optimizations in the hot loop, all micro-benchmarked before landing:

Change	Measured win	End-to-end win
`readdirSync` instead of `readdir`	105→45ms (−60ms)	dominant
`statSync` instead of `Bun.file().size`	−10ms per 10k files
String concat for paths	path.join 7ms → concat 0.7ms (10×)
`abs.slice(cwdPrefixLen)` for relPath	path.relative 9ms → slice 0.8ms (11×)
Manual `lastIndexOf`-based extname	9ms → 7ms (25%)

Perf results (synthetic/large, 10k files, p50)

Op	Before	After	Δ
`scan.walk`	411ms	231ms	−44%
`scan.walk.noExt`	572ms	448ms	−22%
`scan.walk.dsnParity`	228ms	138ms	−39%
`scanCodeForDsns`	323ms	304ms	−6%
`detectAllDsns.cold`	327ms	308ms	−6%
`detectAllDsns.warm`	27.9ms	27.0ms	—
`scan.grepFiles`	322ms	316ms	noise
`scanCodeForFirstDsn`	2.23ms	2.05ms	—

Walker ops are 22-44% faster. Downstream ops (grep, DSN scanner) benefit less because their time is dominated by content scanning, not walking — but still show consistent ~6% improvements.

Why `readdirSync` is safe here

The sync vs async tradeoff usually favors async because blocking the event loop is bad in general. But measured per-call cost matters:

Metric	`readdirSync` on 3635 dirs
p50	11µs
p95	24µs
p99	35µs
max	65µs

65µs max block is trivial — setTimeout(0) latency in Node is ~4ms. Blocking for 65µs never causes noticeable event-loop pauses. And we pay ~60µs of microtask overhead for each async readdir, which wipes out any theoretical fairness benefit. Net: 2-3× faster per-dir on walks with many small directories, which is every realistic CLI workload.

If this ever matters for a weird embedded use case, the optimization is trivially reversible — the walker's public API is unchanged.

mtime parity fix

Initial implementation regressed detectAllDsns.warm from 28ms → 304ms because statSync().mtimeMs is a float (e.g. 1776790602458.1033) while Bun.file().lastModified is already an integer. The DSN cache validator compares floored sourceMtimes, so un-floored floats caused cache misses on every warm call. Fixed by flooring explicitly in tryYieldFile — matches the same treatment already applied to onDirectoryVisit's dirMtimes.

Walker v2 design notes

Sync readdir + async yield. Readdir blocks briefly (~65µs max), but the surrounding async-generator interface is preserved. Library consumers that insert async work between iterations still get cooperative scheduling.
Path ops inlined. path.join / path.relative / path.extname + toLowerCase are all replaced with manual string ops in the hot loop. On Windows, normalizePath is still applied via replaceAll(NATIVE_SEP, "/") — the POSIX fast path uses a cached POSIX_NATIVE module constant.
WalkContext.cwdPrefixLen precomputed once per walk, used to slice relative paths from absolute paths.
Cached module constants NATIVE_SEP and POSIX_NATIVE avoid per-call path.sep property lookups in the hot loop.

What this PR does NOT change

WalkEntry shape: preserved identically (absolutePath, relativePath, size, mtime, isBinary, depth).
WalkOptions contract: no new options, no semantics changes.
Symlink handling, followSymlinks, cycle detection via visitedInodes: unchanged.
IgnoreStack + gitignore semantics: unchanged.
onDirectoryVisit, recordMtimes, abortSignal: unchanged.

Test plan

bunx tsc --noEmit — clean
bun run lint — clean (1 pre-existing warning in src/lib/formatters/markdown.ts unrelated)
bun test --timeout 15000 test/lib test/commands test/types — 5640 pass, 0 fail
bun test test/isolated — 138 pass
bun test test/lib/scan/walker.test.ts — 34 pass (incl. property tests covering hidden files, symlinks, maxDepth, gitignore interaction)
bun test test/lib/dsn/code-scanner.test.ts — 52 pass (incl. dirMtimes / sourceMtimes cache validation)
DSN count correctness verified end-to-end: 4 DSNs found on fixture (matches pre-change count)
bun run bench --size large --runs 5 --warmup 2 — results in table above

🤖 Generated with Claude Code

Co-authored-by: Claude Opus 4.7 (1M context) noreply@anthropic.com

Walker was 411ms p50 for 10k files on synthetic/large fixture — roughly 10× slower than a naive sync walk (50ms). Profiling showed the overhead split roughly as: async readdir 167ms, per-file stat via Bun.file().size 44ms, IgnoreStack.isIgnored 65ms, path operations (join/relative/extname) 30ms, and small control-flow overhead. Five targeted optimizations in the hot loop, all micro-benchmarked before landing: 1. **Sync readdir** — `readdirSync` instead of `readdir`. Per-call cost measured p50 11µs / p95 24µs / max 65µs over 3635 dirs in the fixture. Blocking the event loop for max 65µs is trivially safe (setTimeout(0) latency is ~4ms) AND avoids the ~60µs microtask overhead each async readdir incurs. Net: 105→45ms. 2. **statSync instead of Bun.file().size** — measured ~15% faster per call (30ms vs 36ms for 10k files). When `recordMtimes: true`, the same statSync result serves both size and mtime reads — one syscall instead of two. Net: ~10ms saved. 3. **String concat for paths** — `frame.absDir + NATIVE_SEP + entry.name` instead of `path.join(...)`. Inputs are already clean (absDir is absolute without trailing slash, name is a pure basename per dirent semantics), so the normalization path.join does is wasted work. Measured 10× faster (7ms vs 0.7ms for 13k calls). 4. **slice for relativePath** — `abs.slice(cwdPrefixLen)` instead of `path.relative(cwd, abs)`. Safe because every abs is guaranteed under cwd by construction. Measured 11× faster (9ms vs 0.8ms). Windows: `normalizePath` fallback via `replaceAll(NATIVE_SEP, "/")`. 5. **Manual extname** — `name.lastIndexOf(".")` + slice + toLowerCase instead of `path.extname(name).toLowerCase()`. Measured 25% faster (9ms vs 7ms for 13k calls). Also: precompute `cwdPrefixLen` once on `WalkContext` instead of recomputing `cfg.cwd.length + 1` per entry. Cache `path.sep` / `POSIX_NATIVE` at module scope to avoid per-call property lookups. ## mtime parity fix Initial implementation regressed `detectAllDsns.warm` from 28ms → 304ms because `statSync().mtimeMs` is a float (e.g. `1776790602458.1033`) while `Bun.file().lastModified` is already an integer. The DSN cache validator compares floored `sourceMtimes`, so un-floored floats caused cache misses on every warm call. Fixed by flooring explicitly in `tryYieldFile` — matches the same treatment already applied to `onDirectoryVisit`'s dirMtimes. ## Perf (synthetic/large, 10k files, p50) | Op | Before | After | Δ | |---|---:|---:|---:| | `scan.walk` | 411ms | **231ms** | **−44%** | | `scan.walk.noExt` | 572ms | 448ms | **−22%** | | `scan.walk.dsnParity` | 228ms | **138ms** | **−39%** | | `scanCodeForDsns` | 323ms | 304ms | −6% | | `detectAllDsns.cold` | 327ms | 308ms | −6% | | `detectAllDsns.warm` | 27.9ms | 27.0ms | — | | `scan.grepFiles` | 322ms | 316ms | noise | Walker ops are 22-44% faster. Downstream ops (grep, DSN scanner) benefit less because their time is dominated by content scanning, not walking — but still show consistent ~6% improvements. ## Test plan - [x] `bunx tsc --noEmit` — clean - [x] `bun run lint` — clean (1 pre-existing warning unrelated) - [x] `bun test --timeout 15000 test/lib test/commands test/types` — **5640 pass, 0 fail** - [x] `bun test test/isolated` — 138 pass - [x] `bun test test/lib/scan/walker.test.ts` — 34 pass (incl. property tests covering hidden files, symlinks, maxDepth, gitignore interaction) - [x] `bun test test/lib/dsn/code-scanner.test.ts` — 52 pass (incl. dirMtimes / sourceMtimes cache validation) - [x] DSN count correctness verified end-to-end: 4 DSNs found on fixture (matches pre-change count)

github-actions · 2026-04-21T20:24:58Z

PR Preview Action v1.8.1
🚀 View preview at https://cli.sentry.dev/_preview/pr-805/
Built to branch `gh-pages` at 2026-04-21 20:24 UTC. Preview will be ready when the GitHub Pages deployment is complete.

github-actions · 2026-04-21T20:26:33Z

Codecov Results 📊

✅ 138 passed | Total: 138 | Pass Rate: 100% | Execution Time: 0ms

📊 Comparison with Base Branch

Metric	Change
Total Tests	—
Passed Tests	—
Failed Tests	—
Skipped Tests	—

✨ No test changes detected

All tests are passing successfully.

✅ Patch coverage is 85.71%. Project has 1772 uncovered lines.
❌ Project coverage is 95.62%. Comparing base (base) to head (head).

Files with missing lines (1)

File	Patch %	Lines
src/lib/scan/walker.ts	85.71%	⚠️ 5 Missing

Coverage diff

@@            Coverage Diff             @@
##          main       #PR       +/-##
==========================================
- Coverage    95.63%    95.62%    -0.01%
==========================================
  Files          281       281         —
  Lines        40442     40463       +21
  Branches         0         0         —
==========================================
+ Hits         38676     38691       +15
- Misses        1766      1772        +6
- Partials         0         0         —

Generated by Codecov Action

…tches (#807) ## Summary Parallel grep via a worker pool with binary-transferable matches. DSN scanner unified onto the grep pipeline so it gets the same parallelism for free. ### Perf (synthetic/large, 10k files, p50) | Op | Before (#805) | After (this PR) | Δ | |---|---:|---:|---:| | `scan.grepFiles` | 322ms | **163ms** | **−49%** | | `scanCodeForDsns` | 313ms | **238ms** | **−24%** | | `detectAllDsns.cold` | 308ms | **251ms** | **−18%** | | `detectDsn.cold` | 4.98ms | **5.16ms** | — | | `detectAllDsns.warm` | 27.0ms | 27.7ms | — | `scan.grepFiles` is the primary target (bench op using `dsnScanOptions()`). `scanCodeForDsns` benefits because it now routes through `grepFiles` rather than a parallel-but-separate `walkFiles + mapFilesConcurrent` path. ### Full-scan comparison vs rg (for context) All measurements on 10k-file synthetic fixture, cap-at-100 disabled: | Pattern | rg | NEW (default opts) | NEW (cap@100) | |---|---:|---:|---:| | `import.*from` | 337ms | 775ms | **24ms** | | `SENTRY_DSN` | 83ms | 559ms | 82ms | | `function\s+\w+` | 333ms | 785ms | **20ms** | We're still 2-7× slower than rg on uncapped full-scans (mostly walker-bound, not worker-bound). On capped workloads (the init-wizard workload — always caps at 100) we're 14-17× faster than rg because the literal prefilter + early-exit kicks in quickly. ## Architecture ### Worker pool — `src/lib/scan/worker-pool.ts` Lazy-initialized singleton. Pool size `min(8, max(2, availableParallelism()))`. Feature-gated on `isWorkerSupported()` — runtime must expose `Worker`, `Blob`, and `URL.createObjectURL`. On runtimes without Workers, `grepFilesInternal` falls back to the existing `mapFilesConcurrent` path. Runtime escape hatch: `SENTRY_SCAN_DISABLE_WORKERS=1`. Workers are kept ref'd (not `.unref()`'d) — an earlier iteration unref'd them but that caused a deadlock when the main thread's only pending work was awaiting a worker result. The CLI relies on `process.exit()` at end of command execution; `terminatePool()` is test-only. ### Worker source — `src/lib/scan/grep-worker.js` (plain JS) + `script/text-import-plugin.ts` (esbuild plugin) The worker body lives in a real `.js` file so it's lintable, syntax-checked, and formattable. `worker-pool.ts` imports it via `with { type: "text" }` — natively handled by Bun at runtime, and polyfilled by a ~20-LOC esbuild plugin for the build-time path. The string content is then fed to a `Blob` + `URL.createObjectURL` to spawn workers. I investigated Bun's documented alternative (`new Worker("./path.ts")` with the worker as a compile entrypoint) but all three forms from the Bun docs either (a) hang in compiled binaries because `import.meta.url` resolves to the binary path and URL resolution doesn't hit the bundled entrypoint, or (b) work but require the binary to run with CWD equal to the `bun build` project root — brittle for a CLI that users run from arbitrary project dirs. Blob URL + text-import works identically in `bun run`, `bun test`, and compiled binaries. ### Pipeline — `src/lib/scan/grep.ts` `grepFilesInternal` dispatches to one of two sub-generators: 1. **`grepViaWorkers`** — producer/consumer streaming. Walker runs on main thread; paths accumulate into batches of 200; each full batch dispatches round-robin to the least-loaded worker; worker returns matches as `{ ints: Uint32Array, linePool: string }` via `postMessage` with `transfer: [ints.buffer]` (zero-copy). Main thread decodes and yields. 2. **`grepViaAsyncMain`** — the existing `mapFilesConcurrent` path, used when workers are unavailable or disabled. ### Binary match encoding Structured clone of 215k `GrepMatch` objects costs ~200ms. Transferable ArrayBuffer + shared `linePool` string drops that to **~2-3ms**. Each match is 4 u32s: `[pathIdx, lineNum, lineOffset, lineLength]`. Line text is appended to a single accumulator string per batch; offsets into it are rebuilt into `GrepMatch.line` on the main thread. ### DSN scanner unification (second commit) The DSN scanner had its own `walkFiles + mapFilesConcurrent + Bun.file().text() + extractDsnsFromContent` pipeline that predated the worker pool. Now it routes through `grepFiles` with `DSN_PATTERN`, the same worker pool, and post-filters matches on the main thread for comments/host validation/dedup. - `GrepOptions` gained `recordMtimes?: boolean` and `onDirectoryVisit?` pass-through to the walker. - `GrepMatch` gained optional `mtime?: number` (populated only when `recordMtimes: true`). - `scanCodeForFirstDsn` deliberately does NOT use the worker pool — spawning a pool for a single-DSN scan adds ~20ms of init cost that dwarfs the work. It uses a direct `walkFiles` loop with `extractFirstDsnFromContent`. ## Bugs I caught + fixed during implementation (4 non-trivial) ### FIFO queue race (message handler mismatch) Initial design used `addEventListener` per dispatch. Because Web Workers deliver every `result` message to EVERY attached listener, multiple concurrent dispatches to the same worker would all fire together on the first result, resolving the wrong promise with the wrong batch's data. **Fix:** single `onmessage` per worker, per-worker FIFO `pending` queue. Each `result` message shifts one slot. ### Consumer wake race (lost notifications) `wakeConsumer()` fired immediately if `notify` was set, else did nothing. When a batch settled before the consumer entered its `await new Promise(resolve => notify = resolve)` block, the notification was dropped and the consumer hung forever. **Fix:** added `notifyPending` flag. `wakeConsumer()` sets it when no consumer is waiting; consumer checks the flag before awaiting, and the await's executor also checks it to close the window between check and assignment. ### `worker.unref()` deadlock When the main thread's only event-loop work was waiting on a worker result, unref'd workers didn't process messages on idle ticks. After the walker finished and all in-flight batches except 2 returned, the last 2 never completed. **Fix:** removed `.unref()`. CLI calls `process.exit()` at the end so worker cleanup isn't needed. ### Dead-worker dispatch deadlock (Cursor Bugbot) After a worker `error` event, the error handler rejected pending items and reset `pw.inflight` to 0, but the dead worker remained in the pool. The least-loaded picker would favor the dead worker (it appeared to have 0 inflight), push new pending slots, and `postMessage` — the dead worker couldn't respond, and dispatches hung forever. **Fix:** added `alive: boolean` flag to `PooledWorker`, flipped to false on `error` before rejecting pending. Dispatch skips dead workers; returns `Promise.reject` when all are dead. ## Pre-merge review findings (self-review subagent) ### Silent DSN drop on lines >2000 chars (third commit) The unified DSN pipeline inherited `grepFiles`' default `maxLineLength: 2000`, which truncates `match.line` with a `…` suffix. The scanner then re-runs `DSN_PATTERN` on the truncated line — the pattern ends with `/\d+`, so a DSN near the end of a long line loses its trailing digits to `…` and fails the regex silently. Realistic triggers: minified JS bundles, source-map-embedded config. Reproduced with a 2030-char line containing a DSN at column 2004 — pre-fix result was 0 DSNs. **Fix:** pass `maxLineLength: Number.POSITIVE_INFINITY` from the DSN scanner. Memory bounded by walker's 256KB file cap. Regression test added. ## What this PR does NOT change - Walker semantics (no changes to `walker.ts`). - IgnoreStack / gitignore behavior. - `collectGrep` / `grepFiles` public API surface (new optional options only). - Node library bundle behavior — Workers detection is runtime, not build-time. If `new Worker` isn't available, the fallback path is used. ## Correctness - **AbortSignal propagation.** `grepViaWorkers` polls `signal.aborted` on every iteration of the consumer emit loop and throws `DOMException("AbortError")`. - **Early exit.** `maxResults` and `stopOnFirst` stop dispatching new batches; in-flight batches finish but their results are discarded. Some wasted CPU on early exit — acceptable because defaults are typically loose. - **Error handling.** Per-file read errors in the worker are silently swallowed (same as `mapFilesConcurrent`); worker-level errors propagate to all pending dispatches via the `alive: false` flag. - **DSN semantic equivalence.** Every pre-refactor test in `code-scanner.test.ts` passes (150 tests), including `deduplicates same DSN from multiple files` which tests the subtle `fileHadValidDsn` mtime-tracking invariant. - **Observability drift (minor).** The full-scan path no longer emits `log.debug("Cannot read file: ...")` per file read error — workers swallow them silently. `scanCodeForFirstDsn` still logs top-level errors. ## Test plan - [x] `bunx tsc --noEmit` — clean - [x] `bun run lint` — clean (1 pre-existing markdown warning unrelated) - [x] `bun test --timeout 15000 test/lib test/commands test/types` — **5641 pass, 0 fail** - [x] `bun test test/isolated` — 138 pass - [x] `bun test test/lib/scan/grep.test.ts` — 38 pass (including `AbortSignal fires mid-iteration`) - [x] `bun test test/lib/dsn/code-scanner.test.ts` — 51 pass (including long-minified-line regression test) - [x] `SENTRY_SCAN_DISABLE_WORKERS=1 bun run bench` — fallback path works end-to-end - [x] `bun run bench --size large --runs 3 --warmup 1` — numbers in table above - [x] `bun run build --single` — compiled single-file binary builds and runs - [x] `SENTRY_CLIENT_ID=placeholder bun run bundle` — npm library bundle builds; worker source correctly inlined ## Commits 1. `d2305ba1` — perf(scan): parallel grep via worker pool with binary-transferable matches 2. `6ea26f3d` — perf(scan,dsn): unify DSN scanner with grep pipeline; fix dead-worker dispatch 3. `47a5ee8d` — refactor(scan): worker body as a real .js file (not an inline string) 4. `b25a2e48` — fix(dsn): recover DSNs past column 2000 on long minified lines 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

## Summary Follow-up to #791. Replaces the init-wizard's `rg → git grep → fs walk` fallback chain in `src/lib/init/tools/grep.ts` + `glob.ts` with thin adapters over the pure-TS `collectGrep` / `collectGlob` helpers from `src/lib/scan/`. The Mastra wire contract is preserved byte-identical on all existing fields. Two optional fields (`caseInsensitive`, `multiline`) are added to `GrepSearch` for future-proofing — no current server invocation sets them. ## What changed ### Adapters (net −392 LOC of production code) | File | Before | After | |---|---:|---:| | `src/lib/init/tools/grep.ts` | 337 LOC | 114 LOC | | `src/lib/init/tools/glob.ts` | 147 LOC | 73 LOC | | `src/lib/init/tools/search-utils.ts` | 146 LOC | **deleted** | | `src/lib/init/types.ts` | — | +17 LOC (two optional fields) | The adapters now just: 1. Sandbox the user-supplied `search.path` via `safePath`. 2. Forward each search/pattern to `collectGrep` / `collectGlob` with the wire-level constants (`maxResults`, `maxLineLength`) plumbed through. 3. Strip `absolutePath` from each `GrepMatch` — the Mastra wire has never included it. 4. Catch `ValidationError` from bad regex so a single bad pattern surfaces as an empty per-search row rather than aborting the whole payload. ### New optional `GrepSearch` fields ```ts export type GrepSearch = { pattern: string; path?: string; include?: string; caseInsensitive?: boolean; // NEW multiline?: boolean; // NEW }; ``` No current Mastra server invocation sets these. Adding them now means the server can start sending them without a CLI update. The underlying scan engine natively supports both. ### Tests - **Deleted 3 obsolete tests** that shadowed `rg` in `PATH` to force the fallback chain. With the pure-TS adapter there's no fallback chain to exercise; the tests had become tautological. - **Deleted scaffolding**: `writeExecutable`, `setPath`, `helperBinDir`, `savedPath` — only used by the deleted tests. - **Kept 3 pre-existing tests** unchanged: include filters, multi-pattern glob, sandbox rejection. - **Added 3 new adapter-specific tests**: - `grep result matches MUST NOT include absolutePath` — pins the strip behavior so `absolutePath` never leaks to the Mastra agent. - `grep bad regex yields empty matches without crashing the payload` — documents the `ValidationError` catch contract. - `grep caseInsensitive flag enables case-insensitive matching` — end-to-end coverage for the new wire field. ## Behavior changes (intentional, only affects users without `rg`/`git`) Before this PR, the init-wizard fs-walk fallback was naive: no `.gitignore` handling, narrow skip list, no binary detection. Users with `rg` or `git` installed never took this path. After this PR, **every user gets rg-like behavior** via the pure-TS scanner: - **Nested `.gitignore`** honored (cumulative semantics, matching git + rg). - **Wider skip-dir list** — scan excludes `.next`, `target`, `vendor`, `coverage`, `.cache`, `.turbo` in addition to the old skip set. - **Binary files filtered** — scan runs an 8 KB NUL-byte sniff before grep-ing. Users with `rg` installed see no behavior change relative to the rg-path of the old adapter. ## Benchmarks Measured after rebasing onto `main` (includes #791, #804, #805, #807 — the full grep/worker-pool stack). **Fixture:** 10k files (~80 MB), 3 monorepo packages, mix of text + binary. **Config:** 5 runs after 2 warmup, pre-warmed page cache. **Machine:** linux/x64, 4 CPUs, Bun 1.3.11. ripgrep 14.1.0. **Harness:** `/tmp/bench-init-tools.ts` — invokes the adapter via the real `grep(GrepPayload)` entry point Mastra uses. ### Adapter vs rg — single search | Pattern | rg | NEW uncapped | NEW cap=100 | uncapped vs rg | cap vs rg | |---|---:|---:|---:|---:|---:| | `import.*from` (215k matches) | 284 ms | 769 ms | **28.4 ms** | 2.70× slower | **10× FASTER** | | `function\s+\w+` (216k) | 306 ms | 782 ms | **28.2 ms** | 2.56× slower | **11× FASTER** | | `SENTRY_DSN` (677) | 81 ms | 496 ms | 79.3 ms | 6.10× slower | parity | | no matches | 74 ms | 486 ms | 492 ms | 6.6× slower | 6.7× slower | The init-wizard workload is always capped at 100. On dense-match patterns (the common case — Mastra greps for common code patterns like imports and function signatures) the worker pool's literal-prefilter + concurrent fan-out + early-exit hit 100 matches in ~28 ms, **10× faster than rg**. On rare/no-match patterns we're walker-bound at ~500 ms — still fine for init wizard. ### Adapter multi-search (realistic Mastra workload) Payload: 3 patterns (`import.*from`, `SENTRY_DSN`, `function\s+\w+`) in parallel via `Promise.all`, `maxResultsPerSearch: 100`. | Metric | Value | |---|---:| | p50 end-to-end | **109 ms** | All three searches share the same worker pool (dispatched concurrently). ### Scan-harness numbers (context) From `bun run bench --size large`, showing that the new adapter is routed through the worker-backed grep pipeline: | Op | p50 | |---|---:| | `scan.grepFiles` | 166 ms | | `scanCodeForDsns` | 232 ms | | `detectDsn.cold` | 4.78 ms | ## Why ship this Init wizard grep has one caller: the Mastra server, which always sends `maxResults: 100`. For that workload the NEW adapter is **10× faster than rg** on common dense patterns and sidesteps the subprocess-dependency problem: - Users without `rg` no longer fall back to a naive walk that ignores gitignore and scans binaries. - No spawning, no stderr draining, no exit-code decoding, no parse-out-pipe-delimited-output. - `GrepSearch` gains `caseInsensitive` + `multiline` — passthrough to the scan engine's native support. The correctness tax (nested `.gitignore` respected, wider skip list, binaries filtered) is paid once per scan regardless of match density. On the init wizard's capped workload it's noise; on exhaustive scans it's ~500 ms on 10k files vs rg's ~300 ms. ## Test plan - [x] `bunx tsc --noEmit` — clean - [x] `bun run lint` — clean (1 pre-existing warning in `src/lib/formatters/markdown.ts:281`, not touched by this PR) - [x] `bun test test/lib/init/ test/lib/scan/` — **352 pass, 0 fail** - [x] Rebased onto latest main (includes #791, #804, #805, #807) - [x] `bun run /tmp/bench-init-tools.ts` — numbers in table above ## What this PR does NOT change - `GrepPayload` / `GlobPayload` — unchanged structurally, only `GrepSearch` gains optional fields - `src/lib/init/tools/registry.ts` — tool dispatch unchanged - `src/lib/init/tools/shared.ts` — `safePath` + `validateToolSandbox` unchanged - Response shape — `{ ok: true, data: { results: [{pattern, matches, truncated}] } }` identical 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

## Summary Cleanup pass over the `scan/` and `dsn/` modules after the grep + worker-pool stack (#791, #804, #805, #807, #797) landed. Removed comment bloat accumulated across the 6+ review cycles those PRs went through — redundant bug-history narration, repeated explanations of `ref/unref` boolean semantics, "pre-PR-N" references, and other scars that wouldn't survive a fresh-eyes read. **Net −708 LOC across 12 files. No behavior changes.** ## Per-file reductions | File | Before | After | Δ | |---|---:|---:|---:| | `src/lib/scan/worker-pool.ts` | 466 | 312 | −33% | | `src/lib/scan/grep.ts` | 985 | 712 | −28% | | `src/lib/dsn/code-scanner.ts` | 541 | 377 | −30% | | `src/lib/scan/grep-worker.js` | 153 | 114 | −25% | | `src/lib/dsn/scan-options.ts` | 70 | 52 | −26% | | `src/lib/init/tools/grep.ts` | 122 | 98 | −20% | | `src/lib/init/tools/glob.ts` | 74 | 59 | −20% | Plus minor trims in `types.ts`, `walker.ts`, `path-utils.ts`, `scan/glob.ts`, and `script/text-import-plugin.ts`. ## What was removed - Redundant explanations of `Worker.ref()` / `.unref()` boolean semantics (stated 3× in `worker-pool.ts` — kept once, on the primary ref/unref helper pair). - Multi-paragraph "earlier iteration did X, deadlocked, so we now do Y" histories — kept in git log where they belong. - "pre-PR-3" / "pre-refactor" / "previous version" narration that explained how code used to look before the current session's work. - File-header docstrings that repeated what the module structure and exports already told you. - Rationale comments for `biome-ignore`s that were already justified by adjacent context. ## What was kept - Every `biome-ignore` comment (all still justified). - Every "why" comment tied to a real gotcha (e.g., the length-change warning on case-insensitive literal prefilters, the `/g` flag cloning rationale, the pipeline-failure detector explanation). - Every JSDoc on exported functions and types. - All inline comments that explain non-obvious constraints (sandbox enforcement, cache-contract stability, etc.). ## Test plan - [x] `bunx tsc --noEmit` — clean - [x] `bun run lint` — clean (1 pre-existing markdown warning) - [x] `bun test --timeout 15000 test/lib test/commands test/types` — **5641 pass, 0 fail** - [x] `bun test test/isolated` — 138 pass - [x] `bun run bench --size large --runs 3` — no perf regression (`scan.grepFiles` 167ms, `scanCodeForDsns` 232ms — matching pre-trim numbers) - [x] `bun run build --single` — binary builds and exits cleanly on `sentry project view` from empty dir (3 consecutive runs, all exit=1) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

BYK merged commit 7c4f082 into main Apr 21, 2026
24 checks passed

BYK deleted the byk/perf/walker-optimize branch April 21, 2026 20:33

BYK mentioned this pull request Apr 21, 2026

perf(scan): parallel grep via worker pool with binary-transferable matches #807

Merged

11 tasks

BYK mentioned this pull request Apr 22, 2026

refactor(init): migrate grep/glob tools to src/lib/scan/ #797

Merged

5 tasks

BYK mentioned this pull request Apr 22, 2026

chore(scan,dsn): trim session cruft from comment-heavy files #810

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf(scan): rewrite walker hot path with sync I/O + manual string ops#805

perf(scan): rewrite walker hot path with sync I/O + manual string ops#805
BYK merged 1 commit intomainfrom
byk/perf/walker-optimize

BYK commented Apr 21, 2026

Uh oh!

github-actions Bot commented Apr 21, 2026

Built to branch `gh-pages` at 2026-04-21 20:24 UTC.
Preview will be ready when the GitHub Pages deployment is complete.

Uh oh!

github-actions Bot commented Apr 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

BYK commented Apr 21, 2026

Summary

Perf results (synthetic/large, 10k files, p50)

Why readdirSync is safe here

mtime parity fix

Walker v2 design notes

What this PR does NOT change

Test plan

Uh oh!

github-actions Bot commented Apr 21, 2026

Built to branch gh-pages at 2026-04-21 20:24 UTC. Preview will be ready when the GitHub Pages deployment is complete.

Uh oh!

github-actions Bot commented Apr 21, 2026

Codecov Results 📊

📊 Comparison with Base Branch

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Why `readdirSync` is safe here

Built to branch `gh-pages` at 2026-04-21 20:24 UTC.
Preview will be ready when the GitHub Pages deployment is complete.