perf(scan): parallel grep via worker pool with binary-transferable matches #807
Merged
…tches `scan.grepFiles` on 10k files drops from 322ms → 191ms p50 (−41%).

Follow-up to PR #791 (literal prefilter), #804 (simplify prefilter to file gate), #805 (walker hot-path rewrite). The prior PRs left the walker at 231ms and the overall pipeline at ~1200ms for uncapped scans — the gap was dominated by per-file async orchestration in `mapFilesConcurrent` rather than actual work. Prototypes showed that running the per-file work synchronously inside a small pool of workers (4 on this box) collapses the orchestration cost without changing the public API.

## Architecture

### `src/lib/scan/worker-pool.ts` (~300 LOC)

Lazy-initialized singleton pool (size `min(8, max(2, availableParallelism()))`). Feature-gated on `isWorkerSupported()` — the runtime must expose `Worker`, `Blob`, and `URL.createObjectURL`. On runtimes without Workers (e.g., Node dist without `--experimental-worker-threads-*` flags), `grepFilesInternal` falls back to the existing `mapFilesConcurrent` path. Runtime escape hatch: `SENTRY_SCAN_DISABLE_WORKERS=1`.

### `src/lib/scan/grep-worker-source.ts`

Worker source inlined as a JS string constant and loaded via `Blob` + `URL.createObjectURL`. **This is required for `bun build --compile` single-file binaries** — filesystem-based `new Worker(url)` hangs at runtime because the URL can't be resolved against the compiled binary's module map. Blob URLs sidestep that entirely. The worker is self-contained, uses `require("node:fs")` CommonJS-style, and has zero imports from local modules.

### `src/lib/scan/grep.ts` integration

`grepFilesInternal` dispatches to one of two sub-generators:

1. `grepViaWorkers` — producer/consumer streaming. The walker runs on the main thread; paths accumulate into batches of 200; each full batch dispatches round-robin to the least-loaded worker; the worker returns matches as `{ ints: Uint32Array, linePool: string }` via `postMessage` with `transfer: [ints.buffer]` (zero-copy). The main thread decodes into `GrepMatch[]` lazily and yields.
2. 
`grepViaAsyncMain` — the existing `mapFilesConcurrent` path, used when workers are unavailable or disabled.

### Binary match encoding

Structured clone of 215k `GrepMatch` objects costs ~200ms on this box. A transferable ArrayBuffer plus a shared `linePool` string drops that to ~2-3ms. Each match is 4 u32s: `[pathIdx, lineNum, lineOffset, lineLength]`. Line text is appended to a single accumulator string per batch; offsets into it are rebuilt into `GrepMatch.line` on the main thread via `linePool.slice(lineOffset, lineOffset + lineLength)`.

## Bugs caught + fixed during implementation

### FIFO queue race (message handler mismatch)

The initial design used `addEventListener` per dispatch. Because Web Workers deliver every `result` message to EVERY attached listener, multiple concurrent dispatches to the same worker would all fire together on the first result — resolving the wrong promise with the wrong batch's data. When a handler shifted state to mark "done", subsequent dispatches would find their handlers already removed.

**Fix:** a single `onmessage` per worker plus a per-worker FIFO `pending` queue of `{resolve, reject}` slots. Each `result` message shifts one slot. Workers process `postMessage` FIFO (both Bun and Node guarantee this inside a single worker), so FIFO-order resolution is sound.

### Consumer wake race (lost notifications)

`wakeConsumer()` fired immediately if `notify` was set, else did nothing. When a batch settled before the consumer entered its `await new Promise(resolve => notify = resolve)` block, the notification was dropped and the consumer hung forever.

**Fix:** added a `notifyPending` flag. `wakeConsumer()` sets it when no consumer is waiting; the consumer checks the flag before awaiting, and the await's executor also checks it to close the window between check and assignment.

### `worker.unref()` deadlock

The initial design called `.unref()` on each worker for clean CLI exit.
But when the main thread's only event-loop work was waiting on a worker result, unref'd workers didn't process messages on idle ticks, deadlocking the pipeline. The symptom was visible in the bench — after the walker finished and all in-flight batches except 2 returned, the last 2 never completed.

**Fix:** removed `.unref()`. The CLI calls `process.exit()` at the end, so worker cleanup isn't needed for process termination. For library embeddings, callers can use `terminatePool()` explicitly.

## Perf (synthetic/large, 10k files, p50)

| Op | Before (PR #805) | After | Δ |
|---|---:|---:|---:|
| `scan.grepFiles` | 322ms | **191ms** | **−41%** |
| `detectAllDsns.cold` | 308ms | 297ms | −4% |
| `scanCodeForDsns` | 313ms | 293ms | −6% |
| `detectAllDsns.warm` | 27.0ms | 27.5ms | — |
| `scan.walk` | 231ms | 263ms | noise |

`scan.grepFiles` is the primary target. `scanCodeForDsns` doesn't use the worker pool (it uses the DSN scanner directly), so the ~20ms improvement there is walker variance.

## Test plan

- [x] `bunx tsc --noEmit` — clean
- [x] `bun run lint` — clean (1 pre-existing warning in `src/lib/formatters/markdown.ts`, unrelated)
- [x] `bun test --timeout 15000 test/lib test/commands test/types` — **5640 pass, 0 fail**
- [x] `bun test test/isolated` — 138 pass
- [x] `bun test test/lib/scan/grep.test.ts` — 38 pass (including `AbortSignal fires mid-iteration`, which exercises signal propagation through the producer/consumer pipeline)
- [x] `bun test test/lib/dsn/code-scanner.test.ts` — 52 pass
- [x] `SENTRY_SCAN_DISABLE_WORKERS=1 bun run bench` — verifies the fallback path still works end-to-end (360ms p50, matches pre-change behavior)
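The binary match encoding described above is small enough to sketch in full. This is a hedged reconstruction (helper names are hypothetical; the real packing lives in the worker and the decoder in `grep.ts`), showing the 4-u32 layout and the shared `linePool` round trip:

```typescript
type GrepMatch = { pathIdx: number; lineNum: number; line: string };

// Pack matches into one transferable Uint32Array plus one accumulator string.
// Layout per match: [pathIdx, lineNum, lineOffset, lineLength].
function encodeMatches(matches: GrepMatch[]): { ints: Uint32Array; linePool: string } {
  const ints = new Uint32Array(matches.length * 4);
  let linePool = "";
  matches.forEach((m, i) => {
    ints[i * 4] = m.pathIdx;
    ints[i * 4 + 1] = m.lineNum;
    ints[i * 4 + 2] = linePool.length; // lineOffset into the pool
    ints[i * 4 + 3] = m.line.length;   // lineLength
    linePool += m.line;
  });
  return { ints, linePool };
}

// Main-thread side: rebuild GrepMatch.line via linePool.slice(...).
function decodeMatches(ints: Uint32Array, linePool: string): GrepMatch[] {
  const out: GrepMatch[] = [];
  for (let i = 0; i < ints.length; i += 4) {
    const lineOffset = ints[i + 2];
    const lineLength = ints[i + 3];
    out.push({
      pathIdx: ints[i],
      lineNum: ints[i + 1],
      line: linePool.slice(lineOffset, lineOffset + lineLength),
    });
  }
  return out;
}
```

Only `ints.buffer` needs to cross the thread boundary in the transfer list; the `linePool` string is cloned once per batch instead of once per match, which is where the structured-clone cost collapses.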
Codecov Results 📊 ✅ 138 passed | Total: 138 | Pass Rate: 100%

✨ No test changes detected — all tests are passing. ❌ Patch coverage is 77.89%. Project has 2016 uncovered lines.

Coverage diff vs base branch:

| | main | #PR | +/- |
|---|---:|---:|---:|
| Coverage | 95.63% | 95.07% | −0.56% |
| Files | 281 | 283 | +2 |
| Lines | 40462 | 40926 | +464 |
| Branches | 0 | 0 | — |
| Hits | 38694 | 38910 | +216 |
| Misses | 1768 | 2016 | +248 |
| Partials | 0 | 0 | — |

Generated by Codecov Action
… dispatch

## DSN scanner unification

The DSN scanner had its own `walkFiles + mapFilesConcurrent + Bun.file().text() + extractDsnsFromContent` pipeline that predated the worker pool from PR #807. It was doing the same fundamental work as `grepFiles` (regex over file content + collect matches) but via a separate path with more orchestration overhead. Rewrote `scanCodeForDsns` to use `grepFiles` directly:

- Pattern: `DSN_PATTERN` — grep's literal prefilter gates files by `http` (matches the old `HTTP_SCHEME_PROBE` fast path exactly).
- Walker options: `dsnScanOptions()` + `recordMtimes: true` + an `onDirectoryVisit` hook for cache-invalidation maps.
- Per-match post-filter on the main thread: comment-line check, host validation, dedup, mtime tracking, package path inference.
- Multi-DSN-per-line handling preserved via `extractDsnsOnLine`, which re-runs the regex on `match.line`.

`scanCodeForFirstDsn` deliberately does NOT route through the worker pool. It's the stopOnFirst fast path (first valid DSN wins, return early). Spawning a 4-worker pool for a workload that often completes after reading 1-2 files would add ~20ms of pool-init cost that dwarfs the work. Instead it uses a direct `walkFiles` loop with a sync-ish per-file read + `extractFirstDsnFromContent`, which already benefits from the `/http/i` literal fast path. This keeps `detectDsn.cold` at ~3.5ms instead of regressing to ~20ms.

## New grep options: `recordMtimes` + `onDirectoryVisit`

To support the DSN cache-invalidation maps, extended `GrepOptions`:

- `recordMtimes: boolean` — when true, each `GrepMatch` includes the source file's floored `mtimeMs` on an optional `mtime` field. The walker already exposes `entry.mtime` via its own `recordMtimes` passthrough; the main thread captures mtimes per-batch before dispatching, and the decoder attaches them indexed by `pathIdx`. The worker doesn't need to stat again (the main thread already did via `statSync`).
- `onDirectoryVisit: (absDir, mtimeMs) => void` — pass-through to the walker's per-directory hook. Used for directory-level cache invalidation ("has a new file been added to this dir?").

Both are opt-in; when unused, `GrepMatch.mtime` is absent (not 0 — this distinguishes "not asked" from "asked and happens to be 0").

## Cursor Bugbot: dead-worker dispatch deadlock

Cursor found a real correctness issue in the worker pool from #807:

> After a worker `error` event, the error handler rejects all
> pending items and resets `pw.inflight` to 0, but the dead worker
> remains in the pool. The least-loaded selection in `dispatch`
> will favor this worker (it appears to have 0 `inflight`), push
> new pending slots, and call `postMessage` — but the dead worker
> can't respond. Those dispatch promises hang forever, so
> `inflightCount` in `grepViaWorkers` never reaches 0, deadlocking
> the consumer's exit condition at
> `walkerDone && inflightCount === 0`.

Added `alive: boolean` to `PooledWorker`, flipped to false on the first `error` event before rejecting pending dispatches. The least-loaded picker now skips dead workers entirely and falls back to `Promise.reject(...)` when every worker is dead. This prevents the reported deadlock and ensures a pool-wide failure surfaces as an error instead of hanging.

## Perf (synthetic/large, 10k files, p50)

| Op | Before unification | After unification | Δ |
|---|---:|---:|---:|
| `scanCodeForDsns` | 293ms | **234ms** | **−20%** |
| `detectAllDsns.cold` | 297ms | **250ms** | **−16%** |
| `detectDsn.cold` | 4.98ms | **3.45ms** | **−31%** |
| `scanCodeForFirstDsn` | 2.07ms | 1.88ms | — |
| `scan.grepFiles` | 191ms | 170ms | − |
| `detectAllDsns.warm` | 27.5ms | 27.2ms | — |

The full-scan paths (`scanCodeForDsns`, `detectAllDsns.cold`) are ~20% faster because they now share the worker-pool infrastructure from PR #807.
The `stopOnFirst` paths (`scanCodeForFirstDsn`, `detectDsn.cold`) are essentially unchanged (or slightly better) because they now use a tighter direct-walker loop.

## Correctness

- `extractDsnsFromContent` is still exported for `src/lib/dsn/detector.ts::isDsnStillPresent` (single-file cache verification) — unchanged.
- `extractFirstDsnFromContent` is still exported for `scanCodeForFirstDsn`'s per-file fast path — unchanged.
- All DSN tests pass, including the tricky "deduplicates same DSN from multiple files" test, which required tracking `fileHadValidDsn` separately from seen-set insertion to preserve the pre-refactor behavior of recording an mtime for every file that contributed a validated DSN (even if the DSN was a duplicate).
- `sourceMtimes` / `dirMtimes` shapes are unchanged — the cache schema in `src/lib/db/dsn-cache.ts` doesn't need to change.
- Minor observability drift: the per-file `log.debug("Cannot read file: …")` no longer fires on individual read errors in the full-scan path (the worker swallows them). The `scanCodeForFirstDsn` path still logs top-level errors. Accepted as a minor regression.

## Test plan

- [x] `bunx tsc --noEmit` — clean
- [x] `bun run lint` — clean (1 pre-existing markdown warning)
- [x] `bun test --timeout 15000 test/lib test/commands test/types` — **5640 pass, 0 fail**
- [x] `bun test test/isolated` — 138 pass
- [x] `bun test test/lib/dsn/code-scanner.test.ts` — 150 pass (including the "deduplicates same DSN from multiple files" regression test that caught the initial buggy implementation)
- [x] `bun test test/lib/scan/grep.test.ts` — 38 pass
- [x] `bun run bench --size large --runs 5 --warmup 2` — `scanCodeForDsns` 234ms p50 (−20% from 293ms)
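The multi-DSN-per-line step described above (re-running the pattern over `match.line`) can be sketched as follows. The pattern here is a simplified stand-in for the real `DSN_PATTERN`, and the helper name mirrors the description rather than the actual source:

```typescript
// Simplified stand-in for DSN_PATTERN: scheme://key@host/projectId.
// Global flag so matchAll can find every DSN on one line.
const SIMPLE_DSN_PATTERN = /https?:\/\/\w+@[\w.-]+\/\d+/g;

// Because grep emits one GrepMatch per matching *line*, a line carrying
// several DSNs needs a second pass to recover all of them.
function extractDsnsOnLine(line: string): string[] {
  return Array.from(line.matchAll(SIMPLE_DSN_PATTERN), (m) => m[0]);
}
```

`String.prototype.matchAll` clones the regex internally, so the shared global pattern carries no `lastIndex` state between lines.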
Replaces `src/lib/scan/grep-worker-source.ts` (a TS module
exporting the worker body as a string template literal) with a
real `src/lib/scan/grep-worker.js` file that's imported via
`with { type: "text" }`. The runtime behavior is unchanged — Bun
and the esbuild plugin both resolve the attribute to the file
contents as a string — but now the worker is a first-class file:
lintable, syntax-checked, formattable, searchable.
## Why
`grep-worker-source.ts` was functional but awkward. The string
inside the template literal wasn't visible to Biome's lint rules,
IDE navigation, or any other tooling that cares about real files.
We caught several lint issues once the body became a real `.js`
file (`noIncrementDecrement`, `useBlockStatements`, `useTemplate`)
— real issues that had been silently skipped.
The blob-URL + raw-text-import approach is the only form that
works in all three of our target environments:
- **`bun run` (dev):** Bun handles `with { type: "text" }` natively.
- **`bun build --compile` (single-file CLI):** the esbuild plugin
inlines the file contents as a string at bundle time; the
compiled binary has no runtime file resolution.
- **`bun bundle` (npm library `dist/index.cjs`):** same esbuild
plugin, same inlined string. When Node consumers import the
library, Workers are feature-gated on `isWorkerSupported()` and
fall back to the async path.
I explored Bun's documented alternatives (`new Worker(new URL(...))`
or `new Worker("./path.ts")`, with the worker as an additional
entrypoint to `bun build --compile`), but every form the docs
suggest either (a) fails at runtime in compiled binaries because
`import.meta.url` resolves to the binary path and URL resolution
doesn't hit the bundled entrypoint, or (b) works but requires the
binary to run with CWD equal to the `bun build` project root,
which is brittle. The Blob-URL + text-import approach has no such
fragility.
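For reference, the `isWorkerSupported()` gate mentioned earlier can be sketched as a plain capability probe. This is a hedged reconstruction, not the actual `worker-pool.ts` code; the structural casts avoid depending on DOM type definitions:

```typescript
// Probe the three globals the blob-URL worker path needs. On Node without
// worker support this returns false and the caller falls back to the
// async mapFilesConcurrent path.
function isWorkerSupported(): boolean {
  const g = globalThis as Record<string, unknown>;
  const url = g.URL as { createObjectURL?: unknown } | undefined;
  return (
    typeof g.Worker === "function" &&
    typeof g.Blob === "function" &&
    typeof url?.createObjectURL === "function"
  );
}
```

When the gate passes, each worker is built from the inlined source via `new Worker(URL.createObjectURL(new Blob([source])))`, which is why all three globals are probed together.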
## Changes
- **New:** `src/lib/scan/grep-worker.js` — real plain-JS file
containing the worker body (moved verbatim from the old string
template; no logic changes). Pass-through of all existing
features: literal gate, line-counter, multiline mode, maxLineLength
truncation, maxMatchesPerFile, transferable-buffer match packing.
- **New:** `script/text-import-plugin.ts` — tiny esbuild plugin
(~20 LOC) that polyfills Bun's `with { type: "text" }` attribute
by reading the file and emitting it as a default-export string.
Shared by `script/build.ts` (CLI binary) and `script/bundle.ts`
(npm library).
- **Modified:** `src/lib/scan/worker-pool.ts` imports the worker
body as `import grepWorkerSource from "./grep-worker.js" with { type: "text" }`.
TypeScript's type system doesn't model the attribute, so we cast
through `unknown` to a string. The runtime Blob constructor
would throw if given a non-string, so the cast is guarded.
- **Modified:** `script/build.ts`, `script/bundle.ts` — add the
plugin to their esbuild plugin arrays.
- **Deleted:** `src/lib/scan/grep-worker-source.ts`.
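The text-import plugin is small enough to sketch in full. This is a hypothetical reconstruction (the filter, the `toTextModule` helper, and the structural option types are assumptions; the real plugin consumes esbuild's own types), showing the shape of an onLoad hook that polyfills `with { type: "text" }`:

```typescript
import { readFileSync } from "node:fs";

// Turn file contents into a module whose default export is that string,
// mirroring Bun's native `with { type: "text" }` behavior.
function toTextModule(text: string): string {
  return `export default ${JSON.stringify(text)};`;
}

const textImportPlugin = {
  name: "text-import",
  setup(build: {
    onLoad: (
      opts: { filter: RegExp },
      cb: (args: { path: string; with?: Record<string, string> }) =>
        | { contents: string; loader: "js" }
        | undefined,
    ) => void;
  }) {
    build.onLoad({ filter: /.*/ }, (args) => {
      // Leave every non-text import to esbuild's default loaders.
      if (args.with?.type !== "text") return undefined;
      return {
        contents: toTextModule(readFileSync(args.path, "utf8")),
        loader: "js",
      };
    });
  },
};
```

The types are written structurally so the sketch doesn't depend on an installed esbuild; the real plugin would import `Plugin` from `esbuild` and rely on its import-attribute support surfacing `args.with`.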
## Test plan
- [x] `bunx tsc --noEmit` — clean
- [x] `bun run lint` — clean (1 pre-existing markdown warning).
Biome now lints `grep-worker.js`; several issues surfaced
during the refactor (`noIncrementDecrement`, `useBlockStatements`,
`useTemplate`) were all fixed in the same commit.
- [x] `bun test --timeout 15000 test/lib test/commands test/types` —
**5640 pass, 0 fail**
- [x] `bun test test/isolated` — 138 pass
- [x] `SENTRY_CLIENT_ID=placeholder bun run bundle` — library
bundle builds successfully, worker source is embedded in
`dist/index.cjs` as expected.
- [x] `bun run bench --size large --runs 3 --warmup 1` — perf
unchanged (`scan.grepFiles` 161ms, `scanCodeForDsns` 241ms).
Pre-merge self-review found a silent correctness regression in the
unified DSN pipeline from the previous commit: `grepFiles` defaults
to `maxLineLength: 2000`, which truncates `match.line` with a `…`
suffix. The DSN scanner then re-runs `DSN_PATTERN` on that
truncated line to recover all DSNs on the line — but the pattern
ends with `/\d+`, so a DSN near the end of a long line loses its
trailing digits to `…` and fails the regex silently.
Reproduced pre-fix:

```ts
import { writeFileSync } from "node:fs";
import { join } from "node:path";

const filler = "x".repeat(1990);
const dsn = "https://abc@o111222.ingest.us.sentry.io/7654321";
const line = `${filler} const DSN = "${dsn}";`;
writeFileSync(join(tmpDir, "minified.js"), line);

const result = await scanCodeForDsns(tmpDir);
// → result.dsns.length === 0 ❌ (should be 1)
```
Realistic triggers: minified JS bundles (commonly have lines >2000
chars), source-map-embedded config blobs, base64-encoded envs.
Within the walker's 256KB `MAX_FILE_SIZE` cap, a single line
carrying a DSN past column 2000 is plausible.
Fix: pass `maxLineLength: Number.POSITIVE_INFINITY` from the DSN
scanner. Memory is bounded by `MAX_FILE_SIZE` (per-file) and the
`seen` dedup map (per-scan), so there's no blow-up risk.
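The failure mode is easy to demonstrate in isolation. Both names below are illustrative stand-ins, not the real implementations: `SIMPLE_DSN_PATTERN` approximates `DSN_PATTERN` (it likewise ends in `/\d+`), and `truncateLine` mimics the grep cap:

```typescript
// Simplified DSN pattern; the trailing \d+ is what the "…" suffix clobbers.
const SIMPLE_DSN_PATTERN = /https?:\/\/\w+@[\w.-]+\/\d+/;

// Mimic grep's maxLineLength behavior: cap and append "…".
function truncateLine(line: string, maxLineLength: number): string {
  return line.length > maxLineLength
    ? `${line.slice(0, maxLineLength)}…`
    : line;
}

const dsn = "https://abc@o111222.ingest.us.sentry.io/7654321";
const longLine = `${"x".repeat(1990)} const DSN = "${dsn}";`;

// Old default cap of 2000: the DSN past column 2000 is lost to "…",
// so the second regex pass silently finds nothing.
const truncated = truncateLine(longLine, 2000);

// The fix: Number.POSITIVE_INFINITY keeps the line intact.
const intact = truncateLine(longLine, Number.POSITIVE_INFINITY);
```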
Also fixes a stale docstring at the top of `worker-pool.ts` that
still described the `.unref()` design from an earlier iteration.
The current code explicitly does NOT call `.unref()` (it caused a
deadlock) and the old docstring contradicted the implementation.
## Test plan
- [x] `bunx tsc --noEmit` — clean
- [x] `bun run lint` — clean
- [x] Reproduction above → 0 DSNs (pre-fix) → 1 DSN (post-fix)
- [x] New regression test in `test/lib/dsn/code-scanner.test.ts`:
"finds DSNs on long single-line files (>2000 chars, common in
minified JS)"
- [x] `bun test --timeout 15000 test/lib test/commands test/types`
— **5641 pass, 0 fail** (+1 new regression test)
- [x] `bun test test/isolated` — 138 pass
- [x] `bun run bench --size large --runs 3 --warmup 1` — no perf
regression (`scanCodeForDsns` 238ms, `scan.grepFiles` 163ms)
PR #807 originally removed `.unref()` from each worker at spawn to avoid a mid-pipeline deadlock where unref'd workers weren't processing messages on idle ticks. But keeping workers ref'd broke the CLI exit path: `project view` (and any command that transits `detectAllDsns` → `scanCodeForDsns` → `grepFiles`) would emit its output and then hang indefinitely because ref'd-but-idle workers kept the event loop alive.

E2E caught this:

(fail) sentry project view > requires org and project [15001.15ms]
^ this test timed out after 15000ms.
Expected: 1
Received: 143 (SIGTERM)

Bisected cleanly:

- `d2305ba1` (workers-only commit): exit=1 on `project view` — clean.
- `6ea26f3d` (DSN unification): exit=124 — hang.

Unification made `scanCodeForDsns` route through `grepFiles`, which creates the worker pool. Before unification, the DSN path went through `walkFiles + mapFilesConcurrent` directly and never touched the pool, so `project view` never saw workers.

## Fix: ref/unref balancing per dispatch

Workers are now `.unref()`'d at spawn (an idle pool doesn't hold the loop open), then `.ref()`'d in `dispatch()` before a request is posted, and `.unref()`'d again when the dispatch settles — via the `message` `result` handler, the `error` handler's mass-reject path, the readiness-failure path, and `terminate()`. Each of these paths has a matched unref for every ref, so the ref count stays balanced.

This gives us both properties we need:

- **Clean CLI exit**: when all dispatches settle, workers are idle + unref'd, the event loop drains, and the process exits naturally. No `beforeExit` handler or explicit `terminatePool()` required on the happy path.
- **No mid-pipeline deadlock**: ref'd workers during active work means their `message` events fire on the main thread's next tick, not starved by Bun's idle-tick bypass of unref'd handles.
Added `refWorker` / `unrefWorker` helper functions for readability, and to guard against runtimes where `Worker` doesn't expose the ref/unref methods (Node's DOM Worker shim, hypothetically).

## Manual verification

$ cd /tmp/empty-dir
$ time sentry project view
Error: Could not auto-detect organization and project.
...
real 0m0.040s  # was 15s timeout before fix
exit=1

Ran 10 consecutive invocations — all exit cleanly in <50ms.

## Test plan

- [x] `bunx tsc --noEmit` — clean
- [x] `bun run lint` — clean
- [x] `bun test --timeout 15000 test/lib test/commands test/types` — **5641 pass, 0 fail**
- [x] `bun test test/isolated` — 138 pass
- [x] `bun run bench --size large --runs 3 --warmup 1` — workers still function end-to-end; no perf regression (`scan.grepFiles` 162ms, `scanCodeForDsns` 238ms)
- [x] Rebuilt binary, ran `project view` from empty dir 10× — all exit cleanly in <50ms. Pre-fix: 5/5 hit the 15s timeout.
- [x] `auth status`, `project view` with DSN, etc. — all exit cleanly.
…ted count
Two Cursor Bugbot findings caught on the latest force-push.
## 1. `filesCollected` counts matches, not unique files (medium)
`grepFiles` emits one `GrepMatch` per matching line, not per file.
The scanner was incrementing `filesCollected` on every match, so a
file with 3 DSN-matching lines inflated the counter (and the two
`dsn.files_collected` / `dsn.files_scanned` telemetry attributes it
backed) by 3×.
Fix: use `filesSeenForMtime.size` instead. That set was already
tracking unique file paths for mtime recording, so no new variable
is needed, and the count is semantically accurate: "files with at
least one validated DSN" — matching rg's `--files-with-matches`
semantics.
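The counting fix reduces to deduplicating on path, which can be sketched as (hypothetical helper name; the real code reuses the existing `filesSeenForMtime` set rather than rebuilding one):

```typescript
// grep emits one GrepMatch per matching *line*, so a per-file counter
// must dedup on path — the rg --files-with-matches semantics.
function countFilesWithMatches(matches: { path: string }[]): number {
  const unique = new Set<string>();
  for (const m of matches) {
    unique.add(m.path);
  }
  return unique.size;
}
```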
## 2. Silent data loss when all workers die (medium)
The `dispatchBatch` error handler was swallowing dispatch rejections
(worker errors, "all workers dead"). If every batch failed, the
consumer loop would see zero results and exit normally, and callers
like the DSN scanner would get an empty result indistinguishable
from "no DSNs found." The DSN cache layer would then persist a
false-negative empty result, leaving the scan broken until the 24h
TTL expires.
Fix: track `dispatchedBatches` / `failedBatches` counters in
`grepViaWorkers`. When the consumer loop exits (via the `finally`
block that already re-raises `producerError`), throw if every
dispatched batch failed. A single failed batch is still acceptable
(per-file errors are swallowed by design inside the worker), but a
pipeline-wide failure now surfaces instead of masquerading as
"empty result."
## Test plan
- [x] `bunx tsc --noEmit` — clean
- [x] `bun run lint` — clean
- [x] `bun test --timeout 15000 test/lib test/commands test/types` —
**5641 pass, 0 fail**
- [x] `bun test test/isolated` — 138 pass
- [x] `bun run bench --size large --runs 3 --warmup 1` — no perf
regression (`scan.grepFiles` 159ms, `scanCodeForDsns` 232ms)
- [x] Rebuilt binary + `project view` from empty dir — exits cleanly
(exit 1) in <50ms. 3 consecutive runs all clean.
Not adding a regression test for finding #2 — the "all workers die"
path requires crashing a real worker, which is hard to reproduce
deterministically in a unit test. The change is a defensive guard
against an edge case that only shows up when the worker runtime
itself is broken (e.g., missing `node:fs`).
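The guard from finding #2 can be sketched as a small counter pair. This is a hedged reconstruction (the real counters are locals in `grepViaWorkers`, not a class):

```typescript
class BatchFailureTracker {
  dispatchedBatches = 0;
  failedBatches = 0;

  // Wrap a batch dispatch. Individual batch failures are tolerated
  // (per-file errors are swallowed by design inside the worker) but counted.
  track<T>(dispatch: Promise<T>): Promise<T | undefined> {
    this.dispatchedBatches += 1;
    return dispatch.catch(() => {
      this.failedBatches += 1;
      return undefined;
    });
  }

  // Called when the consumer loop exits: a pipeline-wide failure must
  // surface as an error instead of masquerading as an empty result.
  assertNotTotalFailure(): void {
    if (this.dispatchedBatches > 0 && this.failedBatches === this.dispatchedBatches) {
      throw new Error("grep worker pipeline: every dispatched batch failed");
    }
  }
}
```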
Cursor Bugbot caught a real race in the previous commit's per-dispatch
ref/unref pattern.
`Worker.ref()` and `Worker.unref()` toggle a single boolean state —
they are NOT reference-counted. Calling `unref()` releases the
event-loop hold entirely, regardless of how many prior `ref()`
calls were made.
Previous commit: every `dispatch()` called `refWorker`, every
completion called `unrefWorker`. When a single worker had 2+
concurrent dispatches in flight (normal with round-robin batching),
the first completion's `unref()` call unrefed the worker even
though other dispatches were still running. If the walker had also
finished by that point, the event loop would exit with pending
worker results dropped silently.
Fix: track inflight count per worker (already exists as
`pw.inflight`) and only unref when it drops to zero. Specifically:
- **`message` result handler:** decrement inflight; if it hit zero,
unref. Other dispatches still pending → keep ref'd.
- **`error` handler:** drain all pending rejects, then unref ONCE
at the end (worker is gone either way).
- **readiness-failure path in `dispatch()`:** decrement inflight;
if it hit zero, unref (matches `message` handler logic).
- **`terminate()`:** same as error handler — reject all pending,
then unref once.
The spawn-time unref stays (inflight starts at 0, worker is idle).
The per-dispatch `ref()` is safe to call unconditionally (idempotent
no-op when already ref'd), so no change there.
Updated the ref/unref strategy docstring to call out the boolean
semantics explicitly, so the next person to touch this code
doesn't assume reference-counting.
## Test plan
- [x] `bunx tsc --noEmit` — clean
- [x] `bun run lint` — clean
- [x] `bun test --timeout 15000 test/lib test/commands test/types` —
**5641 pass, 0 fail**
- [x] `bun test test/isolated` — 138 pass
- [x] `bun run bench --size large --runs 3 --warmup 1` — workers
still parallelize correctly (`scan.grepFiles` 160ms,
`scanCodeForDsns` 237ms). If the unref race were being hit,
workers would silently starve and numbers would regress to
fallback-path territory (~320ms).
- [x] Rebuilt binary, ran `project view` from empty dir 5× — all
exit cleanly in <50ms.
Seer caught a telemetry inconsistency from the unification refactor.
`scanCodeForDsns` was reporting `dsn.files_scanned = filesSeenForMtime.size`
— count of unique files with validated DSNs. `scanCodeForFirstDsn`
was reporting `dsn.files_scanned = filesScanned` — count of every
file walked. Same attribute name, two different meanings depending
on which path executed. Makes the telemetry unreliable for scanner
performance monitoring.
Fix: switch `scanCodeForDsns` from `grepFiles` (streaming) to
`collectGrep` (buffered), which returns `stats.filesRead` — the
count of files the grep pipeline actually read and tested. That's
equivalent to walker-yielded files (after the walker's own
extension/gitignore filters), matching what `scanCodeForFirstDsn`
reports.
DSN workloads buffer <10 matches typically (validated DSNs from a
source tree), so the streaming-to-buffered switch has no measurable
perf impact. The `filesSeenForMtime` set stays — it still dedups
mtime writes across multi-DSN-per-file (grep emits one match per
line; without the dedup, a file with 3 DSN-containing lines would
trigger 3 mtime writes for the same path).
Updated the comment to reflect the new role (mtime-write dedup,
not telemetry) and pointed the docstrings at `collectGrep` instead
of `grepFiles`.
## Test plan
- [x] `bunx tsc --noEmit` — clean
- [x] `bun run lint` — clean
- [x] `bun test --timeout 15000 test/lib test/commands test/types` —
**5641 pass, 0 fail**
- [x] `bun test test/isolated` — 138 pass
- [x] `bun run bench --size large --runs 3 --warmup 1` — no perf
regression (`scan.grepFiles` 160ms — this bench op uses
`collectGrep` so it was already in the buffered path;
`scanCodeForDsns` reruns are in the same range as before).
- [x] Rebuilt binary, ran `project view` from empty dir — all
exit cleanly in <50ms.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
Reviewed by Cursor Bugbot for commit dff0be1.
Two latent LOW-severity findings caught on the final CI round.
## 1. Seer: failure-check ran before in-flight dispatches settled
In the early-exit path (`AbortError`, `maxResults = 0`), the
`finally` block's all-batches-failed check could race with
in-flight dispatch `.catch()` handlers. Sequence:
1. Consumer emits enough matches, exits loop.
2. `finally` runs, awaits `producer` (which just dispatches).
3. `finally` checks `failedBatches === dispatchedBatches`.
4. In-flight dispatch `.catch()` fires LATER, bumps
`failedBatches`.
Step 3 reads stale state. If every batch would ultimately fail,
we'd silently return an empty result instead of surfacing the
pipeline failure. Seer rated this LOW because only `maxResults: 0`
triggers the early exit before any dispatch can settle, and no
production caller uses that value — but it's a real bug either
way and the fix is cheap.
Fix: track dispatch promises and `await Promise.allSettled()` on
them in the `finally` block before the failure check. `allSettled`
rather than `all` because individual rejections are already
counted via the dispatch's `.catch()`; we just need them to
complete.
## 2. Cursor: readiness-failure used `pop()` not identity-based removal
The readiness-failure handler in `dispatch()` was calling
`chosen.pending.pop()` to drop "this dispatch's slot" — but with
multiple concurrent dispatches queued to the same worker, `pop()`
returns whichever slot happened to be at the END of the queue,
not the one belonging to the failing dispatch. Handlers for
sibling dispatches would execute in whatever order `ready`'s
rejection chain fired, and `pop()` would reject them in reverse
queue order.
Currently unreachable because `chosen.ready` never rejects in the
current pool impl (the ready promise has no failure path), but
latent — any future addition of a startup timeout or init failure
would expose the bug.
Fix: hold a direct reference to `ourSlot` in each dispatch's
closure. The readiness-failure handler finds its slot via
`indexOf(ourSlot)` and splices that specific entry, preserving
FIFO semantics for sibling dispatches.
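The identity-based removal reduces to an `indexOf` + `splice` on the dispatch's own slot (sketch with hypothetical shapes; the real slots hold `{resolve, reject}` rather than an id):

```typescript
type Slot = { id: number };

// Each dispatch closes over its own slot. On a readiness failure it
// removes exactly that entry — never pop(), which would take an
// arbitrary sibling's slot off the end of the queue.
function removeOwnSlot(pending: Slot[], ourSlot: Slot): boolean {
  const idx = pending.indexOf(ourSlot);
  if (idx === -1) {
    return false; // already settled by the message handler
  }
  pending.splice(idx, 1);
  return true;
}
```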
## Test plan
- [x] `bunx tsc --noEmit` — clean
- [x] `bun run lint` — clean
- [x] `bun test --timeout 15000 test/lib test/commands test/types`
— **5641 pass, 0 fail**
- [x] `bun test test/isolated` — 138 pass
- [x] `bun run bench --size large --runs 3 --warmup 1` — no perf
regression (`scan.grepFiles` 160ms)
- [x] Rebuilt binary, ran `project view` from empty dir 3×
— all exit cleanly.
BYK added a commit that referenced this pull request on Apr 22, 2026
## Summary

Follow-up to #791. Replaces the init-wizard's `rg → git grep → fs walk` fallback chain in `src/lib/init/tools/grep.ts` + `glob.ts` with thin adapters over the pure-TS `collectGrep` / `collectGlob` helpers from `src/lib/scan/`. The Mastra wire contract is preserved byte-identical on all existing fields. Two optional fields (`caseInsensitive`, `multiline`) are added to `GrepSearch` for future-proofing — no current server invocation sets them.

## What changed

### Adapters (net −392 LOC of production code)

| File | Before | After |
|---|---:|---:|
| `src/lib/init/tools/grep.ts` | 337 LOC | 114 LOC |
| `src/lib/init/tools/glob.ts` | 147 LOC | 73 LOC |
| `src/lib/init/tools/search-utils.ts` | 146 LOC | **deleted** |
| `src/lib/init/types.ts` | — | +17 LOC (two optional fields) |

The adapters now just:

1. Sandbox the user-supplied `search.path` via `safePath`.
2. Forward each search/pattern to `collectGrep` / `collectGlob` with the wire-level constants (`maxResults`, `maxLineLength`) plumbed through.
3. Strip `absolutePath` from each `GrepMatch` — the Mastra wire has never included it.
4. Catch `ValidationError` from bad regex so a single bad pattern surfaces as an empty per-search row rather than aborting the whole payload.

### New optional `GrepSearch` fields

```ts
export type GrepSearch = {
  pattern: string;
  path?: string;
  include?: string;
  caseInsensitive?: boolean; // NEW
  multiline?: boolean; // NEW
};
```

No current Mastra server invocation sets these. Adding them now means the server can start sending them without a CLI update. The underlying scan engine natively supports both.

### Tests

- **Deleted 3 obsolete tests** that shadowed `rg` in `PATH` to force the fallback chain. With the pure-TS adapter there's no fallback chain to exercise; the tests had become tautological.
- **Deleted scaffolding**: `writeExecutable`, `setPath`, `helperBinDir`, `savedPath` — only used by the deleted tests.
- **Kept 3 pre-existing tests** unchanged: include filters, multi-pattern glob, sandbox rejection.
- **Added 3 new adapter-specific tests**:
  - `grep result matches MUST NOT include absolutePath` — pins the strip behavior so `absolutePath` never leaks to the Mastra agent.
  - `grep bad regex yields empty matches without crashing the payload` — documents the `ValidationError` catch contract.
  - `grep caseInsensitive flag enables case-insensitive matching` — end-to-end coverage for the new wire field.

## Behavior changes (intentional, only affects users without `rg`/`git`)

Before this PR, the init-wizard fs-walk fallback was naive: no `.gitignore` handling, a narrow skip list, no binary detection. Users with `rg` or `git` installed never took this path. After this PR, **every user gets rg-like behavior** via the pure-TS scanner:

- **Nested `.gitignore`** honored (cumulative semantics, matching git + rg).
- **Wider skip-dir list** — scan excludes `.next`, `target`, `vendor`, `coverage`, `.cache`, `.turbo` in addition to the old skip set.
- **Binary files filtered** — scan runs an 8 KB NUL-byte sniff before grep-ing.

Users with `rg` installed see no behavior change relative to the rg-path of the old adapter.

## Benchmarks

Measured after rebasing onto `main` (includes #791, #804, #805, #807 — the full grep/worker-pool stack).

- **Fixture:** 10k files (~80 MB), 3 monorepo packages, mix of text + binary.
- **Config:** 5 runs after 2 warmup, pre-warmed page cache.
- **Machine:** linux/x64, 4 CPUs, Bun 1.3.11. ripgrep 14.1.0.
- **Harness:** `/tmp/bench-init-tools.ts` — invokes the adapter via the real `grep(GrepPayload)` entry point Mastra uses.
### Adapter vs rg — single search

| Pattern | rg | NEW uncapped | NEW cap=100 | uncapped vs rg | cap vs rg |
|---|---:|---:|---:|---:|---:|
| `import.*from` (215k matches) | 284 ms | 769 ms | **28.4 ms** | 2.70× slower | **10× FASTER** |
| `function\s+\w+` (216k) | 306 ms | 782 ms | **28.2 ms** | 2.56× slower | **11× FASTER** |
| `SENTRY_DSN` (677) | 81 ms | 496 ms | 79.3 ms | 6.10× slower | parity |
| no matches | 74 ms | 486 ms | 492 ms | 6.6× slower | 6.7× slower |

The init-wizard workload is always capped at 100. On dense-match patterns (the common case — Mastra greps for common code patterns like imports and function signatures) the worker pool's literal prefilter + concurrent fan-out + early exit hit 100 matches in ~28 ms, **10× faster than rg**. On rare/no-match patterns we're walker-bound at ~500 ms — still fine for the init wizard.

### Adapter multi-search (realistic Mastra workload)

Payload: 3 patterns (`import.*from`, `SENTRY_DSN`, `function\s+\w+`) in parallel via `Promise.all`, `maxResultsPerSearch: 100`.

| Metric | Value |
|---|---:|
| p50 end-to-end | **109 ms** |

All three searches share the same worker pool (dispatched concurrently).

### Scan-harness numbers (context)

From `bun run bench --size large`, showing that the new adapter is routed through the worker-backed grep pipeline:

| Op | p50 |
|---|---:|
| `scan.grepFiles` | 166 ms |
| `scanCodeForDsns` | 232 ms |
| `detectDsn.cold` | 4.78 ms |

## Why ship this

Init-wizard grep has one caller: the Mastra server, which always sends `maxResults: 100`. For that workload the NEW adapter is **10× faster than rg** on common dense patterns and sidesteps the subprocess-dependency problem:

- Users without `rg` no longer fall back to a naive walk that ignores gitignore and scans binaries.
- No spawning, no stderr draining, no exit-code decoding, no parsing of pipe-delimited output.
- `GrepSearch` gains `caseInsensitive` + `multiline` — passthrough to the scan engine's native support.
The correctness tax (nested `.gitignore` respected, wider skip list, binaries filtered) is paid once per scan regardless of match density. On the init wizard's capped workload it's noise; on exhaustive scans it's ~500 ms on 10k files vs rg's ~300 ms.

## Test plan

- [x] `bunx tsc --noEmit` — clean
- [x] `bun run lint` — clean (1 pre-existing warning in `src/lib/formatters/markdown.ts:281`, not touched by this PR)
- [x] `bun test test/lib/init/ test/lib/scan/` — **352 pass, 0 fail**
- [x] Rebased onto latest main (includes #791, #804, #805, #807)
- [x] `bun run /tmp/bench-init-tools.ts` — numbers in the tables above

## What this PR does NOT change

- `GrepPayload` / `GlobPayload` — unchanged structurally; only `GrepSearch` gains optional fields
- `src/lib/init/tools/registry.ts` — tool dispatch unchanged
- `src/lib/init/tools/shared.ts` — `safePath` + `validateToolSandbox` unchanged
- Response shape — `{ ok: true, data: { results: [{pattern, matches, truncated}] } }` identical

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
BYK added a commit that referenced this pull request on Apr 22, 2026.
## Summary

Cleanup pass over the `scan/` and `dsn/` modules after the grep + worker-pool stack (#791, #804, #805, #807, #797) landed. Removed comment bloat accumulated across the 6+ review cycles those PRs went through — redundant bug-history narration, repeated explanations of `ref/unref` boolean semantics, "pre-PR-N" references, and other scars that wouldn't survive a fresh-eyes read.

**Net −708 LOC across 12 files. No behavior changes.**

## Per-file reductions

| File | Before | After | Δ |
|---|---:|---:|---:|
| `src/lib/scan/worker-pool.ts` | 466 | 312 | −33% |
| `src/lib/scan/grep.ts` | 985 | 712 | −28% |
| `src/lib/dsn/code-scanner.ts` | 541 | 377 | −30% |
| `src/lib/scan/grep-worker.js` | 153 | 114 | −25% |
| `src/lib/dsn/scan-options.ts` | 70 | 52 | −26% |
| `src/lib/init/tools/grep.ts` | 122 | 98 | −20% |
| `src/lib/init/tools/glob.ts` | 74 | 59 | −20% |

Plus minor trims in `types.ts`, `walker.ts`, `path-utils.ts`, `scan/glob.ts`, and `script/text-import-plugin.ts`.

## What was removed

- Redundant explanations of `Worker.ref()` / `.unref()` boolean semantics (stated 3× in `worker-pool.ts` — kept once, on the primary ref/unref helper pair).
- Multi-paragraph "earlier iteration did X, deadlocked, so we now do Y" histories — kept in the git log where they belong.
- "pre-PR-3" / "pre-refactor" / "previous version" narration that explained how code used to look before the current session's work.
- File-header docstrings that repeated what the module structure and exports already told you.
- Rationale comments for `biome-ignore`s that were already justified by adjacent context.

## What was kept

- Every `biome-ignore` comment (all still justified).
- Every "why" comment tied to a real gotcha (e.g., the length-change warning on case-insensitive literal prefilters, the `/g` flag cloning rationale, the pipeline-failure detector explanation).
- Every JSDoc on exported functions and types.
- All inline comments that explain non-obvious constraints (sandbox enforcement, cache-contract stability, etc.).

## Test plan

- [x] `bunx tsc --noEmit` — clean
- [x] `bun run lint` — clean (1 pre-existing markdown warning)
- [x] `bun test --timeout 15000 test/lib test/commands test/types` — **5641 pass, 0 fail**
- [x] `bun test test/isolated` — 138 pass
- [x] `bun run bench --size large --runs 3` — no perf regression (`scan.grepFiles` 167 ms, `scanCodeForDsns` 232 ms — matching pre-trim numbers)
- [x] `bun run build --single` — binary builds and exits cleanly on `sentry project view` from an empty dir (3 consecutive runs, all exit=1)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

## Summary

Parallel grep via a worker pool with binary-transferable matches. The DSN scanner is unified onto the grep pipeline so it gets the same parallelism for free.
## Perf (synthetic/large, 10k files, p50)
(Benchmark table not captured in this extract; ops measured: `scan.grepFiles`, `scanCodeForDsns`, `detectAllDsns.cold`, `detectDsn.cold`, `detectAllDsns.warm`.)

`scan.grepFiles` is the primary target (the bench op using `dsnScanOptions()`). `scanCodeForDsns` benefits because the DSN scanner was unified onto the grep pipeline during this PR (commit 2) and now routes through the same worker pool rather than its pre-existing `walkFiles + mapFilesConcurrent` path.

Cumulative journey through the session series (#791, #804, #805, this PR):
`scan.grepFiles` went from ~1300 ms → 160 ms (~8× faster, an ~88% reduction).

## Full-scan comparison vs rg (for context)
All measurements on the 10k-file synthetic fixture:
(Per-pattern table not captured in this extract; patterns benchmarked: `import.*from`, `SENTRY_DSN`, `function\s+\w+`.)

We're still 2–7× slower than rg on uncapped full scans (walker-bound, not worker-bound). On capped workloads — the init-wizard workload, which always caps at 100 — we're 14–17× faster than rg because the literal prefilter + early exit kicks in quickly.
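The capped-workload win comes from ceasing to consume the walker once the cap is hit. A minimal sketch of that early-exit shape (illustrative only; the real pipeline stops dispatching new batches and discards in-flight results):

```typescript
// Early exit at maxResults: stop pulling from the (lazy) walker entirely
// once the cap is reached. Hypothetical helper, not the real grep.ts code.
function* capped<T>(source: Iterable<T>, maxResults: number): Generator<T> {
  let emitted = 0;
  for (const item of source) {
    if (emitted >= maxResults) return; // never touches the rest of the tree
    emitted++;
    yield item;
  }
}
```

Because the walker is itself lazy, returning early means most of the tree is never stat'd or read on dense-match patterns.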
## Architecture
### Worker pool — `src/lib/scan/worker-pool.ts`

Lazy-initialized singleton. Pool size `min(8, max(2, availableParallelism()))`. Feature-gated on `isWorkerSupported()` — the runtime must expose `Worker`, `Blob`, and `URL.createObjectURL`. On runtimes without Workers, `grepFilesInternal` falls back to the existing `mapFilesConcurrent` path. Runtime escape hatch: `SENTRY_SCAN_DISABLE_WORKERS=1`.

Ref/unref lifecycle: workers are `.unref()`'d at spawn so an idle pool doesn't hold the event loop open. Each `dispatch()` calls `ref()` before posting work; the `message` result handler calls `unref()` when the worker's `inflight` count drops back to zero. `Worker.ref()` / `.unref()` are idempotent booleans (NOT reference-counted) — hence the inflight-zero guard at every unref site.
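A minimal model of that balancing, with hypothetical names (`FakeWorker` mimics the idempotent boolean semantics of `Worker.ref()`/`.unref()`; this is not the real `worker-pool.ts` API):

```typescript
// Model of per-dispatch ref/unref balancing with the inflight-zero guard.
class FakeWorker {
  refed = false; // idempotent flag, like the real Worker — NOT a counter
  ref() { this.refed = true; }
  unref() { this.refed = false; }
}

class PoolSlot {
  inflight = 0;
  constructor(readonly worker: FakeWorker) {
    worker.unref(); // an idle pool must not hold the event loop open
  }
  dispatchStart() {
    this.inflight++;
    this.worker.ref(); // hold the loop while work is pending
  }
  dispatchDone() {
    this.inflight--;
    // A bare unref here would release the hold even with sibling dispatches
    // still in flight, because unref is not counted. Guard on inflight zero.
    if (this.inflight === 0) this.worker.unref();
  }
}
```

With the guard, the event-loop hold is released exactly when the worker goes idle, which is what lets the compiled CLI exit cleanly.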
### Worker source — `src/lib/scan/grep-worker.js` + `script/text-import-plugin.ts`

The worker body lives in a real `.js` file so it's lintable, syntax-checked, and formattable (catching lint issues the original inline-string approach silently hid). `worker-pool.ts` imports it via `with { type: "text" }` — natively handled by Bun at runtime, and polyfilled by a ~20-LOC esbuild plugin for the build-time path. The string content is then fed to a `Blob` + `URL.createObjectURL` to spawn workers.
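A sketch of the gate + spawn path, under stated assumptions: `workerSource` here is a stub standing in for the text-imported worker body, and the helper names are illustrative, not the real module's exports:

```typescript
// Feature gate + Blob-URL spawn sketch. globalThis lookups keep the sketch
// runnable on runtimes where Worker is absent (e.g., plain Node).
const workerSource = `self.onmessage = () => {};`; // stub worker body

function isWorkerSupported(): boolean {
  const g = globalThis as any;
  return (
    typeof g.Worker === "function" &&
    typeof g.Blob === "function" &&
    typeof g.URL?.createObjectURL === "function"
  );
}

function spawnGrepWorker(): unknown {
  const g = globalThis as any;
  const disabled = g.process?.env?.SENTRY_SCAN_DISABLE_WORKERS === "1";
  if (disabled || !isWorkerSupported()) {
    return null; // caller falls back to the mapFilesConcurrent path
  }
  // A Blob URL avoids filesystem URL resolution, which breaks inside
  // `bun build --compile` single-file binaries.
  const url = g.URL.createObjectURL(
    new g.Blob([workerSource], { type: "application/javascript" }),
  );
  return new g.Worker(url);
}
```

The escape hatch and the capability check share one return path, so every degraded environment lands on the same fallback.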
I investigated Bun's documented alternative (`new Worker("./path.ts")` with the worker as a compile entrypoint), but all three forms from the Bun docs either (a) hang in compiled binaries because `import.meta.url` resolves to the binary path and URL resolution doesn't hit the bundled entrypoint, or (b) work but require the binary to run with CWD equal to the `bun build` project root — brittle for a CLI users run from arbitrary dirs. Blob URL + text import works identically in `bun run`, `bun test`, and compiled binaries.
### Pipeline — `src/lib/scan/grep.ts`

`grepFilesInternal` dispatches to one of two sub-generators:

1. `grepViaWorkers` — producer/consumer streaming. The walker runs on the main thread; paths accumulate into batches of 200; each full batch dispatches round-robin to the least-loaded worker; the worker returns matches as `{ ints: Uint32Array, linePool: string }` via `postMessage` with `transfer: [ints.buffer]` (zero-copy). The main thread decodes and yields.
2. `grepViaAsyncMain` — the existing `mapFilesConcurrent` path, used when workers are unavailable or explicitly disabled.
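The batching and least-loaded dispatch in `grepViaWorkers` can be modeled with pure logic like this (illustrative names; the real producer consumes the walker asynchronously and the `alive` flag comes from the dead-worker fix described below):

```typescript
// Pure-logic sketch of path batching + least-loaded worker selection.
const BATCH_SIZE = 200;

type Slot = { inflight: number; alive: boolean };

function* intoBatches(paths: Iterable<string>): Generator<string[]> {
  let batch: string[] = [];
  for (const p of paths) {
    batch.push(p);
    if (batch.length === BATCH_SIZE) {
      yield batch;
      batch = [];
    }
  }
  if (batch.length > 0) yield batch; // flush the partial tail batch
}

function pickLeastLoaded(slots: Slot[]): number {
  let best = -1;
  for (let i = 0; i < slots.length; i++) {
    if (!slots[i].alive) continue; // dead workers never receive work
    if (best === -1 || slots[i].inflight < slots[best].inflight) best = i;
  }
  if (best === -1) throw new Error("all workers dead");
  return best;
}
```

Batches rather than single files keep the per-`postMessage` overhead amortized over 200 reads.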
### Binary match encoding

Structured clone of 215k `GrepMatch` objects costs ~200 ms. A transferable ArrayBuffer + shared `linePool` string drops that to ~2-3 ms. Each match is 4 u32s: `[pathIdx, lineNum, lineOffset, lineLength]`. Line text is appended to a single accumulator string per batch; offsets into it are rebuilt into `GrepMatch.line` on the main thread.
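A sketch of the 4-u32 encode/decode round trip, assuming the `[pathIdx, lineNum, lineOffset, lineLength]` layout described above (field and helper names are illustrative, not the real worker protocol):

```typescript
// Encode matches into one transferable Uint32Array + one shared line pool.
type GrepMatch = { path: string; lineNumber: number; line: string };

function encodeBatch(matches: GrepMatch[], paths: string[]) {
  const ints = new Uint32Array(matches.length * 4);
  let linePool = "";
  matches.forEach((m, i) => {
    ints[i * 4] = paths.indexOf(m.path); // pathIdx into the batch's path list
    ints[i * 4 + 1] = m.lineNumber;
    ints[i * 4 + 2] = linePool.length;   // offset into the shared pool
    ints[i * 4 + 3] = m.line.length;
    linePool += m.line;                  // single accumulator string per batch
  });
  return { ints, linePool }; // post with transfer: [ints.buffer] for zero-copy
}

function decodeBatch(ints: Uint32Array, linePool: string, paths: string[]): GrepMatch[] {
  const out: GrepMatch[] = [];
  for (let i = 0; i < ints.length; i += 4) {
    out.push({
      path: paths[ints[i]],
      lineNumber: ints[i + 1],
      line: linePool.slice(ints[i + 2], ints[i + 2] + ints[i + 3]),
    });
  }
  return out;
}
```

Transferring `ints.buffer` moves ownership instead of copying; only the flat `linePool` string crosses via structured clone, which is cheap for a single string.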
### DSN scanner unification

The DSN scanner previously had its own `walkFiles + mapFilesConcurrent + Bun.file().text() + extractDsnsFromContent` pipeline that predated the worker pool. Now it routes through `collectGrep` with `DSN_PATTERN`, the same worker pool, and post-filters matches on the main thread for comments / host validation / dedup.

- `GrepOptions` gained `recordMtimes?: boolean` and `onDirectoryVisit?` pass-through to the walker.
- `GrepMatch` gained an optional `mtime?: number` (populated only when `recordMtimes: true`).
- `collectGrep` returns `{ matches, stats }`, and the DSN scanner uses `stats.filesRead` for telemetry — the same semantic as `scanCodeForFirstDsn`'s walker-entry count.
- `scanCodeForFirstDsn` deliberately does NOT use the worker pool — spawning a pool for a single-DSN scan adds ~20 ms of init cost that dwarfs the work. It uses a direct `walkFiles` loop with `extractFirstDsnFromContent`.
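The main-thread post-filter can be sketched as follows. The `DSN_PATTERN` regex here is a simplified stand-in (not the real one), and the comment heuristic is illustrative; the real scanner also validates hosts and tracks mtimes:

```typescript
// Post-filter grep output on the main thread: skip commented-out lines,
// extract DSN-shaped strings, dedup across files.
const DSN_PATTERN = /https:\/\/[0-9a-f]+@[\w.-]+\/\d+/g;

function postFilterDsns(matchedLines: string[]): string[] {
  const seen = new Set<string>();
  for (const line of matchedLines) {
    if (/^\s*(\/\/|#|\*)/.test(line)) continue; // skip commented-out DSNs
    for (const dsn of line.match(DSN_PATTERN) ?? []) {
      seen.add(dsn); // dedup the same DSN appearing in multiple files
    }
  }
  return [...seen];
}
```

Keeping validation and dedup on the main thread lets the workers stay generic grep executors with no DSN-specific logic.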
## Bugs caught + fixed during implementation

Over the session + pre-merge review + 6 bot review rounds, these non-trivial issues were found and fixed:
- **FIFO message-handler race** — `addEventListener` per dispatch caused result messages to fire all pending handlers, resolving the wrong dispatch's promise. Fix: a single `onmessage` per worker + a per-worker FIFO `pending` queue shifted on each result.
- **Consumer wake race** — a `wakeConsumer()` fired before the consumer set `notify` would lose the wakeup. Fix: a latched `notifyPending` flag that bridges the window between check and assignment.
- **`worker.unref()` deadlock** — unconditionally unref'd workers didn't process messages on idle ticks, hanging the pipeline when the main thread was only awaiting worker results. Fix: ref/unref balancing per dispatch (unref only when `inflight === 0`).
- **Dead-worker dispatch hang** — after a worker `error` event, the least-loaded picker still routed work to it because `inflight` had been reset to 0. Fix: an `alive: boolean` flag checked in dispatch; skip dead workers, throw when all are dead.
- **Boolean ref/unref mishandling** — `.ref()` / `.unref()` are idempotent booleans, not reference-counted. Per-completion unrefs prematurely released the event-loop hold while sibling dispatches were still in flight. Fix: inflight-zero guards at every unref site.
- **CLI exit hang in compiled binary** — removing `.unref()` entirely (from an earlier attempt to fix the deadlock) broke clean exit because ref'd-but-idle workers held the loop open. Caught by E2E on `sentry project view`. Fixed by the ref/unref balancing above.
- **DSN line truncation silently dropped DSNs past column ~2000** — `grepFiles` truncates `match.line` at `maxLineLength` (default 2000) with `…`; the DSN scanner's per-line re-match would fail when a DSN's trailing `\d+` got replaced. Realistic trigger: minified JS bundles. Fix: pass `maxLineLength: Number.POSITIVE_INFINITY` from the DSN scanner.
- **`filesCollected` counted matches, not files** — `grepFiles` emits one match per line, so a file with 3 DSN-matching lines inflated the counter 3×. Fix: use `stats.filesRead` (the walker-yielded file count) from `collectGrep` instead.
- **Silent pipeline-wide worker failure** — the error handler swallowed dispatch rejections; if every batch failed, the consumer saw zero results and exited normally, leaving callers (notably the DSN cache layer) with false-negative empty results. Fix: track `dispatchedBatches` / `failedBatches`; throw in `finally` if all dispatches failed.
- **DSN telemetry semantic mismatch** — `dsn.files_scanned` meant "files with validated DSNs" in `scanCodeForDsns` but "all walker-yielded files" in `scanCodeForFirstDsn`. Unified to walker-yielded files in both paths via `stats.filesRead`.
- **Latent readiness-failure FIFO violation** — `pending.pop()` in the readiness-failure handler removed the wrong dispatch's slot when multiple dispatches were queued to the same worker. Fix: hold an `ourSlot` reference per dispatch; identity-based removal via `indexOf` + `splice`.
- **Latent maxResults=0 race** — the failure check ran before in-flight dispatches settled. Fix: `await Promise.allSettled(dispatchPromises)` before the check.
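The FIFO message-handler fix and the readiness-failure fix share one mechanism, the per-worker FIFO `pending` queue with identity-based removal, which a small model makes concrete (illustrative structure, not the real `worker-pool.ts`):

```typescript
// Model of a per-worker channel: one message handler, FIFO pending queue,
// identity-based slot removal on readiness failure.
type Slot<T> = { resolve: (v: T) => void; reject: (e: Error) => void };

class WorkerChannel<T> {
  private pending: Slot<T>[] = [];

  dispatch(post: () => void): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const slot: Slot<T> = { resolve, reject };
      this.pending.push(slot);
      try {
        post();
      } catch (err) {
        // Readiness failure: remove OUR slot by identity, not pending.pop(),
        // which could evict a sibling dispatch's slot.
        const i = this.pending.indexOf(slot);
        if (i !== -1) this.pending.splice(i, 1);
        reject(err as Error);
      }
    });
  }

  // Single message handler per worker: results arrive in dispatch order, so
  // each result settles exactly the oldest pending slot.
  onMessage(result: T): void {
    this.pending.shift()?.resolve(result);
  }
}
```

One handler plus a shifted queue makes result-to-dispatch pairing a structural invariant instead of a property of listener registration order.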
## Correctness

- `grepFiles` / `collectGrep` return the same shape; the new options on `GrepOptions` are all optional.
- `grepViaWorkers` polls `signal.aborted` on every iteration of the consumer emit loop and throws `DOMException("AbortError")` — matches the fallback path.
- `maxResults` and `stopOnFirst` stop dispatching new batches; in-flight batches finish but their results are discarded. Some wasted CPU on early exit — acceptable because defaults are typically loose.
- Per-file read errors are skipped (as in `mapFilesConcurrent`); worker-level errors propagate via `alive: false` + pending-queue drain.
- `code-scanner.test.ts` passes (151 tests), including `deduplicates same DSN from multiple files` (a subtle `fileHadValidDsn` mtime-tracking invariant) and a new long-minified-line regression test.
- Behavior delta: no `log.debug("Cannot read file: ...")` per file read error — workers swallow them silently. `scanCodeForFirstDsn` still logs top-level errors.
## Test plan

- [x] `bunx tsc --noEmit` — clean
- [x] `bun run lint` — clean (1 pre-existing markdown warning)
- [x] `bun test --timeout 15000 test/lib test/commands test/types` — 5641 pass, 0 fail
- [x] `bun test test/isolated` — 138 pass
- [x] `bun test test/lib/scan/grep.test.ts` — 38 pass (includes `AbortSignal fires mid-iteration`)
- [x] `bun test test/lib/dsn/code-scanner.test.ts` — 151 pass (includes the long-minified-line regression test)
- [x] `SENTRY_SCAN_DISABLE_WORKERS=1 bun run bench` — fallback path still works end-to-end
- [x] `bun run bench --size large --runs 3 --warmup 1` — numbers in the table above
- [x] `bun run build --single` — compiled binary builds and runs cleanly
- [x] `SENTRY_CLIENT_ID=placeholder bun run bundle` — npm library bundle builds; worker source correctly inlined via the esbuild plugin
- [x] E2E `sentry project view > requires org and project` passes — was the CI failure that caught the CLI-exit hang
## Commits (squashed)

- `d2305ba1` — perf(scan): parallel grep via worker pool with binary-transferable matches
- `6ea26f3d` — perf(scan,dsn): unify DSN scanner with grep pipeline; fix dead-worker dispatch
- `47a5ee8d` — refactor(scan): worker body as a real .js file (not an inline string)
- `b25a2e48` — fix(dsn): recover DSNs past column 2000 on long minified lines
- `bdcb3a25` — fix(scan): balance ref/unref per dispatch so CLI exits cleanly
- `872708ee` — fix(scan,dsn): surface worker-pipeline failures + correct filesCollected count
- `fa3dc621` — fix(scan): respect boolean semantics of Worker.ref()/.unref()
- `dff0be1d` — fix(dsn): unify dsn.files_scanned semantics across both scan paths
- `4e0b6b55` — fix(scan): close remaining latent worker-pool races

🤖 Generated with Claude Code
Co-authored-by: Claude Opus 4.7 (1M context) noreply@anthropic.com