fix(sandbox): add idle-timeout backstop to microsandbox exec stream #442

Open
arimxyer wants to merge 1 commit into vercel:main from arimxyer:fix/sandbox-microsandbox-exec-idle-timeout

@arimxyer

Problem

A microsandbox sandbox command can hang indefinitely. In
adaptMicrosandboxExecToSandboxProcess
(packages/eve/src/execution/sandbox/bindings/microsandbox-process.ts) the
completion loop consumes the SDK exec async-iterator:

const result =
  exitCode === undefined
    ? await iterator.next()                                   // no timeout
    : await nextWithTimeout(iterator, MICROSANDBOX_EXEC_POST_EXIT_DRAIN_MS);

While no exited event has arrived (exitCode === undefined), it does a plain
await iterator.next() with no timeout. The 100 ms nextWithTimeout only
applies after an exit event, to drain trailing output. There is a guard for the
stream ending without an exit ("Microsandbox command ended without an exit
event.") but none for the stream stalling open: if the iterator never yields
exited and never closes, this await (and the finished promise behind
wait()) blocks forever. The microsandbox SDK exec layer wraps a native NAPI
binding with no timeout of its own, so there is no timeout anywhere in the JS
stack and a stalled exec becomes an infinite hang of the agent turn / eve eval.

Closes #440.

Fix (self-contained backstop)

This is fix #1 of the graduated fixes proposed in the issue: an idle timeout on
the no-exit branch, with no public-API change.

  • The pre-exit await iterator.next() now goes through nextWithTimeout with an
    idle deadline. Because each loop iteration starts a fresh race, the deadline
    resets on every stdout/stderr/exit event — a command that keeps emitting
    output is never killed; only total silence trips it.
  • On idle-timeout the command is killed and the finished promise is rejected
    (and the stdout/stderr controllers errored) with a clear error:
    "Microsandbox command exceeded idle timeout (<N>ms with no output or exit event)."
    A wedged exec therefore surfaces as a failure instead of hanging.
  • kill() is fire-and-forget (void command.kill().catch(() => {})),
    matching the existing cancellation path in microsandbox-runtime.ts. The
    premise is that the native binding stalled; kill() calls into that same
    binding, so awaiting it could wedge again. The rejection must not depend on
    kill completing.
  • The post-exit 100 ms drain behavior is unchanged.
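
The bullets above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the PR's code: the real nextWithTimeout signature is not shown here, so the version below is a hypothetical stand-in, and ExecEvent, TIMED_OUT, and consumeWithIdleTimeout are illustrative names.

```typescript
// Hypothetical event shape; the real SDK types are not shown in this PR.
type ExecEvent =
  | { kind: "stdout"; data: string }
  | { kind: "exited"; code: number };

const TIMED_OUT = Symbol("timed-out");

// Race one iterator.next() against a timer. Illustrative stand-in for the
// PR's nextWithTimeout helper, whose real signature is an assumption here.
async function nextWithTimeout<T>(
  iterator: AsyncIterator<T>,
  ms: number,
): Promise<IteratorResult<T> | typeof TIMED_OUT> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<typeof TIMED_OUT>((resolve) => {
    timer = setTimeout(() => resolve(TIMED_OUT), ms);
  });
  try {
    return await Promise.race([iterator.next(), timeout]);
  } finally {
    clearTimeout(timer);
  }
}

// Resolves with the exit code, or rejects on idle timeout / missing exit.
async function consumeWithIdleTimeout(
  events: AsyncIterable<ExecEvent>,
  kill: () => Promise<void>,
  idleTimeoutMs: number,
): Promise<number> {
  const iterator = events[Symbol.asyncIterator]();
  for (;;) {
    // A fresh race per iteration: any event resets the idle deadline.
    const result = await nextWithTimeout(iterator, idleTimeoutMs);
    if (result === TIMED_OUT) {
      // Fire-and-forget: the rejection must not wait on kill() completing.
      void kill().catch(() => {});
      throw new Error(
        `Microsandbox command exceeded idle timeout (${idleTimeoutMs}ms with no output or exit event).`,
      );
    }
    if (result.done) {
      throw new Error("Microsandbox command ended without an exit event.");
    }
    if (result.value.kind === "exited") return result.value.code;
  }
}
```

Because the timer is created inside the loop body, a command that emits output every few seconds never trips the deadline, whatever its total runtime. The sketch omits the post-exit drain and the stdout/stderr forwarding that the real adapter performs.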

The terminal branching in the finally block was refactored to a single
terminalError variable so the idle-timeout error, the iterator-threw error, and
the pre-existing "ended without an exit event" error all surface through one
path (first error wins; ReadableStreamDefaultController.error() is a no-op once
the stream is no longer readable).
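
One way to picture the single-path settlement is the sketch below. Only the terminalError name and the error message come from this PR; createErrorSlot, settle, StreamSink, and Deferred are hypothetical names, and the real finally block may be organized differently.

```typescript
// Minimal structural view of a stream controller (assumption: the real code
// holds ReadableStreamDefaultController instances, which fit this shape).
interface StreamSink {
  close(): void;
  error(reason: unknown): void;
}

// Hypothetical handle on the `finished` promise behind wait().
interface Deferred {
  resolve(value: { exitCode: number }): void;
  reject(reason: Error): void;
}

// Hypothetical recorder: keeps only the first error it is given.
function createErrorSlot() {
  let terminalError: Error | undefined;
  return {
    record(error: Error): void {
      if (terminalError === undefined) terminalError = error;
    },
    get(): Error | undefined {
      return terminalError;
    },
  };
}

// Hypothetical single exit path, as it might run inside `finally`.
function settle(
  terminalError: Error | undefined,
  exitCode: number | undefined,
  finished: Deferred,
  sinks: StreamSink[],
): void {
  if (terminalError === undefined && exitCode !== undefined) {
    for (const sink of sinks) sink.close();
    finished.resolve({ exitCode });
    return;
  }
  // No recorded error and no exit code means the stream closed early.
  const error =
    terminalError ??
    new Error("Microsandbox command ended without an exit event.");
  for (const sink of sinks) sink.error(error);
  finished.reject(error);
}
```

Funneling every failure through one variable keeps the reported cause stable: a later iterator throw cannot overwrite an earlier idle-timeout error.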

Design tradeoff and default

A pure wall-clock timeout would wrongly kill long-but-progressing commands; an
idle timeout (reset on output) is better but still risks killing a legitimate
long compute that emits no output for the whole window (e.g. sleep 600, a
silent heavy calculation). So the ceiling is a named constant with a generous
5-minute default and an override knob.

A false kill (terminating a legitimate silent compute) is worse than waiting a
few extra minutes to kill a truly wedged exec, so the default errs on the
generous side. Five minutes of zero bytes (no stdout, no stderr, no exit) is
well beyond any normal tool command, while still bounding the previously
unbounded hang. Tune it per environment via the idleTimeoutMs option or the
EVE_MICROSANDBOX_EXEC_IDLE_TIMEOUT_MS env var (a malformed value falls back to
the default rather than disabling the backstop).
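
A rough sketch of how such a knob could be resolved, for illustration only. The names DEFAULT_IDLE_TIMEOUT_MS and resolveIdleTimeoutMs are hypothetical, and two behaviors are assumptions not stated in this PR: that the option takes precedence over the env var, and that zero or negative values count as malformed.

```typescript
// Hypothetical constant name; the PR only says the default is five minutes.
const DEFAULT_IDLE_TIMEOUT_MS = 5 * 60 * 1000;

function isUsableTimeout(value: number): boolean {
  // Assumption: only finite, strictly positive values are accepted.
  return Number.isFinite(value) && value > 0;
}

// Hypothetical resolver: explicit option first, then env var, then default.
function resolveIdleTimeoutMs(
  option: number | undefined,
  envValue: string | undefined,
): number {
  if (option !== undefined && isUsableTimeout(option)) return option;
  if (envValue !== undefined) {
    // Number("") is 0 and Number("abc") is NaN; both fail isUsableTimeout,
    // so a malformed value falls through to the default.
    const parsed = Number(envValue.trim());
    if (isUsableTimeout(parsed)) return parsed;
  }
  return DEFAULT_IDLE_TIMEOUT_MS;
}
```

In this sketch a caller would pass the idleTimeoutMs option and the value of process.env.EVE_MICROSANDBOX_EXEC_IDLE_TIMEOUT_MS. The point of the fallback is that a typo in the env var can never switch the backstop off.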

Out of scope (follow-up)

The issue's fix #2 (threading a cancellation AbortSignal end-to-end from the
tool/turn context through executeBashOnSandbox → runWithDevelopmentSandboxProgress
→ sandbox.run({ command, abortSignal })) touches public API
(SessionContext / callback context) and belongs in a separate change. It would
also make agent-level cancellation and eval timeoutMs actually terminate a
running command. This PR is the self-contained backstop only.

Test

packages/eve/src/execution/sandbox/bindings/microsandbox-process.test.ts builds
a fake async-iterable exec handle with a kill() spy and covers:

  1. Happy path: stdout then {kind:"exited", code:0} → wait() resolves
    {exitCode:0}, stdout delivers data, kill() not called.
  2. Stall: one stdout then never yields again → with a test-shortened 50 ms
    idle timeout, wait() rejects with the idle-timeout error and kill() was
    called once. (This also exercises the idleTimeoutMs override knob.)
  3. Ends without exit: stream closes before an exit event → the existing
    "ended without an exit event" error still fires (guards the finally refactor).

pnpm --filter eve exec vitest run --config vitest.unit.config.ts \
  src/execution/sandbox/bindings/microsandbox-process.test.ts
# Test Files  1 passed (1)   Tests  3 passed (3)

pnpm --filter eve run typecheck passes.
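
For readers who want to picture the fake handle, here is a minimal sketch of one possible shape. It is not the test file's actual code: fakeExec, FakeExec, ExecEvent, and the killCalls counter are hypothetical names, and a plain counter stands in for the vitest spy so the block has no dependencies.

```typescript
// Hypothetical event shape; the real SDK types are not shown in this PR.
type ExecEvent =
  | { kind: "stdout"; data: string }
  | { kind: "exited"; code: number };

interface FakeExec extends AsyncIterable<ExecEvent> {
  kill(): Promise<void>;
  killCalls: number; // stand-in for a spy's call count
}

// Yields the scripted events, then either closes the stream or stalls forever.
function fakeExec(events: ExecEvent[], after: "close" | "stall"): FakeExec {
  const queue = [...events];
  const handle: FakeExec = {
    killCalls: 0,
    async kill(): Promise<void> {
      handle.killCalls += 1;
    },
    [Symbol.asyncIterator]() {
      return {
        async next(): Promise<IteratorResult<ExecEvent>> {
          const value = queue.shift();
          if (value !== undefined) return { done: false, value };
          if (after === "close") return { done: true, value: undefined };
          // Never settles: simulates a wedged native binding.
          return new Promise<IteratorResult<ExecEvent>>(() => {});
        },
      };
    },
  };
  return handle;
}
```

With this shape, the three cases map to fakeExec([stdout, exited], "close"), fakeExec([stdout], "stall"), and fakeExec([stdout], "close") respectively.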

Signed-off-by: Ari Mayer <ari111097@gmail.com>
@vercel

vercel Bot commented Jun 30, 2026

@arimxyer is attempting to deploy a commit to the Vercel Team on Vercel.

A member of the Team first needs to authorize it.

@arimxyer arimxyer marked this pull request as ready for review June 30, 2026 18:40

Development

Successfully merging this pull request may close these issues.

microsandbox sandbox exec can hang indefinitely — no per-command timeout/abort on the exec stream
