Re-bootstrap flashblock state per block instead of carrying it forward #9

Merged: sduchesneau merged 1 commit into firehose/1.x from fix/flashblock-carried-state-divergence on Jun 10, 2026

@sduchesneau

Problem

The firehose flashblocks processor carried the revm `State` across block boundaries on the sequential fast path. The reused `State`'s read cache could drift from its validated post-state bundle, so the next block executed on a subtly wrong starting state and produced a wrong state root. This surfaced as intermittent "canonical contradicts recomputed in-flight tip" (and `is_final` hash mismatch on next-base transition) resets.

The bug was node-specific and non-deterministic: whether a given block was carried forward or freshly bootstrapped/replayed depended on each node's canonical-commit timing, and the bug only manifested on the carried path.

Diagnosis

A temporary diagnostic pass (now removed) established, on every divergence:

  • bootstrapped=false, replayed=false — always a carried, incrementally-executed block; never a fresh/replayed one.
  • Execution itself was correct: receipts_root, transactions_root, gas_used, logs_bloom all matched canonical — only the post-state root diverged, broadly (dozens of storage slots across WETH/AMM contracts), the signature of a wrong starting state cascading through execution.
  • Recomputing the root against a fresh parent provider gave the same (wrong) value → the bundle was wrong, not the baseline; the carried starting state was the cause.
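
The classification used in the bullets above can be sketched as a small comparison of header fields. This is an illustrative sketch only, not the removed diagnostic code: `HeaderSummary`, `Divergence`, and `classify` are hypothetical names, and the field set is limited to the ones mentioned in this PR.

```rust
/// Hypothetical summary of the header fields compared during diagnosis.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct HeaderSummary {
    pub state_root: [u8; 32],
    pub receipts_root: [u8; 32],
    pub transactions_root: [u8; 32],
    pub logs_bloom: Vec<u8>,
    pub gas_used: u64,
}

/// Hypothetical classification of a mismatch against the canonical header.
#[derive(Debug, PartialEq, Eq)]
pub enum Divergence {
    /// Every compared field matches.
    None,
    /// Execution outputs match; only the post-state root differs.
    StateRootOnly,
    /// At least one execution output differs (names of the differing fields).
    Execution(Vec<&'static str>),
}

pub fn classify(local: &HeaderSummary, canonical: &HeaderSummary) -> Divergence {
    let mut exec = Vec::new();
    if local.receipts_root != canonical.receipts_root {
        exec.push("receipts_root");
    }
    if local.transactions_root != canonical.transactions_root {
        exec.push("transactions_root");
    }
    if local.gas_used != canonical.gas_used {
        exec.push("gas_used");
    }
    if local.logs_bloom != canonical.logs_bloom {
        exec.push("logs_bloom");
    }
    if !exec.is_empty() {
        return Divergence::Execution(exec);
    }
    if local.state_root != canonical.state_root {
        Divergence::StateRootOnly
    } else {
        Divergence::None
    }
}
```

Under this sketch, every divergence described above would classify as `StateRootOnly`, which points at the starting state rather than at transaction execution.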

Fix

Drop the carried State at each block transition (accumulated_db = None), so every block re-bootstraps from the canonical parent (or buffers pending → replays when the parent isn't committed yet). The speculative state-root precompute already refetches a fresh provider, so is_final latency is unaffected.
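
As a rough illustration of the control flow described above, here is a toy model of the block-transition logic. It is a sketch under stated assumptions, not the processor's actual code: `ToyState` stands in for the revm `State`, a `HashMap` stands in for the canonical provider, and `Processor`, `Flashblock`, `Outcome`, `on_flashblock`, and `on_canonical_commit` are hypothetical names. Only `accumulated_db` is taken from this PR.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the revm `State`: a plain slot map.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ToyState {
    pub slots: HashMap<u64, u64>,
}

/// Hypothetical flashblock: belongs to one block and writes some slots.
#[derive(Debug, Clone)]
pub struct Flashblock {
    pub block_number: u64,
    pub parent_number: u64,
    pub writes: Vec<(u64, u64)>,
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    Executed,
    BufferedPending,
}

#[derive(Default)]
pub struct Processor {
    /// Canonical post-states by block number (stand-in for the provider).
    pub canonical: HashMap<u64, ToyState>,
    /// In-progress state for the current block only; never carried across blocks.
    pub accumulated_db: Option<ToyState>,
    pub current_block: Option<u64>,
    pub pending: Vec<Flashblock>,
}

impl Processor {
    pub fn on_flashblock(&mut self, fb: Flashblock) -> Outcome {
        // Block transition: drop the state accumulated for the previous block.
        if self.current_block != Some(fb.block_number) {
            self.accumulated_db = None;
            self.current_block = Some(fb.block_number);
        }
        // Re-bootstrap from the canonical parent, or buffer until it is committed.
        if self.accumulated_db.is_none() {
            match self.canonical.get(&fb.parent_number) {
                Some(parent) => self.accumulated_db = Some(parent.clone()),
                None => {
                    self.pending.push(fb);
                    return Outcome::BufferedPending;
                }
            }
        }
        let state = self.accumulated_db.as_mut().expect("bootstrapped above");
        for (slot, value) in &fb.writes {
            state.slots.insert(*slot, *value);
        }
        Outcome::Executed
    }

    /// Records a canonical commit, then replays buffered flashblocks.
    /// Returns how many buffered flashblocks were executed.
    pub fn on_canonical_commit(&mut self, number: u64, state: ToyState) -> usize {
        self.canonical.insert(number, state);
        let pending = std::mem::take(&mut self.pending);
        let mut replayed = 0;
        for fb in pending {
            if self.on_flashblock(fb) == Outcome::Executed {
                replayed += 1;
            }
        }
        replayed
    }
}
```

In this model, flashblocks within one block still accumulate into the same state, but the first flashblock of a new block always starts from a clone of the canonical parent, so no cached reads from the previous block can leak in.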

Also documents why execute_flashblock discards the BlockExecutionResult (the per-flashblock executor restarts receipt cumulative_gas_used at 0, which nothing on the wire/state path consumes).
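
To make the `cumulative_gas_used` point concrete, the toy example below shows how a running total that restarts for each flashblock differs from a block-level total. It is a sketch only: `ToyReceipt`, `execute_batch`, and `block_level_receipts` are hypothetical names and do not reflect the real executor or receipt types. The offsetting helper is included purely for contrast, since per the note above nothing on the wire/state path currently needs it.

```rust
/// Hypothetical receipt carrying only the field discussed here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ToyReceipt {
    pub cumulative_gas_used: u64,
}

/// Executes one batch of transactions (given as per-transaction gas) with a
/// fresh running total, so the cumulative value starts at 0 on every call.
pub fn execute_batch(tx_gas: &[u64]) -> Vec<ToyReceipt> {
    let mut running = 0u64;
    tx_gas
        .iter()
        .map(|gas| {
            running += gas;
            ToyReceipt { cumulative_gas_used: running }
        })
        .collect()
}

/// Block-level view, for contrast: offsets each flashblock's receipts by the
/// gas used in all earlier flashblocks of the same block.
pub fn block_level_receipts(flashblocks: &[Vec<u64>]) -> Vec<ToyReceipt> {
    let mut offset = 0u64;
    let mut out = Vec::new();
    for fb in flashblocks {
        let receipts = execute_batch(fb);
        let batch_total = receipts.last().map_or(0, |r| r.cumulative_gas_used);
        out.extend(receipts.into_iter().map(|r| ToyReceipt {
            cumulative_gas_used: r.cumulative_gas_used + offset,
        }));
        offset += batch_total;
    }
    out
}
```

A second flashblock with a single 30,000-gas transaction yields a receipt with `cumulative_gas_used` of 30,000 from `execute_batch`, even if earlier flashblocks in the same block already used gas; that per-flashblock value is what gets discarded.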

The carry-forward tests now mark the parent block available, exercising the re-bootstrap path.

Testing

35 crate tests pass; clippy and the node binary build clean. Confirmed in production: the resets stopped.

Follow-up (not in this PR)

Optionally bootstrap via state_by_block_hash(parent_hash) (served by the in-memory tree pre-commit) so blocks never fall to pending→replay, preserving per-flashblock emission granularity.

The firehose flashblocks processor carried the revm `State` across block
boundaries on the sequential fast path. The reused `State`'s read cache could
drift from its validated post-state bundle, so the next block executed on a
subtly-wrong starting state and produced a wrong state root — surfacing as
intermittent "canonical contradicts recomputed in-flight tip" resets. It was
node-specific and non-deterministic because whether a given block was carried
vs freshly bootstrapped/replayed depended on each node's canonical-commit
timing; execution itself was correct (receipts/gas/logs matched canonical),
only the carried starting state was wrong.

Drop the carried `State` at each block transition so every block re-bootstraps
from the canonical parent (or buffers pending → replays when the parent isn't
committed yet). Mark parent blocks available in the carry-forward tests, which
now exercise the re-bootstrap path.
@sduchesneau

Author

@maoueh

I think this is the best choice, as it removes tons of UNDO events

Empirically:

  • instead of a failed block every ~10 blocks, I see no failures at all over >10 minutes.
  • latency seems ~identical

@sduchesneau sduchesneau merged commit 5f75bc3 into firehose/1.x Jun 10, 2026
2 checks passed
@sduchesneau sduchesneau deleted the fix/flashblock-carried-state-divergence branch June 10, 2026 19:08