perf: rewrite stage_1 lexer with custom(), optimize stages 2-4 #31

Open
NTBBloodbath wants to merge 1 commit into push-tnntnmzouzms from push-vznuopvolwup

@NTBBloodbath (Member)

Replace the chumsky combinator-based lexer with a hand-written loop inside chumsky's custom() escape hatch. Coalesce consecutive Regular chars into Text(String) runs to reduce the downstream token count. Eliminate wasted Vec allocations in the stage_2 whitespace separators. Remove the per-item clone() in stage_3's dedup_opener_candidates and fuse the post-rollup unravel+eliminate steps into a single pass. Convert stage_4 to slice-based traversal, eliminating the recursive to_vec() allocations.
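For reviewers unfamiliar with the coalescing idea, here is a minimal standalone sketch of it. It is not the code in src/stage_1.rs and does not use chumsky at all: the `Token` enum, `is_special`, and `lex` are hypothetical names, and the set of special characters is an assumption made only for illustration. Only the `Text(String)` variant name comes from this PR.

```rust
// Hypothetical token type for illustration; the real NorgToken has other variants.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Text(String),  // a coalesced run of ordinary characters
    Special(char), // a single markup character
    Whitespace,
    Newline,
}

// Assumed set of markup characters; the real lexer's set may differ.
fn is_special(c: char) -> bool {
    matches!(c, '*' | '/' | '_' | '{' | '}' | '[' | ']')
}

// Single pass over the input: ordinary characters are buffered in `run`
// and flushed as one Text token whenever a non-ordinary character appears.
fn lex(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut run = String::new();
    for c in input.chars() {
        let structural = match c {
            '\n' => Some(Token::Newline),
            ' ' | '\t' => Some(Token::Whitespace),
            c if is_special(c) => Some(Token::Special(c)),
            _ => None,
        };
        match structural {
            Some(tok) => {
                if !run.is_empty() {
                    // mem::take moves the buffer out and leaves an empty String behind.
                    tokens.push(Token::Text(std::mem::take(&mut run)));
                }
                tokens.push(tok);
            }
            None => run.push(c),
        }
    }
    // Flush a trailing run, if any.
    if !run.is_empty() {
        tokens.push(Token::Text(run));
    }
    tokens
}
```

With this sketch, the 14-character input `"hello *world*\n"` yields 6 tokens instead of one token per character, which is the kind of reduction later stages benefit from.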

Stacked on top of #30


Benchmarks vs pre-migration baseline (chumsky 0.9.3, all LTO):

Fixture: small (747B)

| Benchmark | Before | After | Speedup |
|---|---|---|---|
| parse_tree | 2.54ms | 0.25ms | 10x |
| parse_flat | 2.57ms | 0.23ms | 11x |
| stage_1 | 1.93ms | 12µs | 161x |
| stage_2 | 71µs | 52µs | 1.4x |
| stage_3 | 584µs | 162µs | 3.6x |
| stage_4 | 24µs | 12µs | 2.0x |

Fixture: medium (50.8KB, 1108 lines)

| Benchmark | Before | After | Speedup |
|---|---|---|---|
| parse_tree | 151.7ms | 10.6ms | 14.4x |
| parse_flat | 149.5ms | 10.2ms | 14.8x |
| stage_1 | 128.5ms | 0.89ms | 144x |
| stage_2 | 2.33ms | 1.46ms | 1.6x |
| stage_3 | 22.1ms | 7.69ms | 2.9x |
| stage_4 | 1.33ms | 0.59ms | 2.3x |

Fixture: large (6KB, 204 lines)

| Benchmark | Before | After | Speedup |
|---|---|---|---|
| parse_tree | 18.5ms | 1.68ms | 11x |
| parse_flat | 18.0ms | 1.59ms | 11x |
| stage_1 | 14.0ms | 100µs | 140x |
| stage_2 | 422µs | 282µs | 1.5x |
| stage_3 | 3.34ms | 1.17ms | 2.9x |
| stage_4 | 256µs | 101µs | 2.5x |

All 31 tests pass. Zero warnings (build + clippy).

Key changes:

- src/stage_1.rs: A custom() lexer replaces the chumsky combinators. Adds a NorgToken::Text(String) variant. Coalesces runs of Regular chars into Text tokens, which reduces stage_2 input by 5-10x.
- src/stage_2.rs: Replace the wasted whitespace .collect::<Vec<_>>() with .ignored() in separators. Add a Text(String) arm to tokens_to_paragraph_segment.
- src/stage_3.rs: Replace coalesce() in dedup_opener_candidates with a manual fold, avoiding per-item 160-byte clones. Fuse the post-rollup unravel and eliminate_invalid_candidates steps into a single pass. Remove the now-unused eliminate_invalid_candidates function.
- src/stage_4.rs: Introduce stage_4_from(&[NorgASTFlat]), which takes slices. consume_heading_content and consume_nestable_* return (content, new_position) tuples instead of mutating a &mut usize. Eliminates 6 recursive flat[i..j].to_vec() allocations.
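The stage_4 change can be pictured with a small sketch of slice-based traversal. This is an illustration under assumptions, not the code in src/stage_4.rs: the `Flat` and `Tree` types and `build_tree` are hypothetical stand-ins, and `consume_heading_content` borrows only its name and its "(content, new_position)" return shape from the description above. The signature shown here is assumed.

```rust
// Hypothetical flat node type; the real NorgASTFlat has many more variants.
#[derive(Debug, Clone, PartialEq)]
enum Flat {
    Heading { level: u16, title: String },
    Paragraph(String),
}

// Hypothetical nested output type.
#[derive(Debug, PartialEq)]
enum Tree {
    Heading { level: u16, title: String, content: Vec<Tree> },
    Paragraph(String),
}

// Walks a borrowed slice by index; sub-ranges are never copied with to_vec().
fn build_tree(flat: &[Flat]) -> Vec<Tree> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < flat.len() {
        let (node, next) = consume_node(flat, i);
        out.push(node);
        i = next;
    }
    out
}

// Consumes one node starting at `start` and returns it together with the
// index of the first item it did not consume.
fn consume_node(flat: &[Flat], start: usize) -> (Tree, usize) {
    match &flat[start] {
        Flat::Paragraph(text) => (Tree::Paragraph(text.clone()), start + 1),
        Flat::Heading { level, title } => {
            let (content, next) = consume_heading_content(flat, start + 1, *level);
            let node = Tree::Heading { level: *level, title: title.clone(), content };
            (node, next)
        }
    }
}

// Collects everything nested under a heading of `level`, stopping at the
// next heading of the same or a shallower level.
fn consume_heading_content(flat: &[Flat], start: usize, level: u16) -> (Vec<Tree>, usize) {
    let mut content = Vec::new();
    let mut i = start;
    while i < flat.len() {
        if let Flat::Heading { level: l, .. } = &flat[i] {
            if *l <= level {
                break;
            }
        }
        let (node, next) = consume_node(flat, i);
        content.push(node);
        i = next;
    }
    (content, i)
}
```

Returning the new position as part of the result keeps each helper free of shared mutable state, and because every call borrows the same slice, recursion depth no longer multiplies the number of allocations.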

@NTBBloodbath marked this pull request as ready for review June 17, 2026 23:06