perf: rewrite stage_1 lexer with custom(), optimize stages 2-4 #31
Open
NTBBloodbath wants to merge 1 commit into
Replace chumsky combinator-based lexer with hand-written loop inside custom() escape hatch. Coalesce consecutive Regular chars into Text(String) runs to reduce downstream token count. Eliminate wasted Vec allocations in stage_2 whitespace separators. Remove per-item clone() in stage_3 dedup_opener_candidates; fuse post-rollup unravel+eliminate into single pass. Convert stage_4 to slice-based traversal, eliminating recursive to_vec() allocations.
Benchmarks vs pre-migration baseline (chumsky 0.9.3, all LTO):
Fixture: small (747B)
parse_tree: 2.54ms → 0.25ms (10x)
parse_flat: 2.57ms → 0.23ms (11x)
stage_1: 1.93ms → 12µs (161x)
stage_2: 71µs → 52µs (1.4x)
stage_3: 584µs → 162µs (3.6x)
stage_4: 24µs → 12µs (2.0x)
Fixture: medium (50.8KB, 1108 lines)
parse_tree: 151.7ms → 10.6ms (14.4x)
parse_flat: 149.5ms → 10.2ms (14.8x)
stage_1: 128.5ms → 0.89ms (144x)
stage_2: 2.33ms → 1.46ms (1.6x)
stage_3: 22.1ms → 7.69ms (2.9x)
stage_4: 1.33ms → 0.59ms (2.3x)
Fixture: large (6KB, 204 lines)
parse_tree: 18.5ms → 1.68ms (11x)
parse_flat: 18.0ms → 1.59ms (11x)
stage_1: 14.0ms → 100µs (140x)
stage_2: 422µs → 282µs (1.5x)
stage_3: 3.34ms → 1.17ms (2.9x)
stage_4: 256µs → 101µs (2.5x)
All 31 tests pass. Zero warnings (build + clippy).
Key changes:
- src/stage_1.rs: custom() lexer replaces chumsky combinators. Adds NorgToken::Text(String) variant. Coalesces Regular char runs into Text tokens, reducing stage_2 input 5-10x.
- src/stage_2.rs: Replace wasted whitespace .collect::<Vec<_>>() with .ignored() in separators. Add Text(String) arm to tokens_to_paragraph_segment.
- src/stage_3.rs: Replace coalesce() in dedup_opener_candidates with manual fold avoiding per-item 160-byte clones. Fuse post-rollup unravel+eliminate_invalid_candidates into single pass. Remove unused eliminate_invalid_candidates function.
- src/stage_4.rs: Introduce stage_4_from(&[NorgASTFlat]) taking slices. consume_heading_content and consume_nestable_* return (content, new_position) tuples instead of mutating &mut usize. Eliminates 6 recursive flat[i..j].to_vec() allocations.
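The stage_4 change (recursive helpers that return `(content, new_position)` over one shared slice, instead of mutating a `&mut usize` or recursing on `flat[i..j].to_vec()` copies) can be sketched as below. This is a minimal illustration, not the PR's code: `Flat`, `Node`, `consume_nodes` and `build_tree` are hypothetical stand-ins for `NorgASTFlat`, the tree AST, `consume_nestable_*` and `stage_4_from`.

```rust
// Hypothetical stand-ins for the flat and tree AST types (not NorgASTFlat).
#[derive(Debug)]
enum Flat {
    Open(String),
    Close,
    Leaf(String),
}

#[derive(Debug, PartialEq)]
enum Node {
    Leaf(String),
    Nest { name: String, children: Vec<Node> },
}

/// Parses nodes from `flat`, starting at index `pos`, until a `Close` that
/// ends the current nesting level or the end of input. Returns the nodes and
/// the index just past what was consumed. Every recursive call borrows the
/// same slice, so no sub-range is ever copied with `to_vec()`.
fn consume_nodes(flat: &[Flat], mut pos: usize) -> (Vec<Node>, usize) {
    let mut out = Vec::new();
    while let Some(item) = flat.get(pos) {
        match item {
            Flat::Leaf(text) => {
                out.push(Node::Leaf(text.clone()));
                pos += 1;
            }
            Flat::Open(name) => {
                // Children start right after the opener; the callee reports
                // where it stopped, and we resume from there.
                let (children, next) = consume_nodes(flat, pos + 1);
                out.push(Node::Nest { name: name.clone(), children });
                pos = next;
            }
            // Closes the current level; hand the caller the index after it.
            Flat::Close => return (out, pos + 1),
        }
    }
    (out, pos)
}

/// Slice-taking entry point (hypothetical analogue of stage_4_from).
/// In this sketch a stray top-level `Close` simply ends parsing.
fn build_tree(flat: &[Flat]) -> Vec<Node> {
    consume_nodes(flat, 0).0
}
```

Because the position travels through return values, each helper is a pure function of `(slice, start)`, which also makes it easy to unit-test a single nesting level in isolation.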
02e2ab1 to 9024eae
Replace chumsky combinator-based lexer with hand-written loop inside chumsky's custom() escape hatch. Coalesce consecutive Regular chars into Text(String) runs to reduce downstream token count. Eliminate wasted Vec allocations in stage_2 whitespace separators. Remove per-item clone() in stage_3 dedup_opener_candidates; fuse post-rollup unravel+eliminate into single pass. Convert stage_4 to slice-based traversal, eliminating recursive to_vec() allocations.
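The core of the stage_1 idea, coalescing consecutive regular characters into a single Text(String) token in a hand-written loop, can be sketched without chumsky at all. This is an assumption-laden illustration: `Token`, `lex`, `is_special` and `breaks_run` are hypothetical, the set of "special" characters is made up, and the real lexer runs inside chumsky's custom() rather than as a free function.

```rust
// Hypothetical token type; stands in for NorgToken, not its real definition.
#[derive(Debug, PartialEq)]
enum Token {
    /// A run of ordinary characters coalesced into one token.
    Text(String),
    /// A single space or tab.
    Whitespace,
    Newline,
    /// A character that may start or end markup (illustrative set only).
    Special(char),
}

fn is_special(c: char) -> bool {
    matches!(c, '*' | '/' | '_' | '`' | '{' | '}')
}

/// True for characters that must get their own token rather than join a run.
fn breaks_run(c: char) -> bool {
    matches!(c, ' ' | '\t' | '\n') || is_special(c)
}

/// Hand-written lexing loop: one forward pass, no backtracking.
fn lex(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();
    while let Some(c) = chars.next() {
        let token = match c {
            '\n' => Token::Newline,
            ' ' | '\t' => Token::Whitespace,
            c if is_special(c) => Token::Special(c),
            c => {
                // Keep pulling characters while they are "regular", so the
                // whole run becomes one Text token instead of one per char.
                let mut run = String::from(c);
                while let Some(next) = chars.next_if(|&n| !breaks_run(n)) {
                    run.push(next);
                }
                Token::Text(run)
            }
        };
        tokens.push(token);
    }
    tokens
}
```

For the 13-character input `hello *world*` this produces 5 tokens (Text, Whitespace, Special, Text, Special), where a lexer emitting one token per regular character would produce 13; that shrinkage is what reduces the work left for stage_2.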
Stacked on top of #30
Benchmarks vs pre-migration baseline (chumsky 0.9.3, all LTO):
Fixture: small (747B)
parse_tree: 2.54ms → 0.25ms (10x)
parse_flat: 2.57ms → 0.23ms (11x)
stage_1: 1.93ms → 12µs (161x)
stage_2: 71µs → 52µs (1.4x)
stage_3: 584µs → 162µs (3.6x)
stage_4: 24µs → 12µs (2.0x)
Fixture: medium (50.8KB, 1108 lines)
parse_tree: 151.7ms → 10.6ms (14.4x)
parse_flat: 149.5ms → 10.2ms (14.8x)
stage_1: 128.5ms → 0.89ms (144x)
stage_2: 2.33ms → 1.46ms (1.6x)
stage_3: 22.1ms → 7.69ms (2.9x)
stage_4: 1.33ms → 0.59ms (2.3x)
Fixture: large (6KB, 204 lines)
parse_tree: 18.5ms → 1.68ms (11x)
parse_flat: 18.0ms → 1.59ms (11x)
stage_1: 14.0ms → 100µs (140x)
stage_2: 422µs → 282µs (1.5x)
stage_3: 3.34ms → 1.17ms (2.9x)
stage_4: 256µs → 101µs (2.5x)
All 31 tests pass. Zero warnings (build + clippy).
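The two stage_3 changes named in the summary, deduplicating candidates without per-item clones and fusing the post-rollup unravel and eliminate steps into one pass, can be sketched as follows. Everything here is hypothetical: `Candidate` is a stand-in for the real (roughly 160-byte) opener candidate, the "same kind and position" duplicate rule is an assumed example, and `is_valid` stands in for whatever eliminate_invalid_candidates actually checks.

```rust
// Hypothetical candidate record; `payload` stands in for the rest of a
// large struct that would be expensive to clone.
#[derive(Debug, PartialEq)]
struct Candidate {
    kind: char,
    pos: usize,
    payload: Vec<u32>,
}

/// Keeps the first of each run of consecutive candidates with equal
/// (kind, pos). Owned values are moved into the accumulator by `fold`;
/// nothing is cloned.
fn dedup_consecutive(cands: Vec<Candidate>) -> Vec<Candidate> {
    cands.into_iter().fold(Vec::new(), |mut acc: Vec<Candidate>, c| {
        let is_dup = acc
            .last()
            .is_some_and(|last| last.kind == c.kind && last.pos == c.pos);
        if !is_dup {
            acc.push(c);
        }
        acc
    })
}

/// "Unravel" (flatten the rolled-up groups) and "eliminate" (drop invalid
/// candidates) as one iterator chain: the data is walked once and only the
/// final Vec is allocated, with no intermediate flattened Vec.
fn unravel_and_eliminate(
    groups: Vec<Vec<Candidate>>,
    is_valid: impl Fn(&Candidate) -> bool,
) -> Vec<Candidate> {
    groups.into_iter().flatten().filter(|c| is_valid(c)).collect()
}
```

The standard library's `Vec::dedup_by` performs the same consecutive-duplicate removal in place; the explicit fold is shown only because it mirrors the "manual fold" wording of the change.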
Key changes: