fix(optimize): fix stale OCC read_version in distributed compaction to prevent row resurrection #6653
Open
xiaguanglei wants to merge 1 commit into lance-format:main
…ead_version in distributed compaction

In distributed compaction (e.g. Spark's `OptimizeExec`), the commit phase opens a fresh `Dataset` handle at the latest version (`V+N`), which is newer than the version `V` at which tasks were planned and executed. Using `dataset.manifest.version` (= `V+N`) as the `TransactionBuilder` `read_version` causes the OCC conflict checker to scan only transactions after `V+N`, silently skipping any concurrent `DELETE`/`UPDATE` that committed between `V` and `V+N`. This allows deleted rows to reappear in the compacted output (data resurrection).

Fix: derive `read_version` from `min(task.read_version)` across all completed tasks, anchoring the OCC window to `V` so conflicts in `(V, V+N]` are caught.
ACTION NEEDED
The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification. For details on the error, please inspect the "PR Title Check" action.
Problem
In distributed compaction (e.g. Spark's `OptimizeExec`), the commit phase opens a fresh `Dataset` handle at the latest version `V+N`, which is newer than the version `V` at which tasks were planned and executed. `commit_compaction` was using `dataset.manifest.version` (= `V+N`) as the `TransactionBuilder` `read_version`.

Lance uses Optimistic Concurrency Control (OCC): when committing a transaction, the conflict checker loads all transactions committed after `read_version` and checks whether any of them touch the same fragments. Using `V+N` as `read_version` narrows the OCC scan window to `(V+N, ∞)`, silently skipping any concurrent `DELETE`/`UPDATE` that committed between `V` and `V+N`. The compacted fragment is written without those deletion markers, causing deleted rows to reappear (data resurrection).

Root Cause
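The narrowed scan window can be illustrated with a toy model. This is only a sketch of the idea, not Lance's actual conflict checker: `CommittedTxn`, `find_conflicts`, and the version and fragment numbers are hypothetical names and values chosen for illustration.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a committed transaction (not a Lance type).
struct CommittedTxn {
    version: u64,
    touched_fragments: HashSet<u64>,
}

/// Toy OCC check: only transactions committed after `read_version` are
/// examined, and a conflict is any of those touching a rewritten fragment.
fn find_conflicts(
    read_version: u64,
    rewritten: &HashSet<u64>,
    log: &[CommittedTxn],
) -> Vec<u64> {
    log.iter()
        .filter(|t| t.version > read_version)
        .filter(|t| !t.touched_fragments.is_disjoint(rewritten))
        .map(|t| t.version)
        .collect()
}

fn main() {
    let planned_at: u64 = 10; // V: version the compaction tasks were planned at
    let latest: u64 = 12; // V+N: version seen by the fresh handle at commit time

    let log = vec![
        // Concurrent DELETE on fragment 3, committed between V and V+N.
        CommittedTxn { version: 11, touched_fragments: HashSet::from([3]) },
        // Unrelated write to a fragment the compaction did not touch.
        CommittedTxn { version: 12, touched_fragments: HashSet::from([40]) },
    ];
    let rewritten: HashSet<u64> = HashSet::from([1, 2, 3]);

    // Window (V+N, inf): the DELETE at version 11 is never examined.
    let stale = find_conflicts(latest, &rewritten, &log);
    // Window (V, inf): the DELETE at version 11 is reported as a conflict.
    let anchored = find_conflicts(planned_at, &rewritten, &log);

    println!("stale window conflicts: {:?}", stale);
    println!("anchored window conflicts: {:?}", anchored);
}
```

In this model the stale window reports no conflicts while the anchored window reports version 11, which is the behaviour the description attributes to the two choices of `read_version`.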
Fix
Each `RewriteResult` already carries `read_version = V` (the version at which the executor ran the task). Derive the transaction `read_version` from `min(task.read_version)` so the OCC window is anchored at `V` and covers every transaction committed after `V`, i.e. `(V, ∞)`.

Test Plan
1. Rust unit test (added in this PR)
`rust/lance/src/dataset/optimize.rs`: `test_distributed_compact_concurrent_delete_no_resurrection`

Simulates the exact distributed scenario in a single-process test.
Run locally:
    cargo test -p lance --lib \
      dataset::optimize::tests::test_distributed_compact_concurrent_delete_no_resurrection

2. End-to-end reproduction on a real Spark YARN cluster
Two independent `spark-submit` applications against a 10 M-row, 32-fragment Lance table on YARN:

- `OPTIMIZE TABLE`
- `DELETE WHERE id % 312500 < 100`

Result without fix:
Result with fix:
Notes
- … `Dataset` for the commit phase (e.g. `OptimizeExec.scala` in lance-spark). Single-process `compact_files` is unaffected because the same `Dataset` handle is used throughout.
- `RewriteResult.read_version` was already populated correctly by each task; this fix simply threads it through to `TransactionBuilder`.
- … `Retryable commit conflict` error instead of silently resurrecting deleted rows.
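The threading of `read_version` described in the notes above can be sketched as follows. This is a minimal illustration rather than the code in this PR: `TaskResult` is a hypothetical stand-in for `RewriteResult` that keeps only the `read_version` field, `commit_read_version` is an invented helper name, and falling back to the commit-time version when there are no tasks is an assumption.

```rust
/// Hypothetical stand-in for a per-task result (not the Lance `RewriteResult`);
/// only the `read_version` field mentioned in the description is modelled.
struct TaskResult {
    read_version: u64,
}

/// Pick the transaction read_version as the minimum over all completed tasks.
/// `fallback` is used only when `tasks` is empty (an assumption of this sketch).
fn commit_read_version(tasks: &[TaskResult], fallback: u64) -> u64 {
    tasks
        .iter()
        .map(|t| t.read_version)
        .min()
        .unwrap_or(fallback)
}

fn main() {
    // Tasks were planned at version 10; the commit-time handle sees version 12.
    let tasks = vec![
        TaskResult { read_version: 10 },
        TaskResult { read_version: 10 },
    ];
    let commit_time_version = 12;

    let read_version = commit_read_version(&tasks, commit_time_version);
    println!("transaction read_version: {}", read_version);
}
```

Taking the minimum rather than any single task's value keeps the window conservative if tasks were ever executed against different versions: the earliest version any task read from determines how far back conflicts must be checked.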