fix: address audit findings in force_rebalance code paths #1357
Open
SLoeuillet wants to merge 4 commits into Altinity:master
Fixes several bugs found during audit of PR Altinity#1351 force_rebalance logic:

1. RebalancedFiles not persisted to metadata (CRITICAL)
   TableMetadata.Save() was not copying the RebalancedFiles map, causing it to be lost on resumable downloads when metadata is re-read from disk. Added RebalancedFiles to the saved fields.

2. Archive file name trimming used wrong disk prefix (CRITICAL)
   In downloadTableData() compressed format path, partName was computed with TrimPrefix(archiveFile, capturedDisk+"_"). But archive files are named with the ORIGINAL disk prefix (e.g. "default_part1.tar.gz"), not the rebalanced disk name. The trim silently failed, breaking the hardlink-exists-files optimization. Use disk (original) for the trim.

3. Loop variable mutation in downloadDiffParts (CRITICAL)
   `disk` was mutated to `part.RebalancedDisk`, then used later as `capturedDisk := disk` to update `table.Parts[capturedDisk][idx]`. The idx came from iterating table.Parts[original_disk], so the update could go out of bounds or corrupt the wrong entry when the rebalanced and original disk names differed. Use activeDisk per iteration like we already do in filesystemhelper.

4. Unclear error message for store path construction
   Split the check into two distinct error messages so it's clear whether the disk is missing from diskMap or the UUID is empty.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
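To make item 2 concrete, here is a minimal sketch of why the trim was a silent no-op. `trimDiskPrefix` is a hypothetical helper written for this illustration, not a function from the repository; the only behavior relied on is that Go's `strings.TrimPrefix` returns its input unchanged when the prefix does not match.

```go
package main

import (
	"fmt"
	"strings"
)

// trimDiskPrefix is a hypothetical stand-in for the trim described above.
// It removes "<diskName>_" from the front of an archive file name.
// If the file name does not start with that prefix, strings.TrimPrefix
// returns the name unchanged and reports no error.
func trimDiskPrefix(archiveFile, diskName string) string {
	return strings.TrimPrefix(archiveFile, diskName+"_")
}

func main() {
	archiveFile := "default_part1.tar.gz"

	// Rebalanced disk name: the prefix does not match, nothing is trimmed.
	fmt.Println(trimDiskPrefix(archiveFile, "hdd1"))

	// Original disk name: the prefix matches and is removed.
	fmt.Println(trimDiskPrefix(archiveFile, "default"))
}
```

Because the mismatch produces no error, a lookup keyed on the trimmed name simply misses, which matches the symptom described for the hardlink-exists-files optimization.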
…ix/force-rebalance-audit-bugs
Collaborator
your fixes stuck somewhere =(
it means kill after timeout
Summary
Follow-up to #1351. A post-merge audit of the force_rebalance code found 4 bugs that weren't caught during review. This PR fixes them.
Bugs fixed
1. RebalancedFiles not persisted to metadata (CRITICAL)
`pkg/metadata/table_metadata.go`: `TableMetadata.Save()` was copying `Files`, `Parts`, `Checksums`, `Size`, `TotalBytes` but not `RebalancedFiles`. This caused the rebalancing map to be lost when metadata was re-read from disk during resumable downloads, potentially causing archive files to be downloaded to wrong disks after resume.
2. Archive file name trimming used wrong disk prefix (CRITICAL)
`pkg/backup/download.go:703`: In the compressed format download path, `partName` was computed with `TrimPrefix(archiveFile, capturedDisk+"_")`. Archive files are named with the original disk prefix (e.g. `default_part1.tar.gz`), not the rebalanced disk name. When `capturedDisk` was `hdd1` and `archiveFile` was `default_part1.tar.gz`, `TrimPrefix` silently left the name unchanged, so `partName` still contained the disk prefix. This broke the `--hardlink-exists-files` optimization (part lookup always failed). Fixed to use `disk` (original) for the trim.
3. Loop variable mutation in `downloadDiffParts` (CRITICAL)
`pkg/backup/download.go:978`: The outer loop variable `disk` was mutated to `part.RebalancedDisk` and later captured as `capturedDisk := disk` to update `table.Parts[capturedDisk][idx]`. `idx` came from iterating `table.Parts[original_disk]`, so when `capturedDisk` was different (rebalanced), the index could be out of bounds for `table.Parts[capturedDisk]`, causing panics or silent corruption of unrelated parts. Fixed by using a per-iteration `activeDisk` variable, the same pattern we already applied in `filesystemhelper.go` for the same bug class.
4. Unclear error message for store path construction
`pkg/filesystemhelper/filesystemhelper.go`: The error `can't build store path for rebalanced disk X (uuid=Y)` didn't distinguish between a missing disk and an empty UUID. Split into two specific error messages for easier debugging.
Test plan
- `go build` passes
- `go vet` clean with `-tags=integration`
- `TestForceRebalance` passes locally on ClickHouse 24.8 with race detector (82s)
🤖 Generated with Claude Code
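For reviewers, the pattern behind fix 3 can be sketched roughly as follows. This is an illustration under assumptions, not the code in `download.go`: the `Part` struct, its `Downloaded` field, and `markDownloaded` are hypothetical, and the sketch assumes the write-back is meant to go to the slice that `idx` was taken from. The point is only that the destination disk lives in a per-iteration `activeDisk`, while `idx` is used solely against the original `disk` key.

```go
package main

import (
	"fmt"
	"sort"
)

// Part is a simplified, assumed shape for part metadata.
type Part struct {
	Name           string
	RebalancedDisk string // empty when the part stays on its original disk
	Downloaded     bool   // hypothetical field, used only for this sketch
}

// markDownloaded walks parts grouped by original disk, picks a destination
// disk for each part, and records progress on the original entry.
// It returns the sorted "disk/part" destinations.
func markDownloaded(parts map[string][]Part) []string {
	var targets []string
	for disk, diskParts := range parts {
		for idx, part := range diskParts {
			// Per-iteration destination; the loop variable disk is never mutated.
			activeDisk := disk
			if part.RebalancedDisk != "" {
				activeDisk = part.RebalancedDisk
			}
			targets = append(targets, activeDisk+"/"+part.Name)

			// idx was produced by ranging over parts[disk], so it is only
			// valid there. Indexing parts[activeDisk][idx] could panic or
			// overwrite an unrelated part.
			parts[disk][idx].Downloaded = true
		}
	}
	sort.Strings(targets) // map iteration order is random; sort for stable output
	return targets
}

func main() {
	parts := map[string][]Part{
		"default": {
			{Name: "part1"},
			{Name: "part2", RebalancedDisk: "hdd1"},
		},
		"hdd1": {}, // no parts originally on hdd1
	}
	fmt.Println(markDownloaded(parts))
	fmt.Println(parts["default"][1].Downloaded, len(parts["hdd1"]))
}
```

In this setup, indexing `parts["hdd1"][1]` instead would panic with an index out of range, which is the failure mode described above.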