fix(crypto): CRP-2979: Guard SKS old-file zeroing against shared inodes by mbjorkqvist · Pull Request #10233 · dfinity/ic

mbjorkqvist · 2026-05-15T11:32:13Z

Fix a data-loss bug in ProtoSecretKeyStore where the write path unconditionally zeroed sks_data.pb.old after every write, without checking whether it still shared an inode with the just-written sks_data.pb.

On filesystems where rename(2) atomically swaps the destination directory entry to a new inode (Linux ext4/xfs/btrfs/tmpfs/…) the bug was latent: .old always pointed to the pre-rename inode, so zeroing scrubbed old plaintext as intended. On filesystems where rename(2) reuses the destination inode — observed with virtiofs bind mounts via Docker Desktop on macOS — the post-rename .old shared the inode of the new keystore, and zeroing corrupted the file just written. The symptom was every replica panicking at startup with "error parsing SKS protobuf data" after the keystore came back truncated to the trailing map entry.
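A quick way to tell which of the two behaviours a given mount exhibits is to replay the write sequence on scratch files and compare inodes afterwards. The probe below is a minimal sketch, not code from this PR: the function name and the `probe.pb*` file names are hypothetical, and the hard-link-then-rename sequence is assumed from the description above. It is Unix-only because it reads `dev`/`ino` from the file metadata.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Replays the assumed write sequence inside `dir` and reports whether the
/// `.old` link still shares an inode with the live file after the rename.
/// (Hypothetical helper for illustration only.)
fn old_shares_inode_after_rename(dir: &Path) -> io::Result<bool> {
    let live = dir.join("probe.pb");
    let old = dir.join("probe.pb.old");
    let tmp = dir.join("probe.pb.tmp");

    fs::write(&live, b"old contents")?;
    // `.old` becomes a second name for the pre-rename inode.
    fs::hard_link(&live, &old)?;
    fs::write(&tmp, b"new contents")?;
    // Replace the live file with the freshly written temporary file.
    fs::rename(&tmp, &live)?;

    let (m_live, m_old) = (fs::metadata(&live)?, fs::metadata(&old)?);
    let shared = m_live.dev() == m_old.dev() && m_live.ino() == m_old.ino();

    // Remove the scratch files before returning.
    fs::remove_file(&old)?;
    fs::remove_file(&live)?;
    Ok(shared)
}
```

Going by the description above, the probe should report `false` on ext4/xfs/btrfs/tmpfs, while `true` would indicate the inode-reuse behaviour observed on virtiofs.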

The startup-cleanup path (clean_up_old_sks) already had the inode guard via are_hard_links_to_the_same_inode; the write path (write_secret_keys_to_disk_and_cleanup_old_file) did not. This PR extracts the guard into a shared helper zeroize_or_unlink_old_file and wires it into both paths. When .old and proto_file share an inode, the helper just unlinks the extra link — the new content has already been written into that inode, so there is no separate old-inode copy left to scrub and the secure-erase invariant is preserved.
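The shape of the shared helper, as described, might look roughly like the sketch below. This is an illustration under stated assumptions rather than the PR's implementation: the signature, the error type, `same_inode` (a stand-in for `are_hard_links_to_the_same_inode`) and `overwrite_with_zeros` are all hypothetical, and treating a missing `proto_file` as "not shared" is an assumption of this sketch.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Stand-in for the real inode comparison: true when both paths resolve
/// to the same (device, inode) pair.
fn same_inode(a: &Path, b: &Path) -> io::Result<bool> {
    let (ma, mb) = (fs::metadata(a)?, fs::metadata(b)?);
    Ok(ma.dev() == mb.dev() && ma.ino() == mb.ino())
}

/// Overwrites the file's bytes with zeros in place, then syncs to disk.
fn overwrite_with_zeros(path: &Path) -> io::Result<()> {
    let len = fs::metadata(path)?.len() as usize;
    let mut file = OpenOptions::new().write(true).open(path)?;
    file.write_all(&vec![0u8; len])?;
    file.sync_all()
}

/// Sketch of the helper: no-op if `.old` is missing, unlink only if it
/// shares an inode with the live file, otherwise zero and then unlink.
fn zeroize_or_unlink_old_file(old_file: &Path, proto_file: &Path) -> io::Result<()> {
    if !old_file.try_exists()? {
        return Ok(()); // nothing to clean up
    }
    // Assumption of this sketch: a missing live file cannot share an inode.
    let shared = proto_file.try_exists()? && same_inode(old_file, proto_file)?;
    if !shared {
        // `.old` is the only name for the stale plaintext: scrub it first.
        overwrite_with_zeros(old_file)?;
    }
    // In both cases the `.old` directory entry is removed.
    fs::remove_file(old_file)
}
```

The point of ordering the checks this way is that zeroing is reached only after the two paths have been confirmed to name different inodes, so the live keystore is never written through the `.old` link.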

write_secret_keys_to_disk_and_cleanup_old_file unconditionally zeroed sks_data.pb.old after every write. On filesystems where rename(2) reuses the destination inode rather than atomically swapping the directory entry (observed with virtiofs bind mounts on Docker Desktop / macOS), the post-rename .old hard link shares an inode with the just-written sks_data.pb, and the zeroing corrupts the live keystore — causing every replica to panic at startup with "error parsing SKS protobuf data". Extract the inode-aware "zero or unlink" logic that clean_up_old_sks already had into a shared helper and call it from both the startup cleanup path and the write path. When the paths share an inode, just unlink the extra link: the new content has already been written into that inode, so there is no separate old-inode copy left to scrub. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The two tests that pre-create `.old` as a hard link to `proto_file` before calling `insert()` were named/commented as if they exercised the inode-sharing case. They don't: the cleanup error they assert on comes from the hard-link-creation step failing because `.old` already exists, not from any inode comparison. Rename them to reflect what they actually verify (stale `.old` left behind by a crashed prior process) and point the comments at `zeroize_or_unlink_old_file`'s unit tests for the post-rename inode-sharing coverage. Also extend the metrics test with an on-disk integrity assertion: reopen the keystore and confirm the inserted key survived. Pre-fix, on filesystems where rename(2) reuses the destination inode, the reopen would have panicked with "error parsing SKS protobuf data" — so this assertion is the end-to-end regression check. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ircuit The helper's `try_exists` short-circuit on missing `.old` is essential — without it, the write path on a fresh keystore would call `are_hard_links_to_the_same_inode` on a non-existent path and surface a spurious `NotFound` cleanup error (a regression an earlier iteration of the fix had). Add a direct unit test so a future refactor that drops the short-circuit fails loudly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
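The difference the short-circuit makes can be seen in isolation with two small lookups. This is only a sketch: `inode_unguarded` and `inode_guarded` are hypothetical names that do not appear in the PR, and they stand in for the inode comparison with and without an existence check in front of it.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Looks up the inode directly; a missing path surfaces as `NotFound`.
fn inode_unguarded(path: &Path) -> io::Result<u64> {
    Ok(fs::metadata(path)?.ino())
}

/// Checks existence first, so a missing path is reported as `None`
/// instead of an error.
fn inode_guarded(path: &Path) -> io::Result<Option<u64>> {
    if !path.try_exists()? {
        return Ok(None);
    }
    inode_unguarded(path).map(Some)
}
```

On a fresh keystore there is no `.old` yet, so the unguarded form would turn an ordinary first write into a cleanup error, which is the regression the unit test is meant to catch.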

Copilot

Pull request overview

Fixes a data-loss bug in ProtoSecretKeyStore where post-write cleanup could zeroize the newly written keystore on filesystems that reuse the destination inode during rename(2) (e.g., virtiofs bind mounts), by guarding cleanup against shared inodes and unlinking instead of zeroing when appropriate.

Changes:

Extracted shared cleanup logic into zeroize_or_unlink_old_file, which unlinks .old when it shares an inode with the live proto_file, otherwise zeroizes and deletes as before.
Wired the helper into both startup cleanup (clean_up_old_sks) and the write path (write_secret_keys_to_disk_and_cleanup_old_file).
Added/updated unit tests covering: shared-inode unlink behavior, missing .old no-op, and differing-inode zeroize+delete behavior; adjusted metrics/logging tests accordingly.
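For the differing-inode case, one way a test could confirm that the old bytes are actually scrubbed (and not merely unlinked) is to keep a second "witness" hard link to the old inode and read it after cleanup. The sketch below illustrates that idea only; it is not taken from the PR's test file, and `zero_then_delete`, `bytes_left_after_cleanup` and the file names are hypothetical.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Overwrites the file with zeros in place, syncs, then removes its name.
fn zero_then_delete(path: &Path) -> io::Result<()> {
    let len = fs::metadata(path)?.len() as usize;
    let mut file = OpenOptions::new().write(true).open(path)?;
    file.write_all(&vec![0u8; len])?;
    file.sync_all()?;
    fs::remove_file(path)
}

/// Writes `secret` to a scratch file, cleans it up, and returns whatever
/// bytes remain in the underlying inode, read through a witness hard link.
fn bytes_left_after_cleanup(dir: &Path, secret: &[u8]) -> io::Result<Vec<u8>> {
    let old = dir.join("old.pb");
    let witness = dir.join("witness.pb");
    fs::write(&old, secret)?;
    // The witness keeps the inode alive after `old` is unlinked.
    fs::hard_link(&old, &witness)?;
    zero_then_delete(&old)?;
    let left = fs::read(&witness)?;
    fs::remove_file(&witness)?;
    Ok(left)
}
```

If the cleanup only unlinked the file, the witness would still hold the original secret; with zeroing in place it should hold the same number of zero bytes.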

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
rs/crypto/internal/crypto_service_provider/src/secret_key_store/proto_store.rs	Adds inode-aware helper and uses it in both startup cleanup and post-write cleanup to prevent keystore corruption on inode-reuse filesystems.
rs/crypto/internal/crypto_service_provider/src/secret_key_store/proto_store/tests.rs	Adds regression/unit tests for the new helper behavior and updates existing metrics/logging tests to reflect the new cleanup semantics.


mbjorkqvist and others added 3 commits May 15, 2026 11:03

mbjorkqvist requested a review from Copilot May 15, 2026 11:32

github-actions Bot added the fix label May 15, 2026

mbjorkqvist added the CI_ALL_BAZEL_TARGETS Runs all bazel targets label May 15, 2026

Empty commit to trigger CI_ALL_BAZEL_TARGETS

fd924fc

Copilot started reviewing on behalf of mbjorkqvist May 15, 2026 11:32

Copilot AI reviewed May 15, 2026


Comment thread rs/crypto/internal/crypto_service_provider/src/secret_key_store/proto_store.rs Outdated

Clarify comments

bb11843

mbjorkqvist requested a review from adamspofford-dfinity May 18, 2026 08:18

mbjorkqvist wants to merge 5 commits into master from mathias/CRP-2979-sks-zeroing-on-filesystems-that-reuse-inodes
