fix(crypto): CRP-2979: Guard SKS old-file zeroing against shared inodes #10233

Draft
mbjorkqvist wants to merge 5 commits into master from mathias/CRP-2979-sks-zeroing-on-filesystems-that-reuse-inodes

@mbjorkqvist
Contributor

Fix a data-loss bug in ProtoSecretKeyStore where the write path unconditionally zeroed sks_data.pb.old after every write, without checking whether it still shared an inode with the just-written sks_data.pb.

On filesystems where rename(2) atomically swaps the destination directory entry to a new inode (Linux ext4/xfs/btrfs/tmpfs/…) the bug was latent: .old always pointed to the pre-rename inode, so zeroing scrubbed old plaintext as intended. On filesystems where rename(2) reuses the destination inode — observed with virtiofs bind mounts via Docker Desktop on macOS — the post-rename .old shared the inode of the new keystore, and zeroing corrupted the file just written. The symptom was every replica panicking at startup with "error parsing SKS protobuf data" after the keystore came back truncated to the trailing map entry.
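
The hazard described above can be illustrated with a minimal, Unix-only sketch. It does not reproduce the virtiofs rename behavior; it assumes the post-rename state can be simulated by creating `.old` as a hard link to the live file directly. The helper names (`same_inode`, `zero_in_place`, `demo_shared_inode_hazard`) are hypothetical and are not taken from `proto_store.rs`, and the real zeroing routine may differ.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// True when both paths resolve to the same (device, inode) pair.
/// Hypothetical stand-in for an inode comparison; not the PR's function.
fn same_inode(a: &Path, b: &Path) -> io::Result<bool> {
    let (ma, mb) = (fs::metadata(a)?, fs::metadata(b)?);
    Ok(ma.dev() == mb.dev() && ma.ino() == mb.ino())
}

/// Overwrite the file at `path` with zero bytes, in place (no truncation).
fn zero_in_place(path: &Path) -> io::Result<()> {
    let len = fs::metadata(path)?.len() as usize;
    let mut f = OpenOptions::new().write(true).open(path)?;
    f.write_all(&vec![0u8; len])?;
    f.sync_all()
}

/// Simulate the shared-inode state inside `dir` (which must exist and must
/// not already contain the two files), then zero `.old` unconditionally.
/// Returns whether the inodes were shared and what the live file now holds.
fn demo_shared_inode_hazard(dir: &Path) -> io::Result<(bool, Vec<u8>)> {
    let live = dir.join("sks_data.pb");
    let old = dir.join("sks_data.pb.old");
    fs::write(&live, b"new keystore bytes")?;
    // Assumption: a hard link stands in for the post-rename `.old` entry.
    fs::hard_link(&live, &old)?;
    let shared = same_inode(&live, &old)?;
    // Unguarded zeroing through `.old` writes into the shared inode.
    zero_in_place(&old)?;
    Ok((shared, fs::read(&live)?))
}
```

Because both directory entries name one inode, the bytes read back through `sks_data.pb` are the zeros written through `sks_data.pb.old`, which is why an unconditional scrub is only safe when the two paths are known to refer to different inodes.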

The startup-cleanup path (clean_up_old_sks) already had the inode guard via are_hard_links_to_the_same_inode; the write path (write_secret_keys_to_disk_and_cleanup_old_file) did not. This PR extracts the guard into a shared helper zeroize_or_unlink_old_file and wires it into both paths. When .old and proto_file share an inode, the helper just unlinks the extra link — the new content has already been written into that inode, so there is no separate old-inode copy left to scrub and the secure-erase invariant is preserved.
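
For readers who want the shape of the guard without opening the diff, here is a minimal sketch of the "zero or unlink" decision as described above. It is not the PR's implementation: the function name carries a `_sketch` suffix, the `OldFileCleanup` enum and the `io::Result` return type are invented for illustration, and the inode check uses `std::os::unix::fs::MetadataExt` directly rather than `are_hard_links_to_the_same_inode`, whose signature is not shown here.

```rust
use std::fs::{self, OpenOptions};
use std::io::{self, Write};
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical outcome type, used only to make the three branches visible.
#[derive(Debug, PartialEq, Eq)]
enum OldFileCleanup {
    NothingToDo,
    UnlinkedSharedInode,
    ZeroedAndRemoved,
}

/// Sketch of the guard: unlink `.old` when it shares an inode with the live
/// file, otherwise overwrite it with zeros and then remove it.
fn zeroize_or_unlink_old_file_sketch(
    proto_file: &Path,
    old_file: &Path,
) -> io::Result<OldFileCleanup> {
    // Short-circuit: a missing `.old` is not an error.
    if !old_file.try_exists()? {
        return Ok(OldFileCleanup::NothingToDo);
    }
    let live = fs::metadata(proto_file)?;
    let old = fs::metadata(old_file)?;
    if live.dev() == old.dev() && live.ino() == old.ino() {
        // Same inode: zeroing would clobber the live keystore, so only
        // drop the extra directory entry.
        fs::remove_file(old_file)?;
        return Ok(OldFileCleanup::UnlinkedSharedInode);
    }
    // Different inode: scrub the old bytes in place before unlinking.
    let mut f = OpenOptions::new().write(true).open(old_file)?;
    f.write_all(&vec![0u8; old.len() as usize])?;
    f.sync_all()?;
    drop(f);
    fs::remove_file(old_file)?;
    Ok(OldFileCleanup::ZeroedAndRemoved)
}
```

In this sketch the inode comparison runs before any write to `.old`, so the destructive branch is reachable only once the two paths are known to refer to different inodes.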

mbjorkqvist and others added 3 commits May 15, 2026 11:03
write_secret_keys_to_disk_and_cleanup_old_file unconditionally zeroed
sks_data.pb.old after every write. On filesystems where rename(2) reuses
the destination inode rather than atomically swapping the directory entry
(observed with virtiofs bind mounts on Docker Desktop / macOS), the
post-rename .old hard link shares an inode with the just-written
sks_data.pb, and the zeroing corrupts the live keystore — causing every
replica to panic at startup with "error parsing SKS protobuf data".

Extract the inode-aware "zero or unlink" logic that clean_up_old_sks
already had into a shared helper and call it from both the startup
cleanup path and the write path. When the paths share an inode, just
unlink the extra link: the new content has already been written into
that inode, so there is no separate old-inode copy left to scrub.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

The two tests that pre-create `.old` as a hard link to `proto_file` before
calling `insert()` were named/commented as if they exercised the
inode-sharing case. They don't: the cleanup error they assert on comes
from the hard-link-creation step failing because `.old` already exists,
not from any inode comparison. Rename them to reflect what they actually
verify (stale `.old` left behind by a crashed prior process) and point
the comments at `zeroize_or_unlink_old_file`'s unit tests for the
post-rename inode-sharing coverage.

Also extend the metrics test with an on-disk integrity assertion: reopen
the keystore and confirm the inserted key survived. Pre-fix, on
filesystems where rename(2) reuses the destination inode, the reopen
would have panicked with "error parsing SKS protobuf data" — so this
assertion is the end-to-end regression check.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…ircuit

The helper's `try_exists` short-circuit on missing `.old` is essential —
without it, the write path on a fresh keystore would call
`are_hard_links_to_the_same_inode` on a non-existent path and surface a
spurious `NotFound` cleanup error (a regression an earlier iteration of
the fix had). Add a direct unit test so a future refactor that drops the
short-circuit fails loudly.
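
A minimal sketch of why the short-circuit matters, using only `std`. The two functions are hypothetical illustrations (they are not `are_hard_links_to_the_same_inode` or the helper itself), and the `Option<bool>` return is an assumption chosen to make the "no `.old`" case explicit.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Compare inodes with no existence check: a missing path surfaces as an
/// `io::Error` of kind `NotFound` from `fs::metadata`.
fn shares_inode_unguarded(proto_file: &Path, old_file: &Path) -> io::Result<bool> {
    let (a, b) = (fs::metadata(proto_file)?, fs::metadata(old_file)?);
    Ok(a.dev() == b.dev() && a.ino() == b.ino())
}

/// Same comparison behind a `try_exists` short-circuit: `Ok(None)` means
/// there is no `.old` file and therefore nothing to clean up.
fn shares_inode_guarded(proto_file: &Path, old_file: &Path) -> io::Result<Option<bool>> {
    if !old_file.try_exists()? {
        return Ok(None);
    }
    shares_inode_unguarded(proto_file, old_file).map(Some)
}
```

On a fresh keystore directory with no `.old`, the unguarded variant returns a `NotFound` error while the guarded variant returns `Ok(None)`, which is the distinction the new unit test is meant to pin down.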

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mbjorkqvist mbjorkqvist requested a review from Copilot May 15, 2026 11:32
@github-actions github-actions Bot added the fix label May 15, 2026
@mbjorkqvist mbjorkqvist added the CI_ALL_BAZEL_TARGETS Runs all bazel targets label May 15, 2026
Contributor

Copilot AI left a comment

Pull request overview

Fixes a data-loss bug in ProtoSecretKeyStore: on filesystems that reuse the destination inode during rename(2) (e.g., virtiofs bind mounts), post-write cleanup could zeroize the newly written keystore. The fix guards cleanup against shared inodes and unlinks .old instead of zeroing it when .old shares an inode with the live file.

Changes:

  • Extracted shared cleanup logic into zeroize_or_unlink_old_file, which unlinks .old when it shares an inode with the live proto_file, otherwise zeroizes and deletes as before.
  • Wired the helper into both startup cleanup (clean_up_old_sks) and the write path (write_secret_keys_to_disk_and_cleanup_old_file).
  • Added/updated unit tests covering: shared-inode unlink behavior, missing .old no-op, and differing-inode zeroize+delete behavior; adjusted metrics/logging tests accordingly.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| rs/crypto/internal/crypto_service_provider/src/secret_key_store/proto_store.rs | Adds inode-aware helper and uses it in both startup cleanup and post-write cleanup to prevent keystore corruption on inode-reuse filesystems. |
| rs/crypto/internal/crypto_service_provider/src/secret_key_store/proto_store/tests.rs | Adds regression/unit tests for the new helper behavior and updates existing metrics/logging tests to reflect the new cleanup semantics. |

Comment thread rs/crypto/internal/crypto_service_provider/src/secret_key_store/proto_store.rs Outdated