oci: Add varlink APIs using "splitdirfdstream" by cgwalters · Pull Request #309 · composefs/composefs-rs

cgwalters · 2026-06-05T16:20:06Z

First, I discovered that actually fd-passing with varlink generally works well, and I was misguided in thinking we needed jsonrpc-fdpass.

Almost: one issue is that varlink doesn't have good support for passing a lot of file descriptors (which jsonrpc-fdpass was designed to handle).

But upon some reflection, I realized we don't need to pass a file descriptor per file; all use cases here are fine with a directory fd plus filename.

So here a new data stream format "splitdirfdstream" is implemented.

We now first use that internally when doing a direct pull from containers-storage for reflinking/hardlinking.

But better: let's expose that data concept over varlink, where a varlink client can either pull or push container image layers that way.

This paves the way to a very clear mechanism for us to integrate with containers-storage or other storage stacks (like containerd) in an agnostic way.

We also now support `cfsctl oci copy` to copy across composefs repositories, which is also implemented this way.

Generated-by: OpenCode (Claude Opus 4.8)

giuseppe · 2026-06-05T16:32:51Z

how will this work with podman-container-tools/container-libs#651 ?

I don't think the container-tools are going to add varlink as a dependency again for this use case only

cgwalters · 2026-06-05T17:42:22Z

One thing to bear in mind is that there are two levels to this: the splitdirfdstream (replacing splitfdstream) and the higher-level IPC mechanism by which that is passed around as representations of OCI layers. Only the latter involves varlink, and we could choose to keep using jsonrpc-fdpass for the latter in container-tools if you prefer.

Sorry about the proposed strategy change, but as I dug in more it just feels right - and I somehow again just missed the fd passing support in the varlink ecosystem. I guess one thing that changed is zlink is relatively new, and is well maintained and good code. (The varlink/rust project went through a messy time)

Would varlink be heavier than jsonrpc-fdpass? Hmm, let me see...I spent some tokens on this

varlink/go vs jsonrpc-fdpass-go: binary size comparison

🤖 Assisted-by: OpenCode (Claude Sonnet 4.6)

Measured against the `containers-storage` binary built from `origin/main` with
`-ldflags="-s -w" -tags exclude_graphdriver_btrfs`. Each branch adds an
equivalent stub service (listen → accept → receive → reply) in
`cmd/containers-storage/rpc_bench_stub.go`, guarded by a package-level var so
the linker cannot dead-code-eliminate it. Branches: `bench/varlink`,
`bench/jsonrpc-fdpass`.

| Dependency | Size (KiB) | Δ KiB | Δ % | Packages |
|---|---|---|---|---|
| baseline | 8,003.8 | n/a | n/a | 244 |
| +`varlink/go` | 8,071.9 | +68.1 | +0.851% | 246 |
| +`jsonrpc-fdpass-go` | 8,083.9 | +80.1 | +1.001% | 245 |

With a realistic service stub, both libraries are similar in weight (~68–80 KiB).
varlink/go is slightly smaller despite pulling in 2 extra packages vs 1,
likely because jsonrpc-fdpass-go depends on golang.org/x/sys for
syscall.RawConn fd passing whereas varlink/go has no external dependencies.

That said, varlink/go#43 needs doing.

cgwalters · 2026-06-05T20:41:32Z

> Only the latter involves varlink, and we could choose to keep using jsonrpc-fdpass for the latter in container-tools if you prefer.

Or, it'd probably work to use gRPC for most metadata/controlplane but pass fds over a separate negotiated socket. I don't have a really strong opinion.

One other thing I'd say here that I think is a big cleanup is that our parsing of containers-storage: layouts now always goes through splitdirfdstream(+varlink) as an intermediary - we've added an abstraction layer there that we're now always testing. This helps pave the way much more clearly to plugging in an external binary.

Also the converse is now true - it should now be straightforward for external tooling to push content into composefs-rs in an efficient way.

giuseppe · 2026-06-06T20:09:11Z

> That said, varlink/go#43 needs doing.

This is a blocker for the containers/storage integration. I have no preference whether it is varlink or jsonrpc, but should we hold this until we know we can use it in containers/storage too?

cgwalters · 2026-06-08T15:56:31Z

> This is a blocker for the containers/storage integration. I have no preference whether it is varlink or jsonrpc, but should we hold this until we know we can use it in containers/storage too?

I generated a PR in varlink/go#44

I think we don't need to block this strictly speaking, it seems in the end better for us to temporarily use a patched varlink/go than to use the custom jsonrpc-fdpass right? Also, if we can take this track to finally replace the current experimental-image-proxy protocol with one thing it'd be overall a large win.

giuseppe · 2026-06-08T16:55:42Z

> I think we don't need to block this strictly speaking, it seems in the end better for us to temporarily use a patched varlink/go than to use the custom jsonrpc-fdpass right? Also, if we can take this track to finally replace the current experimental-image-proxy protocol with one thing it'd be overall a large win.

@mheon are you fine with that?

mheon · 2026-06-08T17:08:34Z

Sorry for not following this one closely. Are we talking about doing a hard dependency on Varlink in order to use composefs with c/storage? Do we know what that's going to do to our binary size?

giuseppe · 2026-06-08T17:10:06Z

> Sorry for not following this one closely. Are we talking about doing a hard dependency on Varlink in order to use composefs with c/storage? Do we know what that's going to do to our binary size?

yes, to add it as the RPC to communicate between Go and Rust.

@cgwalters made a comparison here:

#309 (comment)

cgwalters · 2026-06-09T12:11:16Z

OK, this one I think is ready for review/merge.

mheon · 2026-06-09T16:23:19Z

I don't think we have a fundamental disagreement with Varlink as an IPC protocol, though I am worried about protocol design. What kind of long term stability are we expecting with this? Are we going to have to do a hard pinning of composefs-rs against c/storage versions to ensure both ends are talking the same version of the protocol?

cgwalters · 2026-06-09T18:51:25Z

> Are we going to have to do a hard pinning of composefs-rs against c/storage

Right, this is a confusing topic. At the current time, containers-storage uses /usr/bin/mkcomposefs from https://github.com/composefs/composefs - the C implementation.

We have an effort to fully replace that project with this one, including a new Rust mkcomposefs that has landed here that aims to be 100% compatible.

The trajectory is then to ship the composefs package with this project.

But that's not (directly) related to this PR or the PR Giuseppe is working on, which aim to expose varlink APIs on both ends.

I think the trajectory that would help the most is to replace skopeo --experimental-image-proxy (that's used by bootc/ostree) with a varlink-based interface, that's what's going on in podman-container-tools/container-libs#651 - but that needs updating to match this.

In that flow, it's more the other way around again: bootc/composefs(-rs) would be calling into skopeo ➡️ container-libs via varlink.

…timeout

The integration CI jobs were hitting GitHub's 6-hour job limit, which caused silent cancellation rather than a real test failure. Two issues combined to cause this:

1. nextest's integration profile had terminate-after = 60, meaning a single hung test could run for 1200 × 60 = 72 000 s (20 h) before nextest force-killed it, far exceeding the 6-hour GHA limit.
2. The integration job had no explicit timeout-minutes, so a stalled job wasn't surfaced as a clear timeout failure.

Fix both: drop terminate-after to 1 (each test gets exactly 20 min before nextest terminates it, which is generous for VM boot + test execution), and add timeout-minutes: 90 to the integration job so any build or runner hang fails cleanly rather than silently burning 6 hours.

Assisted-by: OpenCode (Claude Sonnet 4.6)
Signed-off-by: Colin Walters <walters@verbum.org>


cgwalters · 2026-06-11T22:24:45Z

OK! I've gone over and cleaned this up more. I think it's ready for a wider review - I've trimmed out some garbage etc. One thing I had my agent do was inject random "dummy" fds into the producer stream to ensure that the consumer side was correctly reading the indices (and not just e.g. hardcoding 0 to access the dirfd), that shook out some bugs.
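The dummy-fd trick described above can be illustrated with a small model. This is only a sketch: `Slot`, `ExternalRef`, `resolve` and `with_dummies` are hypothetical names invented here, plain labels stand in for real file descriptors, and none of it reflects the actual splitdirfdstream wire format.

```rust
/// Hypothetical stand-in for one received descriptor. A real consumer would
/// hold an OwnedFd; a label keeps the sketch runnable without any fds.
#[derive(Debug, Clone, PartialEq)]
enum Slot {
    /// Padding injected by the producer to shift the indices around.
    Dummy,
    /// A directory that external file names are resolved against.
    Dir(&'static str),
}

/// Hypothetical external reference: "file `name` inside directory slot `dirfd_index`".
struct ExternalRef {
    dirfd_index: usize,
    name: &'static str,
}

/// Consumer side: look the directory up by the index carried in the stream,
/// never by a fixed position.
fn resolve(slots: &[Slot], r: &ExternalRef) -> Result<String, String> {
    match slots.get(r.dirfd_index) {
        Some(Slot::Dir(dir)) => Ok(format!("{dir}/{}", r.name)),
        Some(Slot::Dummy) => Err(format!("slot {} is not a directory", r.dirfd_index)),
        None => Err(format!("slot {} is out of range", r.dirfd_index)),
    }
}

/// Producer side of the test: put `pad` dummy slots in front of the real
/// directory and return the slots plus the index the stream must carry.
fn with_dummies(dir: &'static str, pad: usize) -> (Vec<Slot>, usize) {
    let mut slots = vec![Slot::Dummy; pad];
    slots.push(Slot::Dir(dir));
    (slots, pad)
}
```

With three dummies in front, a consumer that honours the index resolves `layer0/a.txt`, while one that hardcodes slot 0 hits a dummy and fails, which is the class of bug the injected fds are meant to expose.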

alexlarsson · 2026-06-15T12:14:00Z

+        /// Use reflink/hardlink zero-copy transfer (requires same filesystem and root).
+        #[clap(long)]
+        zerocopy: bool,
+    },


Can you split out the oci copy cli support into a separate commit?

alexlarsson · 2026-06-15T12:15:20Z

+    ///
+    /// The source repository is selected by the global `--repo`/`--user`/
+    /// `--system` flags. The destination is `--to`. Both repositories must use
+    /// the same hash algorithm.


Is this really the natural approach? I would personally expect --repo to be the destination repo.

alexlarsson · 2026-06-15T12:20:54Z

+    ///
+    /// The source repository is selected by the global `--repo`/`--user`/
+    /// `--system` flags. The destination is `--to`. Both repositories must use
+    /// the same hash algorithm.


This says the algorithm must be the same, but later comments (in the implementation) seem to say it supports conversion.

alexlarsson · 2026-06-15T12:23:10Z

+
+    // Zerocopy (hardlink) requires the same fs-verity algorithm on both sides
+    // because it enables verity in-place on the shared source inode.
+    if zerocopy && std::any::TypeId::of::<SrcID>() != std::any::TypeId::of::<DestID>() {


Do we really always want to hard error here? It seems useful to support a "zerocopy-if-possible" operation.
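One possible shape for such a "zerocopy-if-possible" mode, as a minimal sketch. The `ZeroCopy` and `Transfer` enums, `choose_transfer`, and the marker types `Sha256Id`/`Sha512Id` are all hypothetical and not taken from the PR; only the `TypeId` comparison mirrors the quoted diff.

```rust
use std::any::TypeId;

/// Hypothetical three-state replacement for the boolean `zerocopy` flag.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ZeroCopy {
    Never,
    IfPossible,
    Required,
}

/// What the copy loop would end up doing for each object.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Transfer {
    Copy,
    Link,
}

/// Decide the transfer mode from the requested policy and whether both
/// repositories use the same object-ID (fs-verity algorithm) type.
fn choose_transfer<Src: 'static, Dest: 'static>(mode: ZeroCopy) -> Result<Transfer, String> {
    let same_algorithm = TypeId::of::<Src>() == TypeId::of::<Dest>();
    match (mode, same_algorithm) {
        (ZeroCopy::Never, _) => Ok(Transfer::Copy),
        (ZeroCopy::IfPossible, true) | (ZeroCopy::Required, true) => Ok(Transfer::Link),
        // Best effort: silently fall back to a regular copy.
        (ZeroCopy::IfPossible, false) => Ok(Transfer::Copy),
        // Strict mode keeps the hard error from the diff above.
        (ZeroCopy::Required, false) => {
            Err("zerocopy requires the same fs-verity algorithm on both sides".to_string())
        }
    }
}

// Placeholder marker types standing in for the real object-ID types.
struct Sha256Id;
struct Sha512Id;
```

This only covers the algorithm check; the "same filesystem" requirement mentioned in the flag's help text would still need a runtime check with the same fallback behaviour.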

alexlarsson · 2026-06-15T13:39:11Z

+        paths.push(format!("{home}/.local/share/containers/storage"));
+    }
+
+    paths.push("/var/lib/containers/storage".to_string());


Shouldn't this read storage.conf and get the graphroot and the additional image dirs from there? It should also respect the CONTAINERS_STORAGE_CONF env var.
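A rough sketch of the lookup this suggests, limited to locating the config file. The function name `storage_conf_candidates` is hypothetical, and the search order (environment override, then the per-user file, then `/etc`, then `/usr/share`) is an assumption based on the containers-storage.conf documentation; it ignores `XDG_CONFIG_HOME` and does not parse the TOML to extract `graphroot` or the additional image stores.

```rust
use std::path::PathBuf;

/// Return the storage.conf paths to try, in priority order.
///
/// `env_override` is the value of CONTAINERS_STORAGE_CONF (if set) and `home`
/// is the user's home directory; both are passed in so the function stays
/// pure and easy to test.
fn storage_conf_candidates(env_override: Option<&str>, home: Option<&str>) -> Vec<PathBuf> {
    // An explicit, non-empty override wins and disables the default search.
    if let Some(path) = env_override.filter(|p| !p.is_empty()) {
        return vec![PathBuf::from(path)];
    }

    let mut candidates = Vec::new();
    if let Some(home) = home {
        // Assumed rootless location.
        candidates.push(PathBuf::from(home).join(".config/containers/storage.conf"));
    }
    // Assumed system-wide locations: admin override first, vendor default last.
    candidates.push(PathBuf::from("/etc/containers/storage.conf"));
    candidates.push(PathBuf::from("/usr/share/containers/storage.conf"));
    candidates
}
```

A caller would pass `std::env::var("CONTAINERS_STORAGE_CONF").ok().as_deref()` and `std::env::var("HOME").ok().as_deref()`, then read the first candidate that exists.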

alexlarsson · 2026-06-15T14:04:43Z

+                    stats.bytes_inlined += length;
+                    writer.write_inline(data);
+                } else {
+                    // Large file: store as an external object. A memfd is the


We can easily denial-of-service this by sending a large amount of MAX_INLINE_CHUNK_SIZE inline files, which all will end up in memfds. Do we care?

Hmm, or I guess we can just do one at a time, so maybe this is fine.

alexlarsson · 2026-06-15T14:08:26Z

+                let fd = dir
+                    .open(name)
+                    .map(OwnedFd::from)
+                    .with_context(|| format!("open {name:?} in overlay dir[{dirfd_index}]"))?;


Should we maybe check that this is a regular file?
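Such a check could look roughly like the following sketch, which uses `std::fs::File` rather than the directory handle type from the diff; `ensure_regular_file` is a hypothetical helper name.

```rust
use std::fs::File;
use std::io;

/// Reject anything that is not a regular file (directories, FIFOs, devices).
/// The check runs on the already-open descriptor, so it cannot race with a
/// rename of the path.
fn ensure_regular_file(file: &File) -> io::Result<()> {
    let meta = file.metadata()?; // metadata of the open file, not of the path
    if meta.file_type().is_file() {
        Ok(())
    } else {
        Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "expected a regular file",
        ))
    }
}
```

Note that checking after the open is not a complete answer: opening a FIFO read-only can block until a writer appears, so a real implementation might also want non-blocking or no-follow open flags.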

alexlarsson · 2026-06-15T14:17:08Z

+    let mut stats = ImportStats::default();
+    let mut hasher = Sha256::new();
+
+    while let Some(chunk) = reader.next_chunk().context("splitdirfdstream read error")? {


This duplicates pub fn drain_splitdirfdstream() quite a bit, is it not possible to have a dummy hasher and reuse this code?
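The "dummy hasher" idea might be sketched like this. The trait `ChunkDigest`, the types `NoopDigest` and `SumDigest`, and the function `drain_chunks` are hypothetical; `SumDigest` is only a stand-in for a real SHA-256 implementation so the example needs no external crates, and the byte slices stand in for the chunks yielded by the reader.

```rust
/// Minimal digest interface shared by the hashing and non-hashing paths.
trait ChunkDigest {
    fn update(&mut self, data: &[u8]);
}

/// Digest that discards its input, for callers that only want to drain.
struct NoopDigest;

impl ChunkDigest for NoopDigest {
    fn update(&mut self, _data: &[u8]) {}
}

/// Toy digest (sum of all bytes) standing in for Sha256.
struct SumDigest(u64);

impl ChunkDigest for SumDigest {
    fn update(&mut self, data: &[u8]) {
        self.0 += data.iter().map(|&b| u64::from(b)).sum::<u64>();
    }
}

/// One loop serves both callers: every chunk is fed to the digest and the
/// total number of bytes seen is returned.
fn drain_chunks<'a, D: ChunkDigest>(
    chunks: impl IntoIterator<Item = &'a [u8]>,
    digest: &mut D,
) -> u64 {
    let mut total = 0u64;
    for chunk in chunks {
        digest.update(chunk);
        total += chunk.len() as u64;
    }
    total
}
```

Because `NoopDigest::update` is empty, the compiler can inline it away, so the plain drain path should pay little or nothing for sharing the loop.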

alexlarsson · 2026-06-15T15:31:23Z

I did a high-level pass through this code, but man, there is a lot of code. Also, I think @giuseppe needs to review the storage-related parts.

cgwalters requested a review from giuseppe June 5, 2026 16:20

cgwalters force-pushed the splitdirfdstream branch from ac7e9f6 to 5ad1961 Compare June 7, 2026 21:13

giuseppe reviewed Jun 8, 2026

Comment thread crates/composefs-splitdirfdstream/README.md Outdated

cgwalters force-pushed the splitdirfdstream branch from 5ad1961 to b50f5dd Compare June 8, 2026 19:36

cgwalters mentioned this pull request Jun 8, 2026

storage: add splitfdstream for efficient layer transfer with reflink support podman-container-tools/container-libs#651

Open

cgwalters force-pushed the splitdirfdstream branch 2 times, most recently from a5d52a9 to fb82369 Compare June 9, 2026 11:16

cgwalters requested a review from giuseppe June 9, 2026 12:11

cgwalters enabled auto-merge June 9, 2026 12:46

cgwalters force-pushed the splitdirfdstream branch from fb82369 to ee3f351 Compare June 10, 2026 19:10

cgwalters marked this pull request as draft June 10, 2026 19:24

auto-merge was automatically disabled June 10, 2026 19:24
Pull request was converted to draft

cgwalters force-pushed the splitdirfdstream branch from ee3f351 to b5feb85 Compare June 11, 2026 18:40

cgwalters marked this pull request as ready for review June 11, 2026 21:29

cgwalters added 2 commits June 11, 2026 18:19

cgwalters force-pushed the splitdirfdstream branch from b5feb85 to df2b109 Compare June 11, 2026 22:19

alexlarsson reviewed Jun 15, 2026

Comment thread crates/composefs-oci/src/cstor.rs

alexlarsson reviewed Jun 15, 2026
