Add composefs-ostree and some basic CLI tools #144

Open
alexlarsson wants to merge 4 commits into composefs:main from alexlarsson:ostree-support

Conversation

@alexlarsson

Contributor

Based on ideas from #141

This is an initial version of ostree support. This allows pulling
from local and remote ostree repos, which will create a set of
regular file content objects, as well as a blob containing all the
remaining ostree objects. From the blob we can create an image.

When pulling a commit, a base blob (i.e. "the previous version") can be
specified. Any objects in that base blob will not be downloaded. If a
name is given for the pulled commit, then a pre-existing blob with the
same name will automatically be used as the base blob.
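
A minimal sketch of the base-blob idea described above, assuming objects are identified by their hex digests. The function `objects_to_fetch` and its parameters are hypothetical names for illustration, not code from this PR.

```rust
use std::collections::HashSet;

/// Returns the digests that still need downloading: everything the new
/// commit references that the base blob does not already contain.
/// Input order is preserved and repeated digests are fetched only once.
fn objects_to_fetch<'a>(wanted: &[&'a str], base: &HashSet<&str>) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for &digest in wanted {
        // Skip anything present in the base, and anything already queued.
        if !base.contains(digest) && seen.insert(digest) {
            out.push(digest);
        }
    }
    out
}
```

Calling this with the full object list of the new commit and the set of digests found in the base yields only the delta to download.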

This is an initial version and there are several things missing:

  • Pull operations are completely serial
  • There is no support for ostree summary files
  • There is no support for ostree delta files
  • There is no caching of local file availability (other than base blob)
  • Local ostree repos only support archive mode

@alexlarsson alexlarsson force-pushed the ostree-support branch 2 times, most recently from e0e827f to 9c5b086 Compare June 17, 2025 06:54

@allisonkarlitskaya allisonkarlitskaya left a comment

Collaborator

I love this! Thanks for working on it!

I made some comments on the first round of commits. Feel free to adjust those and PR them separately: we can merge those now without further discussion.

The blobs thing is going to need a call.

I didn't review the crate addition in any detail at all. That's probably also going to need a call :)

Comment thread crates/composefs/src/repository.rs
Comment thread crates/composefs/src/repository.rs Outdated
Comment thread crates/composefs/src/repository.rs Outdated
Comment thread crates/composefs/src/repository.rs Outdated
Comment thread crates/composefs/src/repository.rs Outdated
Comment thread crates/composefs/src/lib.rs Outdated
Comment thread crates/cfsctl/Cargo.toml Outdated
Comment thread crates/composefs/src/repository.rs Outdated
@alexlarsson

Contributor Author

Hmmm, thinking more about this. We probably want a "content type" magic thing in the splitstream header as well, so we can error out if the wrapped thing is of the wrong type.

@alexlarsson alexlarsson force-pushed the ostree-support branch 2 times, most recently from 2ed83a2 to c041afe Compare June 19, 2025 09:11
@alexlarsson

Contributor Author

Ok. Reworked this to use splitstreams for object maps and commits. And, by using an object mapping to find the object map we make the content of the splitstream for the commit be just the commit data, and thus the sha256 of that splitstream matches the ostree commit id.

@alexlarsson

Contributor Author

@allisonkarlitskaya There is still lots to do here. But have a look at this approach and see what you think.

@alexlarsson

Contributor Author

Added some further changes. We now validate all objects when pulling and all non-file objects when creating images. It's hard to efficiently validate file objects during create-image though, since we would like to avoid re-reading the external object files to compute the sha256.

Remaining things to do:

  • Stream larger objects into repo
  • Support summaries and summary branches for remote repos
  • Support deltas when remote pulling
  • Parallelize downloads of objects
  • Report pull progress in some sane way
  • Use some kind of local cache for available objects other than just those from "previous version"
  • Handle GPG validation of commit objects

@alexlarsson alexlarsson force-pushed the ostree-support branch 4 times, most recently from 481e604 to e88573d Compare June 30, 2025 14:26
@alexlarsson

Contributor Author

I started working on the delta support, but it failed because of an issue in gvariant-rs.

@allisonkarlitskaya allisonkarlitskaya left a comment

Collaborator

It occurs to me that it might be interesting not to sort the table of fs-verity references, and it might also be interesting to permit duplicate items.

On the topic of deferring writing of objects to a background thread, this would allow us to write "external object #123" based on a sequential index to the splitstream without actually knowing the hash value yet, and then fill in the actual values in the header at the end when we're writing: it helps there that the fs-verity references aren't compressed and therefore not part of the stream...
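
One way to picture the deferred-reference idea above is the following sketch. It assumes a writer that hands out sequential indices for external objects and fills in the digests later; `DeferredWriter`, `Chunk`, and the 32-byte `ObjectId` are hypothetical and do not reflect the actual splitstream code.

```rust
/// Hypothetical stand-in for an fs-verity digest.
type ObjectId = [u8; 32];

/// A stream entry is either inline bytes or a reference, by index, into
/// the header's table of external objects.
enum Chunk {
    Inline(Vec<u8>),
    External(usize),
}

#[derive(Default)]
struct DeferredWriter {
    chunks: Vec<Chunk>,
    /// Slot i is filled once the digest of external object i is known.
    /// The table is unsorted and may contain duplicates.
    refs: Vec<Option<ObjectId>>,
}

impl DeferredWriter {
    fn write_inline(&mut self, data: &[u8]) {
        self.chunks.push(Chunk::Inline(data.to_vec()));
    }

    /// Reserves the next sequential index and records a reference to it.
    /// The digest can arrive later, e.g. from a background thread.
    fn write_external(&mut self) -> usize {
        let index = self.refs.len();
        self.refs.push(None);
        self.chunks.push(Chunk::External(index));
        index
    }

    fn resolve(&mut self, index: usize, id: ObjectId) {
        self.refs[index] = Some(id);
    }

    /// Returns the completed reference table and the chunks, or None if
    /// some digest is still pending.
    fn finish(self) -> Option<(Vec<ObjectId>, Vec<Chunk>)> {
        let refs = self.refs.into_iter().collect::<Option<Vec<_>>>()?;
        Some((refs, self.chunks))
    }
}
```

Because the chunks only carry indices, the stream body can be written before any digest is known; only the uncompressed header table has to wait.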

@cgwalters

Collaborator

It seems like we should get in the splitstream changes in 0f6d69e at least sooner rather than later? Can you file a separate PR?

alexlarsson added a commit to alexlarsson/composefs-rs that referenced this pull request Sep 29, 2025
This changes the splitstream format a bit, with the goal of allowing
splitstreams to support ostree files as well (see composefs#144)

The primary differences are:

 * The header is not compressed
 * All referenced fs-verity objects are stored in the header, including
   external chunks, mapped splitstreams and (a new feature) references
   that are not used in chunks.
 * The mapping table is separate from the reference table (and generally
   smaller), and indexes into it.
 * There is a magic value to detect the file format.
 * There is a magic content type to detect the type wrapped in the stream.
 * We store a tag for what ObjectID format is used
 * The total size of the stream is stored in the header.

The ability to reference file objects in the repo even if they are not
part of the splitstream "content" will be useful for the ostree
support to reference file content objects.

This change also allows more efficient GC enumeration, because we
don't have to parse the entire splitstream to find the referenced
objects.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
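
The header changes listed in this commit message can be illustrated roughly as follows. This is only a sketch under stated assumptions: the `MAGIC` bytes, the field names, and the field widths are invented for illustration and are not the actual on-disk layout.

```rust
/// Hypothetical magic bytes; the real value is defined by the format itself.
const MAGIC: [u8; 8] = *b"SplitStr";

/// Hypothetical stand-in for an fs-verity digest.
type ObjectId = [u8; 32];

struct Header {
    magic: [u8; 8],
    /// Identifies what kind of payload the stream wraps.
    content_type: u64,
    /// Every fs-verity object the stream references, used by a chunk or not.
    refs: Vec<ObjectId>,
    /// Mapping table: (key digest, index into `refs`).
    mappings: Vec<([u8; 32], usize)>,
}

#[derive(Debug, PartialEq)]
enum HeaderError {
    BadMagic,
    WrongContentType { expected: u64, found: u64 },
    MappingOutOfRange(usize),
}

/// Rejects files of the wrong format or wrapping the wrong content type.
fn check_header(h: &Header, expected: u64) -> Result<(), HeaderError> {
    if h.magic != MAGIC {
        return Err(HeaderError::BadMagic);
    }
    if h.content_type != expected {
        return Err(HeaderError::WrongContentType { expected, found: h.content_type });
    }
    match h.mappings.iter().find(|(_, i)| *i >= h.refs.len()) {
        Some((_, i)) => Err(HeaderError::MappingOutOfRange(*i)),
        None => Ok(()),
    }
}

/// GC enumeration needs only the header, not the stream body.
fn referenced_objects(h: &Header) -> &[ObjectId] {
    &h.refs
}

/// Resolves a mapping key to the object it points at in the reference table.
fn lookup_mapping<'a>(h: &'a Header, key: &[u8; 32]) -> Option<&'a ObjectId> {
    h.mappings
        .iter()
        .find(|(k, _)| k == key)
        .and_then(|(_, i)| h.refs.get(*i))
}
```

Keeping the mapping table as indices into the reference table means each digest is stored once, and a garbage collector can read `refs` without decompressing anything.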
alexlarsson added a commit to alexlarsson/composefs-rs that referenced this pull request Oct 6, 2025
@alexlarsson alexlarsson force-pushed the ostree-support branch 2 times, most recently from c788da2 to 2ee193a Compare October 6, 2025 14:58
allisonkarlitskaya pushed a commit to allisonkarlitskaya/composefs-rs that referenced this pull request Nov 12, 2025
@cgwalters

Collaborator

> I also think the fact that it has nothing to do with OCI is great.

I think unless we prove out that composefs can be a very good way to store OCI, then it is not worth investing in. Thankfully that's not the case - I think it is (and I believe you do too!).

So it's not that it has "nothing to do with OCI" (right?) - how about "has the capability to easily/natively store any type of content that one would want to represent as read-only immutable versioned filesystem trees".

For example, today Android as far as I know uses fsverity on single zip files, and they've made it work quite well, but it's harder to get deduplication across apps that way, and maybe someday they go to a composefs-like model.

@alexlarsson

Contributor Author

I rebased this, let's see if CI passes now.

@allisonkarlitskaya

Collaborator

> > I also think the fact that it has nothing to do with OCI is great.
>
> I think unless we prove out that composefs can be a very good way to store OCI, then it is not worth investing in. Thankfully that's not the case - I think it is (and I believe you do too!).
>
> So it's not that it has "nothing to do with OCI" (right?) - how about "has the capability to easily/natively store any type of content that one would want to represent as read-only immutable versioned filesystem trees".

Just to be clear, when I said "it has nothing to do with OCI" I specifically meant composefs-ostree, not composefs-rs generally (which very clearly was designed with OCI in mind).

Very obviously the main target of composefs-rs right now is bootc (OCI), probably followed by container storage (obviously also OCI). flatpak is probably a distant third at the moment, and indeed, even that has something to do with OCI (the current flatpak demo only works with OCI, in fact)...

> For example, today Android as far as I know uses fsverity on single zip files, and they've made it work quite well, but it's harder to get deduplication across apps that way, and maybe someday they go to a composefs-like model.

Ya, that's sort of what I meant... it would be cool to show that you can really do a lot of different things with this stuff...

@alexlarsson alexlarsson force-pushed the ostree-support branch 2 times, most recently from 1b98032 to 515fb7f Compare January 29, 2026 16:10

@cgwalters cgwalters left a comment

Collaborator

Just an initial pass

Comment thread crates/composefs-ostree/src/pull.rs Outdated
Comment thread crates/composefs-ostree/src/pull.rs
Comment thread crates/composefs-ostree/src/pull.rs Outdated
Comment thread crates/composefs-ostree/src/repo.rs Outdated
Comment thread crates/composefs-ostree/src/repo.rs Outdated
Comment thread crates/composefs-ostree/src/repo.rs Outdated
Comment thread crates/composefs-ostree/src/repo.rs Outdated
Comment thread crates/composefs-ostree/src/repo.rs
Comment thread crates/composefs-ostree/src/repo.rs Outdated
if filetype.is_symlink() {
    Ok((zlib_header, Box::new(empty())))
} else {
    let fd_path = format!("/proc/self/fd/{}", path_fd.as_fd().as_raw_fd());

Collaborator

Tangential to this but I'd like to use https://docs.rs/crate/rustix-linux-procfs/latest I think

@alexlarsson

Contributor Author

I rebased this and fixed some comments. Still some work to do though.

@alexlarsson alexlarsson force-pushed the ostree-support branch 2 times, most recently from 5fba232 to 1228e9b Compare June 2, 2026 15:49
Signed-off-by: Alexander Larsson <alexl@redhat.com>
This lets you look up a ref digest from the splitstream by index
and is needed by the ostree code.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
This is basically ensure_object_from_fd(), but for anything
implementing Read. ensure_object_from_fd() is now reimplemented
on top of this.

We will need this in the ostree support code for streaming a zlib
compressed file to the repo.

Signed-off-by: Alexander Larsson <alexl@redhat.com>
@alexlarsson

Contributor Author

Ok, I updated this to the latest version and added streaming creation of repo files and parallelized fetching. Plus some other cleanups.

@alexlarsson

Contributor Author

Ok, I spent some time on this. It's now much more like the "cfsctl oci" commands and behavior, and it does parallel fetches. I also added various integration tests. I think this is pretty complete for what it does (i.e. imports ostree commits into composefs and lets you mount it).

There are some TODOs for summary and delta support, but those are not necessarily super important for the basic functionality.

@alexlarsson alexlarsson force-pushed the ostree-support branch 7 times, most recently from 188640d to efba46d Compare June 17, 2026 16:20
    ref ostree_ref,
    base_name,
} => {
    eprintln!("Fetching {ostree_ref}");

Collaborator

Don't log via eprintln!; we have the progress API now.

Also on that topic...I think we should expose a varlink API for this now, right?

I guess neither of these need to strictly block merging though.


🤔 I guess actually...if we go down this varlink path, perhaps in theory we could have both the oci and ostree fetchers be extension binaries i.e. something like /usr/libexec/composefs/ext/oci is automatically cfsctl oci? That could be interesting...and would actually force us to have a good "core" varlink api.

Comment thread crates/composefs-ostree/src/commit.rs Outdated
Comment on lines +325 to +327
for i in 1..256 {
    // Bucket ends are (non-strictly) increasing
    if buckets[i] < buckets[i - 1] {

Collaborator

In general in Rust many array accesses can be done more elegantly and more safely than just direct indexing. In this specific case I think https://doc.rust-lang.org/stable/std/primitive.slice.html#method.array_windows is what we want

Contributor Author

array_windows is unstable though, do we really want to use that?

Contributor Author

I used regular .windows() instead. Also, I spent some time in general rustifying the code and cleaning it up.
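
For reference, the monotonicity check under discussion could look roughly like this with the stable `slice::windows` method. The function name and the `u32` element type are assumptions for illustration; the PR's actual code may differ.

```rust
/// Returns true when the bucket ends are (non-strictly) increasing.
/// `windows(2)` yields every adjacent pair, so no manual indexing with
/// `i` and `i - 1` is needed, and empty or single-element slices pass.
fn bucket_ends_are_monotonic(buckets: &[u32]) -> bool {
    buckets.windows(2).all(|pair| pair[0] <= pair[1])
}
```

The slices produced by `windows(2)` always have length 2, so the indexing inside the closure cannot go out of bounds.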

// until the queue is drained and all in-flight fetches have completed.
let mut join_set: JoinSet<Result<FetchResult<ObjectID>>> = JoinSet::new();

loop {

Collaborator

We can interleave metadata and data fetches, it's what libostree does. Is it worth the added complexity? Maybe not.

@alexlarsson alexlarsson Jun 17, 2026

Contributor Author

Probably not. This thing is actually surprisingly fast as is:

$ time target/debug/cfsctl --repo repo ostree pull https://dl.flathub.org/repo runtime/org.gnome.Platform/x86_64/50
Fetching runtime/org.gnome.Platform/x86_64/50
█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ 16288/16288
commit  f6fb972824514aefc06b23d7d591192c9cba2ad72648bf0473d06a565c40c264
verity  a2e16123b2310d65b6b886e89e4fc45ec61c47efd0b4daac05bd3132ac3d8f78890fcbb22d0975411917742d17e86ad52709c8ec1ff33a62b97e94f38528242a
image   079ade7aba4e5fef51b30a80e3d1805fd5f79d9a48f240340c7a94f98497d8265edee857563308c00574a1bab792168caa0152af04ef47ac15d8a935ca291e15
tagged  runtime/org.gnome.Platform/x86_64/50
objects 2752 metadata + 16288 files fetched

real	0m13,006s
user	0m12,066s
sys	0m1,663s
$ du -csh repo
1,1G	repo
1,1G	total
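
As a rough illustration of draining a fetch queue with a fixed number of concurrent workers, here is a simplified sketch using only the standard library. It swaps the `JoinSet`-based async loop from the PR for scoped threads, does not model fetched metadata enqueueing further objects, and `fetch_all` and its `fetch` callback are hypothetical.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;
use std::thread;

/// Runs `fetch` for every queued name on up to `workers` threads and
/// returns (name, result) pairs sorted by name.
fn fetch_all<F>(queue: Vec<String>, workers: usize, fetch: F) -> Vec<(String, usize)>
where
    F: Fn(&str) -> usize + Sync,
{
    let queue = Mutex::new(VecDeque::from(queue));
    let results = Mutex::new(Vec::new());
    // The scope returns only after every worker has finished, i.e. once
    // the queue is drained and all in-flight fetches have completed.
    thread::scope(|s| {
        for _ in 0..workers.max(1) {
            s.spawn(|| loop {
                // The lock is released at the end of this statement, so
                // the fetch itself runs without holding it.
                let next = queue.lock().unwrap().pop_front();
                let Some(name) = next else { break };
                let size = fetch(&name);
                results.lock().unwrap().push((name, size));
            });
        }
    });
    let mut out = results.into_inner().unwrap();
    out.sort();
    out
}
```

A real puller would also need error propagation and progress reporting, which this sketch leaves out.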

Comment thread crates/composefs-ostree/src/commit.rs Outdated
*
* Commit splitstreams are mappings from a set of ostree sha256
* digests into the content for that ostree object. The content is
* defined as some data, and an optional ObjectID referencing an

Collaborator

I read this thing twice and didn't fully understand it. "The content is defined as some data" is vague 😄


Our goal is conceptually to define a serialization of an ostree commit into a single "stream", right? And then splitting out content objects as externals.

Hmm...why wouldn't it work to basically do what we do with tar, walk the commit in a depth-first manner, serializing metadata + externals as we go?

Contributor Author

Well, the "some data" is just generally the ostree object data for the digest.
We don't just want to have a serialization, because we also want to use the commit as a way to efficiently look up ostree objects in the commit. We use this during pull to avoid pulling objects that were in the previous version of the commit.

Contributor Author

So, it's more like a hash table from sha256 digests to objects, which optionally carry external object ids.

Contributor Author

And it's not actually "inline data OR external refs"; we sometimes have to have both, because we need to store metadata as well. So it's more like we store the archive-z2 file header in the data, and then the content in the external ref.
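
The "hash table from digests to objects" described in these replies could be sketched as below. This is an in-memory illustration only: the `Entry` and `ObjectTable` types, the 256-entry bucket index keyed on the first digest byte, and the assumption of unique digests are guesses loosely based on the bucket check quoted earlier in the review, not the actual splitstream layout.

```rust
type Sha256 = [u8; 32];
/// Hypothetical stand-in for an fs-verity digest.
type ObjectId = [u8; 32];

struct Entry {
    digest: Sha256,
    /// Inline bytes, e.g. metadata or a file header.
    data: Vec<u8>,
    /// Optional reference to an external content object.
    external: Option<ObjectId>,
}

struct ObjectTable {
    /// Sorted by digest; digests are assumed to be unique.
    entries: Vec<Entry>,
    /// buckets[b] is the index one past the last entry whose first
    /// digest byte is <= b, so the values never decrease.
    buckets: [usize; 256],
}

impl ObjectTable {
    fn new(mut entries: Vec<Entry>) -> Self {
        entries.sort_by(|a, b| a.digest.cmp(&b.digest));
        let mut buckets = [0usize; 256];
        for e in &entries {
            buckets[usize::from(e.digest[0])] += 1;
        }
        // Turn per-byte counts into running totals (bucket ends).
        let mut total = 0;
        for end in buckets.iter_mut() {
            total += *end;
            *end = total;
        }
        ObjectTable { entries, buckets }
    }

    fn lookup(&self, digest: &Sha256) -> Option<&Entry> {
        let b = usize::from(digest[0]);
        let start = if b == 0 { 0 } else { self.buckets[b - 1] };
        // Binary search only within the bucket for this first byte.
        let bucket = &self.entries[start..self.buckets[b]];
        bucket
            .binary_search_by(|e| e.digest.cmp(digest))
            .ok()
            .map(|i| &bucket[i])
    }
}
```

An entry with both `data` and `external` set corresponds to the case described above, where a file header is stored inline and the content lives in an external object.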

Collaborator

> We don't just want to have a serialization, because we also want to use the commit as a way to efficiently look up ostree objects in the commit.

Would parsing all of the metadata be expensive though?

Contributor Author

I mean, would it be impossible to change it to something else? For sure not, but what is the point? Allison and I spent a fair amount of time creating the new splitstream format specifically for this use. So it is the format we have, and it's intended to be efficient for what we use it for.

Collaborator

OK, fair! But can you spend some tokens clarifying the docs a bit at least?

I get the efficiency idea, but one thing that seems odd to me right now is that because we store this ostree-specific thing in the split stream content, it ends up zstd compressed. So we're at least reading the whole thing into RAM, we can't mmap etc.

With putting tar in split stream, this all made sense because we basically don't look at the tar stream unless we're copying the image out.

Also, while I get that it was nontrivial to design the format, there's also the traditional "cost of maintenance > cost of writing" to consider. Splitstream is a good bit of complexity on its own, but I think it's turned out mostly OK because for the OCI case it basically is a wrapper for a very well known thing - tar (ok well tar is a mess too, but it's a well-known mess). This work here is combining split stream with two entirely different more bespoke formats (splitstream-ostree and ostree).

I guess one way I'd say this is if you have a data format, it should have the ability to be converted to JSON, have a "structure checker" like fsck etc.

I'm aware I ~lost this argument before but e.g. https://cbor.io is pretty widely used. Does pulling in cbor for just this secondary bespoke binary format have the right cost/benefit? Perhaps not. (But, since we already need it: why not gvariant?)


Dunno. This is a discussion, nothing I am saying here is blocking.

Contributor Author

I'll update the docs to be more readable and comprehensive, and to document the final/current state of things. And I agree that having them zstd compressed does make it a bit weird for this to claim "efficiency", although I sort of agree with Allison's more modern view of mmap and its problems.

That said, I fundamentally think a "bucket of sha256 indexed objects" is the more correct format for an ostree thing. Serializing an ostree commit tree just feels wrong. Like, would you then duplicate things that were shared in many places (like dirmetas, or hardlinks)?

Collaborator

> I'll update the docs to be more readable, comprehensive and documenting the final/current state of things.

Thanks.

> That said, I fundamentally think a "bucket of sha256 indexed objects" is the more correct format for an ostree thing. Serializing an ostree commit tree just feels wrong. Like, would you then duplicate things that were shared in many places (like dirmetas, or hardlinks)?

No, I think the obvious flattened serialization would just have "each object is emitted once" semantics. Hardlinks are implicit in the ostree format - the data doesn't have st_nlink.
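
The "each object is emitted once" semantics suggested here can be sketched as a depth-first walk with a visited set. This is a toy model: the `children` map from a digest to the digests it references, and the function `emit_once`, are hypothetical and stand in for walking real ostree commit, dirtree, and dirmeta objects.

```rust
use std::collections::{HashMap, HashSet};

/// Walks the object graph depth-first from `root` and returns each
/// reachable digest exactly once, in first-visit order.
fn emit_once<'a>(root: &'a str, children: &HashMap<&'a str, Vec<&'a str>>) -> Vec<&'a str> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    let mut stack = vec![root];
    while let Some(digest) = stack.pop() {
        // Shared objects (e.g. a dirmeta used by many directories) are
        // skipped on every visit after the first.
        if !seen.insert(digest) {
            continue;
        }
        out.push(digest);
        if let Some(kids) = children.get(digest) {
            // Push in reverse so children are visited in listed order.
            stack.extend(kids.iter().rev().copied());
        }
    }
    out
}
```

With this approach a file object referenced from two directories appears once in the output, which matches the point that hardlinks are implicit in the ostree format.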

Contributor Author

I added doc/ostree.md, which has more detailed documentation on the format, including some general faffing about ostree and how this is supposed to be used.

@cgwalters

Collaborator

@allisonkarlitskaya You have a "changes requested" here which blocks merges

Based on ideas from composefs#141

This is an initial version of ostree support. This allows pulling from
local and remote ostree repos, which will create a set of regular file
content objects, as well as a commit splitstream containing all the
remaining ostree objects and file data. From the splitstream we can
create an image.

When pulling a commit, base commits (i.e. "the previous version") can
be specified manually and/or added automatically based on the parent
commit or the previous commit for the pulled ref. Any objects in
those base commits will not be downloaded.

Commits are splitstreams named ostree-commit-xxxx, and refs that
point to these are refs/ostree/$ref.

erofs images are automatically created for pulled commits, and they
can be mounted with "cfsctl ostree mount". There are also some other
subcommands that are similar to those of oci:
 * dump
 * compute-id
 * inspect
 * tag
 * untag
 * images

Signed-off-by: Alexander Larsson <alexl@redhat.com>
Assisted-by: Claude Code (Opus 4.6)
3 participants