
Add state machine for image switch (milestone 4c)#50

Merged
jlebon merged 10 commits into
bootc-dev:mainfrom
alicefr:milestone-4c
Jun 19, 2026

Conversation

@alicefr

@alicefr alicefr commented Jun 3, 2026

Collaborator

Implements the daemon-side state machine that detects image mismatches between spec.desiredImage and the booted image,
stages updates via bootc switch in a background goroutine, and triggers reboot when desiredImageState == Booted.

  • Add Switch(ctx, image, apply) to the Executor interface
  • Rewrite the reconciler with async staging, cancel-on-spec-change, and single-patch-per-reconcile
  • Background goroutine signals completion via source.Channel, letting the reconcile loop handle all status updates through
    one path
  • Apply operations (reboot) are never cancelled — the daemon restarts after reboot and reconciles fresh

@alicefr

alicefr commented Jun 3, 2026

Collaborator Author

@jlebon I skipped the e2e test since we still need to define how to test the upgrade. The state machine is only checked by the env tests for now

@alicefr alicefr marked this pull request as draft June 3, 2026 09:52
@alicefr alicefr marked this pull request as ready for review June 3, 2026 11:50
Comment thread internal/daemon/fake_test.go Outdated
Comment thread internal/daemon/reconciler.go Outdated
Comment thread internal/daemon/reconciler.go
Comment thread internal/daemon/reconciler.go Outdated
Comment thread internal/daemon/reconciler.go Outdated
Comment thread internal/daemon/reconciler.go Outdated
Comment thread internal/bootc/executor.go Outdated
Comment thread internal/daemon/reconciler.go Outdated
Comment thread internal/daemon/reconciler.go Outdated
Comment thread internal/daemon/reconciler_test.go Outdated
@alicefr alicefr force-pushed the milestone-4c branch 4 times, most recently from 7e1ff20 to 2afd447 Compare June 10, 2026 12:36
@alicefr alicefr marked this pull request as draft June 11, 2026 13:49
@alicefr

alicefr commented Jun 11, 2026

Collaborator Author

There are still some issues with this PR... investigating.

@alicefr alicefr force-pushed the milestone-4c branch 2 times, most recently from 56119dc to ba87f0d Compare June 11, 2026 14:21
@alicefr alicefr marked this pull request as ready for review June 11, 2026 15:14
@alicefr

alicefr commented Jun 11, 2026

Collaborator Author

I think we should prioritize #57 and implement at least the happy path test for the reboot

@alicefr alicefr force-pushed the milestone-4c branch 2 times, most recently from e4557ce to 413cf0d Compare June 12, 2026 11:47
@alicefr alicefr force-pushed the milestone-4c branch 2 times, most recently from 1226849 to ed7b12d Compare June 16, 2026 07:23
Comment thread internal/bootc/executor.go
Comment thread config/daemon/daemon.yaml
Comment thread internal/daemon/fake_test.go Outdated
Comment thread internal/daemon/reconciler.go Outdated
Comment thread internal/daemon/reconciler.go Outdated
Comment thread internal/daemon/reconciler.go Outdated
Comment thread internal/daemon/reconciler.go Outdated
Comment thread internal/daemon/reconciler.go Outdated
Comment thread internal/daemon/reconciler.go Outdated
Comment thread internal/daemon/reconciler.go
@jlebon

jlebon commented Jun 17, 2026

Collaborator

I haven't re-reviewed past the 2nd commit yet (tests).

Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Alice Frosi <afrosi@redhat.com>
@alicefr alicefr force-pushed the milestone-4c branch 2 times, most recently from e76449d to 7fdea08 Compare June 17, 2026 14:42
Comment thread internal/daemon/reconciler.go Outdated
Comment thread internal/daemon/reconciler.go
Comment thread internal/daemon/reconciler.go Outdated
Comment thread internal/daemon/reconciler.go Outdated
Comment thread internal/daemon/reconciler.go Outdated
Comment thread internal/daemon/reconciler.go
Comment thread internal/daemon/reconciler.go Outdated
Comment thread internal/daemon/reconciler.go Outdated
Comment thread internal/daemon/reconciler.go Outdated
@alicefr alicefr force-pushed the milestone-4c branch 2 times, most recently from 9dc4817 to 2a136b0 Compare June 18, 2026 11:53

@jlebon jlebon left a comment

Collaborator


Full review this time.

This is getting quite close! Very cool to see it all tested in the e2e test.

Comment thread internal/daemon/reconciler.go Outdated
Comment thread internal/daemon/reconciler.go Outdated
Comment thread internal/daemon/reconciler.go Outdated
Comment thread internal/daemon/reconciler.go Outdated
Comment thread internal/daemon/reconciler.go Outdated
Comment thread test/e2e/e2eutil/env.go Outdated
Comment thread config/daemon/daemon.yaml
Comment thread test/e2e/e2eutil/env.go Outdated
Comment thread test/e2e/bootcnode_test.go Outdated
Comment thread test/e2e/bootcnode_test.go Outdated
alicefr added 2 commits June 19, 2026 13:24
Rewrite the reconciler to detect image mismatches between
spec.desiredImage and the booted image, stage via bootc switch in a
background goroutine.

Once it has finished staging the image, the termination of the
goroutine triggers the reconciliation loop once more, which detects
that the system requires a reboot.

Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Alice Frosi <afrosi@redhat.com>
Replace raw JSON bytes with a bootc.Status struct in the test fake.
Status() serializes the struct via json.Marshal, and Stage()
auto-mutates the status (staging sets Staged). Reboot() records the
call for test assertions.

Add newBootcStatus() and newBootEntry() helpers to build test state
without verbose JSON constants.

Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Alice Frosi <afrosi@redhat.com>
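
A test fake along the lines of this commit message might look like the sketch below. The names `Status()`, `Stage()`, `Reboot()`, `newBootcStatus()`, and `newBootEntry()` come from the commit message; their signatures, the `fakeExecutor` type, and the two-field `Status` struct are assumptions. In particular, the struct is a drastically simplified stand-in and does not reproduce the real `bootc.Status` type or the bootc JSON schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// BootEntry and Status are simplified stand-ins, not the real bootc types.
type BootEntry struct {
	Image string `json:"image"`
}

type Status struct {
	Booted *BootEntry `json:"booted,omitempty"`
	Staged *BootEntry `json:"staged,omitempty"`
}

// newBootEntry builds an entry for the given image reference.
func newBootEntry(image string) *BootEntry {
	return &BootEntry{Image: image}
}

// newBootcStatus builds a status; an empty staged image means nothing is staged.
func newBootcStatus(booted, staged string) Status {
	s := Status{Booted: newBootEntry(booted)}
	if staged != "" {
		s.Staged = newBootEntry(staged)
	}
	return s
}

// fakeExecutor holds structured state instead of raw JSON bytes.
type fakeExecutor struct {
	status      Status
	rebootCalls int
}

// Status serializes the struct, so callers still receive JSON.
func (f *fakeExecutor) Status() ([]byte, error) {
	return json.Marshal(f.status)
}

// Stage mutates the fake's state the way a successful staging would.
func (f *fakeExecutor) Stage(image string) error {
	f.status.Staged = newBootEntry(image)
	return nil
}

// Reboot only records the call so tests can assert on it.
func (f *fakeExecutor) Reboot() error {
	f.rebootCalls++
	return nil
}

func main() {
	f := &fakeExecutor{status: newBootcStatus("quay.io/example/os:v1", "")}
	_ = f.Stage("quay.io/example/os:v2")
	out, _ := f.Status()
	fmt.Println(string(out))
}
```

Because the fake owns a struct, a test can set up state with one helper call and then check the effect of `Stage()` through the same `Status()` path the reconciler would read.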
alicefr added 3 commits June 19, 2026 11:45
Replace the bootcStatusFull JSON constant with newBootcStatus()
struct construction. Tighten the error assertion to match the
exact error chain.

Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Alice Frosi <afrosi@redhat.com>
Add envtest cases for the daemon reconciler state machine:

- TestStagingTriggered: image mismatch triggers bootc stage
- TestStagingError: stage failure sets Degraded condition
- TestAlreadyStaged: skip stage when image already staged
- TestRebootingSet: reboot triggered when desiredImageState is Booted
- TestRollback: restage when desired image changes
- TestCancelInflightStage: spec change cancels in-flight stage

Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Alice Frosi <afrosi@redhat.com>
Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Alice Frosi <afrosi@redhat.com>
alicefr added 3 commits June 19, 2026 13:19
TestUpdateReboot verifies that the upgrade to the new image is
performed successfully. It starts by patching the desiredImage to a new
one. After the reboot, the node should be Idle with the new booted
image and schedulable, proving that the uncordon was successful.

Additionally, the test verifies that the image is the one we built for
upgrades by checking the existence of the file /usr/share/update-marker.

We don't assert on intermediate states (Staging, Rebooting) because
they are too transient to catch reliably with polling — the daemon
restarts after the reboot so the Rebooting window is brief and a
spurious reconcile during shutdown can overwrite it before the next
poll.

Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Alice Frosi <afrosi@redhat.com>
The daemon runs bootc switch, which downloads the images. This operation
consumes additional memory, so the memory limits need to be raised;
otherwise the daemon gets OOM killed.

Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Alice Frosi <afrosi@redhat.com>
TestUpdateReboot fails because the node briefly enters a Degraded
state during the reboot window, but the controller only logged the
state name, not the daemon's error message.

The daemon restarts after the reboot so its previous logs are lost.
Log the Degraded condition's Message field so the next failure
reveals the cause.

Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Alice Frosi <afrosi@redhat.com>
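
Including the condition message in the log line, as this commit message proposes, might look roughly like the sketch below. The `Condition` struct, `degradedMessage`, `describeNode`, and the sample reason and message are hypothetical; treating `Degraded` as a condition type with status `True` is also an assumption about the API.

```go
package main

import "fmt"

// Condition is a reduced stand-in for a Kubernetes-style status condition.
type Condition struct {
	Type    string
	Status  string
	Reason  string
	Message string
}

// degradedMessage returns the message of an active Degraded condition, if any.
func degradedMessage(conds []Condition) (string, bool) {
	for _, c := range conds {
		if c.Type == "Degraded" && c.Status == "True" {
			return c.Message, true
		}
	}
	return "", false
}

// describeNode builds the log line, appending the daemon's message when
// the node is degraded so the cause survives a daemon restart.
func describeNode(name, state string, conds []Condition) string {
	if msg, ok := degradedMessage(conds); ok {
		return fmt.Sprintf("node %s is %s: %s", name, state, msg)
	}
	return fmt.Sprintf("node %s is %s", name, state)
}

func main() {
	conds := []Condition{
		{Type: "Degraded", Status: "True", Reason: "ExampleReason", Message: "example failure message"},
	}
	fmt.Println(describeNode("node-1", "Degraded", conds))
}
```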
@alicefr

alicefr commented Jun 19, 2026

Collaborator Author

There have been some flaky test runs. I have removed the check for the rebooting reason because the bootcnode is apparently sometimes set to degraded if bootc status doesn't report the status correctly during the reboot. This was hard to detect because the daemon restarts and we lose its logs, while the controller was reporting only degraded, without the reason and the message.

The code is correct, though: even if the bootcnode is set to degraded for some time, it then recovers, since the test failure was reporting a healthy condition. An example of a run can be found here.

The e2e test now verifies that the end state is the desired one and that the upgrade succeeded, without checking intermediate conditions.

The test testcontrollermembership fails because the daemon is not able
to pull from the registry. This has already been reported in bink in:
   bootc-dev/bink#59
and fixed by:
  bootc-dev/bink#60

Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Alice Frosi <afrosi@redhat.com>
@jlebon

jlebon commented Jun 19, 2026

Collaborator

There have been some flaky test runs. I have removed the check for the rebooting reason because the bootcnode is apparently sometimes set to degraded if bootc status doesn't report the status correctly during the reboot.

Nice. Yeah this would be fixed by #50 (comment).

@jlebon jlebon left a comment

Collaborator


LGTM, nice work! 🎉

Comment thread internal/daemon/reconciler_test.go
@jlebon jlebon merged commit 8f9a5e9 into bootc-dev:main Jun 19, 2026
7 checks passed


2 participants