
Unattended updates using OSTree

The basic OSTree upgrade operation is atomic, meaning the operation either applies an entire update or none at all. OSTree upgrades are never partial. However, upgrades can fail in other ways. A system can fail to boot, or it can boot but not work correctly. When an upgrade fails on a VM or a desktop computer, users can interact with the boot menu to reboot the system and fall back to the last working version of the OS image. However, users cannot interactively reboot the embedded systems used for automotive use cases. Instead, the system must automatically detect failures and fall back to the last working image. This process is known as an unattended update.

Watchdogs and boot-once mechanisms

The basic mechanism to facilitate unattended updates is an external watchdog.

At a high level, a watchdog workflow follows these steps:

  1. Configure your system with an external watchdog.
  2. The watchdog starts a timer.
  3. An update begins.
  4. If the update succeeds and the system boots within a predefined time limit without failures, the watchdog receives the stop command and stops.
  5. If the boot succeeds but some other failure occurs, the system automatically rolls back and reboots into the previous version of the OS image.
  6. If the boot fails, the watchdog never receives a stop command. When the timer runs out, the watchdog resets the CPU and forces a reboot into the previous version of the OS image.

To use a watchdog, the system must support boot-once functionality. A boot-once mechanism configures the system to boot into a new version of the OS exactly once: unless that boot succeeds, the next reboot falls back to the original version of the OS.

Watchdog in a QEMU VM

How you use watchdogs and boot-once mechanisms depends largely on your specific hardware. A single set of instructions that applies to all hardware types does not currently exist. However, you can configure and implement watchdogs and boot-once mechanisms in a QEMU VM for experimentation purposes.

QEMU supports some emulated hardware watchdogs, but their timers reset upon system reboot, which makes them incompatible with unattended updates. However, you can add a simple external watchdog that communicates with the guest over /dev/virtio-ports/watchdog.0 by adding the --watchdog option when you run the automotive-image-runner script. Adding the --verbose option enables messages from the watchdog.
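The timer logic of such an external watchdog can be sketched in a few lines of shell. This is only an illustration: the "stop" message name and stdin-based transport are assumptions, not the actual protocol that automotive-image-runner uses over the virtio port.

```shell
#!/bin/bash
# Minimal sketch of an external watchdog's timer logic (illustrative only;
# the real watchdog talks to the guest over /dev/virtio-ports/watchdog.0).
watchdog() {
    # Wait up to $1 seconds for a "stop" message on stdin.
    if read -r -t "$1" msg && [ "$msg" = "stop" ]; then
        echo "Stopped watchdog"
    else
        # In a real setup, this branch would reset the CPU.
        echo "Watchdog timeout: resetting CPU"
    fi
}

echo stop | watchdog 30   # successful boot: the stop command arrives in time
watchdog 1 < /dev/null    # hung boot: nothing arrives, so the timer expires
```

The two calls demonstrate both outcomes of step 4 and step 6 in the workflow above.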

Boot-once mechanisms in grub2

OSTree images boot the system with grub2, which uses Boot Loader Specification (BLS) files to describe the possible boot targets and supports a boot-counter mechanism to trigger the fallback. After an update, OSTree creates BLS files for the new and old targets, where the new target is listed first (the default boot) and the old target second.

Each time grub boots, it loads the grubenv file, which stores key/value state between boots. In particular, grub supports the boot_counter and boot_success keys. If boot_counter is set, grub decrements it and saves it back to grubenv on each boot. When boot_counter reaches zero, grub treats the new target as failed, and the second BLS entry becomes the default boot. In this scenario, the update rolls back to the old target.
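The counter handling can be emulated with ordinary edits on a grubenv-style file. The following sketch reproduces only the decrement-and-fallback decision (grub implements this internally); the grubenv here is a local scratch file, not the real /boot/grub2/grubenv.

```shell
#!/bin/bash
# Emulate grub's boot_counter handling on a local scratch copy of grubenv.
GRUBENV=./grubenv

# State as written before the reboot into the new deployment:
printf '# GRUB Environment Block\nboot_success=0\nboot_counter=1\n' > "$GRUBENV"

counter=$(sed -n 's/^boot_counter=//p' "$GRUBENV")
if [ "$counter" -gt 0 ]; then
    # A boot attempt remains: decrement and boot the first (new) BLS entry.
    sed -i "s/^boot_counter=.*/boot_counter=$((counter - 1))/" "$GRUBENV"
    echo "booting first BLS entry (new deployment)"
else
    # Counter exhausted: the second (old) BLS entry becomes the default.
    echo "falling back to second BLS entry (old deployment)"
fi
```

After one simulated boot, boot_counter in the scratch file drops from 1 to 0; a second failed boot would then select the fallback entry.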

Updating your system

Greenboot integrates with OSTree and systemd to add various forms of health checks that work together with the watchdog and boot-once mechanisms during updates.

Using Greenboot, the workflow of a typical update follows these steps:

  1. rpm-ostree upgrade stages an update, which writes the new OS content into place for the next boot, but it does not merge the system /etc into the new deployment or configure grub to boot it.
  2. rpm-ostree triggers ostree-finalize-staged.service, which finalizes the staged update when the system shuts down for the reboot.
  3. greenboot-grub2-set-counter.service modifies grubenv to set boot_counter, enabling the boot-once mechanism and health checks for the new boot.
  4. The system reboots.
  5. Before triggering boot-complete.target in systemd, greenboot-healthcheck.service runs various checks on the system and detects whether the system functions (green) or fails (red).
  6. If the system fails, the system logs the failure information and reboots. The failure triggers the boot_counter mechanism, and the system falls back to the old OSTree deployment. During the next boot, the greenboot-rpm-ostree-grub2-check-fallback.service service detects the fallback and makes the old default system permanent.
  7. If the system succeeds, the greenboot-grub2-set-success.service removes the boot_counter key and sets boot_success=1 in grubenv. Subsequent reboots use the new OS version.
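The health checks in step 5 are extensible. By greenboot's documented convention, executable scripts placed in /etc/greenboot/check/required.d/ must exit 0 for the boot to count as green; any nonzero exit marks the boot red and triggers the rollback in step 6. A minimal example check follows; the specific condition it tests is only an illustration.

```shell
#!/bin/bash
# Example greenboot health check. Install as an executable script in
# /etc/greenboot/check/required.d/ (greenboot's documented layout).
# A nonzero exit marks the boot red; the condition below is illustrative.
if [ ! -r /etc/os-release ]; then
    echo "required file /etc/os-release is missing" >&2
    exit 1
fi
echo "Boot Status check: OK"
```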

The watchdog service files integrate with this workflow in two ways:

  • watchdog-ostree-start.service starts the watchdog before the ostree-finalize-staged.service completes the migration.
  • watchdog-ostree-stop.service starts after boot-complete.target, which indicates that the upgrade was successful, and stops the watchdog.
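As a rough illustration of the second integration point, a stop unit ordered after boot-complete.target could look like the following sketch. The unit Description matches the journal output shown later in this procedure, but the ExecStart command and stop-message format are assumptions, not the shipped implementation.

```ini
# Sketch only: ordering a watchdog-stop unit after boot-complete.target.
[Unit]
Description=Stop watchdog after update on successful boot
After=boot-complete.target

[Service]
Type=oneshot
# Assumed command; the actual mechanism that signals the external watchdog
# over /dev/virtio-ports/watchdog.0 may differ.
ExecStart=/bin/sh -c 'echo stop > /dev/virtio-ports/watchdog.0'
```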

Prerequisites

  • An OSTree-based image, such as the image that you created in Creating an OSTree-based image

    Note

    For demonstration purposes, the sample manifest file upgrade-demo.mpp.yml is compatible with this procedure because it preconfigures Greenboot and installs and enables watchdog tools and services.

Procedure

  1. Update your image by rebuilding it with a new version number. The general form of the command is:

    $ automotive-image-builder build --target qemu --mode image --ostree-repo <ostree-repo-name> \
    --define 'version="<X.X>"' --define 'extra_rpms=["<add_extra_rpms>"]' --export qcow2 \
    <path>/<manifest-name>.mpp.yml <image-name>.repo
    

    Note

    The use of --define to modify a build from the command line is an acceptable method in a test environment. In a production environment, however, make changes directly to your manifest file.

    The following example command adds autosig-sample-slow-startup as an extra RPM, which makes the boot time slower than the 30-second watchdog timer, and updates the version to 1.3:

    $ automotive-image-builder build --target qemu --mode image --ostree-repo ostree-repo \
    --define 'version="1.3"' --define 'extra_rpms=["autosig-sample-slow-startup"]' --export qcow2 \
    images/upgrade-demo.mpp.yml my-image.repo
    

    Using the .repo extension instead of .qcow2 indicates to OSTree that you are updating or iterating on an image rather than creating a new image. The updated image is added to the OSTree repo as a new ref with a unique commit ID.

  2. Run the image:

    $ automotive-image-runner --verbose --watchdog --publish-dir=<ostree-repo-name> <image-name>.qcow2
    

    For example:

    $ automotive-image-runner --verbose --watchdog --publish-dir=ostree-repo my-image.qcow2
    publishing ostree-repo on http://10.0.2.100/
    port: 2222 → 22
    MAC: FE:7a:05:f1:94:85
    Image: my-image.qcow2
    Running: /usr/bin/qemu-system-x86_64 -drive file=/usr/share/OVMF/OVMF_CODE.fd,if=pflash,format=raw,unit=0,readonly=on
    -drive file=/usr/share/OVMF/OVMF_VARS.fd,if=pflash,format=raw,unit=1,snapshot=on,readonly=off -smp 8 -enable-kvm -m 2G -machine q35
    -cpu host -device virtio-net-pci,netdev=n0,mac=FE:7a:05:f1:94:85
    -netdev user,id=n0,net=10.0.2.0/24,guestfwd=tcp:10.0.2.100:80-cmd:netcat 127.0.0.1 46937,hostfwd=tcp::2222-:22 -qmp
    unix:/tmp/runvm-ba27770687aa6dd1tklvorxb/qmp-socket,server=on,wait=off -device virtio-serial -chardev
    socket,path=/tmp/runvm-ba27770687aa6dd1tklvorxb/watch-socket,server=on,wait=off,id=watchdog
    -device virtserialport,chardev=watchdog,name=watchdog.0
    -drive file=my-image.qcow2,index=0,media=disk,format=qcow2,if=virtio,id=rootdisk,snapshot=off
    Stopped watchdog
    

    Note

    Watchdog status messages appear in the terminal where you ran automotive-image-runner. To observe them, position the VM console so that the terminal remains visible.

  3. After the image boots, log in as root using the password password.

  4. From the VM console, verify the state of the system:

    # rpm-ostree status
    State: idle
    Deployments:
    ● auto-sig:410b97c4ca59df58f33fce3d1e389a3eb8f7c1367c1afacb7c8842576d8daeed
                  Version: 1.2 (2024-11-11T22:21:43Z)
                   Commit: 410b97c4ca59df58f33fce3d1e389a3eb8f7c1367c1afacb7c8842576d8daeed
    # cat /boot/grub2/grubenv
    # GRUB Environment Block
    kernelopts=root=LABEL=root
    boot_success=1
    ...
    
  5. Run rpm-ostree upgrade and verify the state of the system:

    # rpm-ostree upgrade
    Staging deployment... done
    Added:
      autosig-sample-slow-startup-0.1-1.el9.x86_64
    Run "systemctl reboot" to start a reboot
    # rpm-ostree status
    State: idle
    Deployments:
      auto-sig:cs9/x86_64/<target>-<manifest-name>
                  Version: 1.3 (2024-12-05T16:46:21Z)
                   Commit: 500891c082f0232ec520897b2f28db3b349a3e41ee2f03ba18a7ada9b685fcbb
    
    ● auto-sig:410b97c4ca59df58f33fce3d1e389a3eb8f7c1367c1afacb7c8842576d8daeed
                  Version: 1.2 (2024-11-11T22:21:43Z)
                   Commit: 410b97c4ca59df58f33fce3d1e389a3eb8f7c1367c1afacb7c8842576d8daeed
    # cat /boot/grub2/grubenv
    # GRUB Environment Block
    boot_success=0
    boot_counter=1
    ...
    
  6. Reboot the system to deploy the new version of your image:

    # systemctl reboot
    
  7. On the terminal command line, notice that the watchdog timer starts, which coincides with the system reboot:

    Starting watchdog for 30 sec
    
  8. After the image boots, log in as root using the password password.

  9. Quickly verify the state of the system, because the slow-startup package delays the boot past the 30-second watchdog timer and the VM soon reboots and rolls back to the older image version:

    # rpm-ostree status
    State: idle
    Deployments:
        ● auto-sig:cs9/x86_64/<target>-<manifest-name>
                  Version: 1.3 (2024-12-05T16:46:21Z)
                   Commit: 500891c082f0232ec520897b2f28db3b349a3e41ee2f03ba18a7ada9b685fcbb
    
          auto-sig:410b97c4ca59df58f33fce3d1e389a3eb8f7c1367c1afacb7c8842576d8daeed
                  Version: 1.2 (2024-11-11T22:21:43Z)
                   Commit: 410b97c4ca59df58f33fce3d1e389a3eb8f7c1367c1afacb7c8842576d8daeed
    
  10. On the terminal command line, notice that the watchdog timer expires and resets the VM, which triggers the rollback:

    Triggering watchdog
    Stopped watchdog
    
  11. After the image reboots, log in again.

  12. Run journalctl to review the journald log messages. Notice the messages that stop the watchdog after the successful fallback boot:

    greenboot-rpm-ostree-grub2-check-fallback[561]: FALLBACK BOOT DETECTED! Default rpm-ostree deployment has been rolled back.
    Reached target Boot Completion Check.
    Starting Mark boot as successful in grubenv...
    Starting greenboot Success Scripts Runner...
    greenboot[670]: Boot Status is GREEN - Health Check SUCCESS
    Starting Stop watchdog after update on successful boot...
    Finished greenboot Success Scripts Runner.
    watchdog-ostree-stop.service: Deactivated successfully.
    Finished Stop watchdog after update on successful boot.
    Finished Mark boot as successful in grubenv.
    
  13. Verify the state of the system. Notice that the VM rolled back to the previous OS version:

    # rpm-ostree status
    State: idle
    Deployments:
          auto-sig:cs9/x86_64/<target>-<manifest-name>
                  Version: 1.3 (2024-12-05T16:46:21Z)
                   Commit: 500891c082f0232ec520897b2f28db3b349a3e41ee2f03ba18a7ada9b685fcbb
    
          ● auto-sig:410b97c4ca59df58f33fce3d1e389a3eb8f7c1367c1afacb7c8842576d8daeed
                  Version: 1.2 (2024-11-11T22:21:43Z)
                   Commit: 410b97c4ca59df58f33fce3d1e389a3eb8f7c1367c1afacb7c8842576d8daeed
    # cat /boot/grub2/grubenv
    # GRUB Environment Block
    boot_success=1
    ...
    

© Red Hat