Last time I talked about using rpm-ostree-bisect, a tool that I wrote to automatically bisect the history of an OSTree remote in order to find the exact commit when a problem was introduced. I recently put the tool to the test again.
Recently a user reported an issue
where their system was seeing timeouts on boot. They determined
that if they removed the
from the kernel command line then the system would boot without timing
out on the swap device (i.e. an extra 90 seconds added to boot time).
This was interesting because the behavior was actually introduced
during the Fedora 28 cycle (i.e. it was not present in the first
release of Fedora 28 Atomic Host, but showed up sometime over the
life of Fedora 28 Atomic Host). I decided to dig in and do some
investigation. Along the way I realized, admittedly later than I
than I should have, that I could use
rpm-ostree-bisect to help
determine exactly where this problem was introduced.
OK. First off I needed an easy scriptable test to determine if the
problem exists in a particular commit. In this case it was pretty
easy; I just needed to search for
Timed out waiting for device
in the journal for the current boot. My
script ended up looking like:
[root@localhost ~]# cat /usr/local/bin/test.sh #!/bin/bash if journalctl -b0 | grep 'Timed out waiting for device'; then echo "Found device timeout. Test Fails" exit 1 else echo "Did not find device timeout. Test Passes" exit 0 fi
I then grabbed the
rpm-ostree-bisect program from GitHub
and started the bisect. I happened to know at the time one
commit that was good and one that was bad so I added that
information to the
[root@localhost ~]# curl -L https://raw.githubusercontent.com/ostreedev/ostree-releng-scripts/master/rpm-ostree-bisect > /usr/local/bin/rpm-ostree-bisect [root@localhost ~]# chmod +x /usr/local/bin/rpm-ostree-bisect [root@localhost ~]# rpm-ostree-bisect --good 94a9d06eef34aa6774c056356d3d2e024e57a0013b6f8048dbae392a84a137ca --bad 8df48fa2e70ad1952153ae00edbba08ed18b53c3d4095a22985d1085f5203ac6 --testscript /usr/local/bin/test.sh Using data file at: /var/lib/rpm-ostree-bisect.json Created symlink /etc/systemd/system/multi-user.target.wants/rpm-ostree-bisect.service → /etc/systemd/system/rpm-ostree-bisect.service. [root@localhost ~]# reboot
I let it run and when it finished the log from the last run gave me all of the information I needed:
[root@localhost ~]# journalctl -b0 -o cat -u rpm-ostree-bisect.service Starting RPM-OSTree Bisect Testing... Using data file at: /var/lib/rpm-ostree-bisect.json Did not find device timeout. Test Passes Removed /etc/systemd/system/multi-user.target.wants/rpm-ostree-bisect.service. BISECT TEST RESULTS: Last known good commit: 5736e83 : 28.20180708.0 : 2018-07-08T20:03:31Z First known bad commit: bc3aa17 : 28.20180711.0 : 2018-07-11T18:26:22Z libostree pull from 'fedora-atomic' for fedora/28/x86_64/atomic-host complete security: GPG: commit http: TLS non-delta: meta: 270 content: 0 transfer: secs: 43 size: 2.7 MB ostree diff commit old: 5736e832b1fd59208465458265136fbe2aa4ba89517d8bdcc91bc84724f40a8e ostree diff commit new: bc3aa17a5ad6c04103563bd93c4c668996ef786ec04f989a3209a9887c8e982c Upgraded: acl 2.2.52-20.fc28 -> 2.2.53-1.fc28 attr 2.4.47-23.fc28 -> 2.4.48-1.fc28 dracut 047-8.git20180305.fc28 -> 048-1.fc28 dracut-config-generic 047-8.git20180305.fc28 -> 048-1.fc28 dracut-network 047-8.git20180305.fc28 -> 048-1.fc28 kernel 4.17.3-200.fc28 -> 4.17.4-200.fc28 kernel-core 4.17.3-200.fc28 -> 4.17.4-200.fc28 kernel-modules 4.17.3-200.fc28 -> 4.17.4-200.fc28 libacl 2.2.52-20.fc28 -> 2.2.53-1.fc28 libattr 2.4.47-23.fc28 -> 2.4.48-1.fc28 openldap 2.4.46-1.fc28 -> 2.4.46-2.fc28 podman 0.6.5-1.git9d97bd6.fc28 -> 0.7.1-1.git802d4f2.fc28 python3-pytoml 0.1.16-1.fc28 -> 0.1.17-1.fc28 Removed: grubby-8.40-11.fc28.x86_64 Added: libkcapi-1.1.1-1.fc28.x86_64 libkcapi-hmaccalc-1.1.1-1.fc28.x86_64 Started RPM-OSTree Bisect Testing.
So now we now the exact versions of software that we need to look at that
introduced the problem. I had already suspected dracut and this confirmed
my suspicions even more. Going from
NOTE: For anyone interested here is the final rpm-ostree-bisect.json
Having this information ultimately led me to discover that the behavior gets introduced specifically in the iscsi module of the dracut-network rpm. I opened an upstream bug to track the issue further with the dracut maintainers.