Crisis Averted.. I'm using Atomic Host -

This blog has been running on Docker on Fedora 21 Atomic Host since early January. Occasionally I log in and run rpm-ostree upgrade followed by a subsequent reboot (usually after I inspect a few things). Today I happened to do just that and what did I come up with?? A bunch of 404s. Digging through some of the logs for the systemd unit file I use to start my wordpress container I found this:

systemd[1]: wordpress-server.service: main process exited, code=exited, status=1/FAILURE
docker[2321]: time="2015-01-31T19:09:24-05:00" level="fatal" msg="Error response from daemon: Cannot start container 51a2b8c45bbee564a61bcbffaee5bc78357de97cdd38918418026c26ae40fb09: write /sys/fs/cgroup/memory/system.slice/docker-51a2b8c45bbee564a61bcbffaee5bc78357de97cdd38918418026c26ae40fb09.scope/memory.memsw.limit_in_bytes: invalid argument"

Hmmm.. So that means I have updated to the latest atomic and docker doesn't work?? What am I to do?

Well, the nice thing about atomic host is that in moments like these you can easily go back to the state you were before you upgraded. A quick rpm-ostree rollback and my blog was back up and running in minutes.

Whew! Crisis averted.. But now what? Well the nice thing about atomic host is that I can easily go to another (non-production) system and test out exactly the same scenario as the upgrade that I performed in production. Some quick googling led me to this github issue which looks like it has to do with setting memory limits when you start a container using later versions of systemd.

Let's test out that theory by recreating this failure.

Recreating the Failure

To recreate I decided to start with the Fedora 21 atomic cloud image that was released in December. Here is what I have:

-bash-4.3# ostree admin status
* fedora-atomic ba7ee9475c462c9265517ab1e5fb548524c01a71709539bbe744e5fdccf6288b.0
    origin refspec: fedora-atomic:fedora-atomic/f21/x86_64/docker-host
-bash-4.3#
-bash-4.3# rpm-ostree status
  TIMESTAMP (UTC)         ID             OSNAME            REFSPEC
* 2014-12-03 01:30:09     ba7ee9475c     fedora-atomic     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
-bash-4.3#
-bash-4.3# rpm -q docker-io systemd
docker-io-1.3.2-2.fc21.x86_64
systemd-216-12.fc21.x86_64
-bash-4.3#
-bash-4.3# docker run --rm --memory 500M busybox echo "I'm Alive"
Unable to find image 'busybox' locally
Pulling repository busybox
4986bf8c1536: Download complete
511136ea3c5a: Download complete
df7546f9f060: Download complete
ea13149945cb: Download complete
Status: Downloaded newer image for busybox:latest
I'm Alive

So the system is up and running and able to run a container with the --memory option set. Now lets upgrade to the same commit that I did when I saw the failure earlier and reboot:

-bash-4.3# ostree pull fedora-atomic 153f577dc4b039e53abebd7c13de6dfafe0fb64b4fdc2f5382bdf59214ba7acb

778 metadata, 4374 content objects fetched; 174535 KiB transferred in 156 seconds
-bash-4.3#
-bash-4.3# echo 153f577dc4b039e53abebd7c13de6dfafe0fb64b4fdc2f5382bdf59214ba7acb > /ostree/repo/refs/remotes/fedora-atomic/fedora-atomic/f21/x86_64/docker-host
-bash-4.3#
-bash-4.3# ostree admin deploy fedora-atomic:fedora-atomic/f21/x86_64/docker-host
Copying /etc changes: 26 modified, 4 removed, 36 added
Transaction complete; bootconfig swap: yes deployment count change: 1
-bash-4.3#
-bash-4.3# ostree admin status
  fedora-atomic 153f577dc4b039e53abebd7c13de6dfafe0fb64b4fdc2f5382bdf59214ba7acb.0
    origin refspec: fedora-atomic:fedora-atomic/f21/x86_64/docker-host
* fedora-atomic ba7ee9475c462c9265517ab1e5fb548524c01a71709539bbe744e5fdccf6288b.0
    origin refspec: fedora-atomic:fedora-atomic/f21/x86_64/docker-host
-bash-4.3#
-bash-4.3# rpm-ostree status
  TIMESTAMP (UTC)         ID             OSNAME            REFSPEC
  2015-01-31 21:08:35     153f577dc4     fedora-atomic     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
* 2014-12-03 01:30:09     ba7ee9475c     fedora-atomic     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
-bash-4.3# reboot

Note that I had to manually update the ref to point to the commit I downloaded in order to get this to work. I'm not sure why this is but it wouldn't work otherwise.

Ok now I had a system using the same tree that I was when I saw the failure. Let's check to see if it still happens:

-bash-4.3# rpm-ostree status
  TIMESTAMP (UTC)         ID             OSNAME            REFSPEC
* 2015-01-31 21:08:35     153f577dc4     fedora-atomic     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
  2014-12-03 01:30:09     ba7ee9475c     fedora-atomic     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
-bash-4.3#
-bash-4.3# rpm -q docker-io systemd
docker-io-1.4.1-5.fc21.x86_64
systemd-216-17.fc21.x86_64
-bash-4.3#
-bash-4.3# docker run --rm --memory 500M busybox echo "I'm Alive"
FATA[0003] Error response from daemon: Cannot start container d79629bfddc7833497b612e2b6d4cc2542ce9a8c2253d39ace4434bbd385185b: write /sys/fs/cgroup/memory/system.slice/docker-d79629bfddc7833497b612e2b6d4cc2542ce9a8c2253d39ace4434bbd385185b.scope/memory.memsw.limit_in_bytes: invalid argument

Yep! Looks like it consistently happens. This is good because this is a recreator that can now be used by anyone to verify the problem on their own. For completeness I'll go ahead and rollback the system to show that the problem goes away when back in the old state:

-bash-4.3# rpm-ostree rollback
Moving 'ba7ee9475c462c9265517ab1e5fb548524c01a71709539bbe744e5fdccf6288b.0' to be first deployment
Transaction complete; bootconfig swap: yes deployment count change: 0
Changed:
  NetworkManager-1:0.9.10.0-13.git20140704.fc21.x86_64
  NetworkManager-glib-1:0.9.10.0-13.git20140704.fc21.x86_64
  ...
  ...
Removed:
  flannel-0.2.0-1.fc21.x86_64
Sucessfully reset deployment order; run "systemctl reboot" to start a reboot
-bash-4.3# reboot

And the final test:

-bash-4.3# rpm-ostree status
  TIMESTAMP (UTC)         ID             OSNAME            REFSPEC
* 2014-12-03 01:30:09     ba7ee9475c     fedora-atomic     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
  2015-01-31 21:08:35     153f577dc4     fedora-atomic     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
-bash-4.3# docker run --rm --memory 500M busybox echo "I'm Alive"
I'm Alive

Bliss! And you can thank Atomic Host for that.

Dusty