
Crisis Averted.. I'm using Atomic Host

This blog has been running on Docker on Fedora 21 Atomic Host since early January. Occasionally I log in, run rpm-ostree upgrade, and then reboot (usually after I inspect a few things). Today I happened to do just that and what did I come up with?? A bunch of 404s. Digging through the logs for the systemd unit I use to start my WordPress container, I found this:

systemd[1]: wordpress-server.service: main process exited, code=exited, status=1/FAILURE
docker[2321]: time="2015-01-31T19:09:24-05:00" level="fatal" msg="Error response from daemon: Cannot start container 51a2b8c45bbee564a61bcbffaee5bc78357de97cdd38918418026c26ae40fb09: write /sys/fs/cgroup/memory/system.slice/docker-51a2b8c45bbee564a61bcbffaee5bc78357de97cdd38918418026c26ae40fb09.scope/memory.memsw.limit_in_bytes: invalid argument"
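For context, the systemd unit in question looks roughly like this. This is a minimal sketch rather than my exact file; the image name, ports, and --memory value are illustrative, but the memory limit is the part that matters here:

[Unit]
Description=WordPress server container
Requires=docker.service
After=docker.service

[Service]
Restart=always
ExecStart=/usr/bin/docker run --rm --name wordpress-server --memory 500M -p 80:80 wordpress
ExecStop=/usr/bin/docker stop wordpress-server

[Install]
WantedBy=multi-user.target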

Hmmm.. So that means I have updated to the latest Atomic and Docker doesn't work?? What am I to do?

Well, the nice thing about Atomic Host is that in moments like these you can easily go back to the state you were in before you upgraded. A quick rpm-ostree rollback and my blog was back up and running in minutes.
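For the record, the entire recovery amounted to just two commands:

# rpm-ostree rollback
# systemctl reboot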

Whew! Crisis averted.. But now what? Another nice thing about Atomic Host is that I can easily go to another (non-production) system and test exactly the same upgrade that I performed in production. Some quick googling led me to this github issue, which looks like it has to do with setting memory limits when starting a container under later versions of systemd.

Let's test out that theory by recreating this failure.

Recreating the Failure

To recreate the failure I decided to start with the Fedora 21 Atomic cloud image that was released in December. Here is what I have:

-bash-4.3# ostree admin status
* fedora-atomic ba7ee9475c462c9265517ab1e5fb548524c01a71709539bbe744e5fdccf6288b.0
    origin refspec: fedora-atomic:fedora-atomic/f21/x86_64/docker-host
-bash-4.3#
-bash-4.3# rpm-ostree status
  TIMESTAMP (UTC)         ID             OSNAME            REFSPEC
* 2014-12-03 01:30:09     ba7ee9475c     fedora-atomic     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
-bash-4.3#
-bash-4.3# rpm -q docker-io systemd
docker-io-1.3.2-2.fc21.x86_64
systemd-216-12.fc21.x86_64
-bash-4.3#
-bash-4.3# docker run --rm --memory 500M busybox echo "I'm Alive"
Unable to find image 'busybox' locally
Pulling repository busybox
4986bf8c1536: Download complete
511136ea3c5a: Download complete
df7546f9f060: Download complete
ea13149945cb: Download complete
Status: Downloaded newer image for busybox:latest
I'm Alive

So the system is up and running and able to run a container with the --memory option set. Now let's upgrade to the same commit I had upgraded to when I saw the failure earlier, and reboot:

-bash-4.3# ostree pull fedora-atomic 153f577dc4b039e53abebd7c13de6dfafe0fb64b4fdc2f5382bdf59214ba7acb

778 metadata, 4374 content objects fetched; 174535 KiB transferred in 156 seconds
-bash-4.3#
-bash-4.3# echo 153f577dc4b039e53abebd7c13de6dfafe0fb64b4fdc2f5382bdf59214ba7acb > /ostree/repo/refs/remotes/fedora-atomic/fedora-atomic/f21/x86_64/docker-host
-bash-4.3#
-bash-4.3# ostree admin deploy fedora-atomic:fedora-atomic/f21/x86_64/docker-host
Copying /etc changes: 26 modified, 4 removed, 36 added
Transaction complete; bootconfig swap: yes deployment count change: 1
-bash-4.3#
-bash-4.3# ostree admin status
  fedora-atomic 153f577dc4b039e53abebd7c13de6dfafe0fb64b4fdc2f5382bdf59214ba7acb.0
    origin refspec: fedora-atomic:fedora-atomic/f21/x86_64/docker-host
* fedora-atomic ba7ee9475c462c9265517ab1e5fb548524c01a71709539bbe744e5fdccf6288b.0
    origin refspec: fedora-atomic:fedora-atomic/f21/x86_64/docker-host
-bash-4.3#
-bash-4.3# rpm-ostree status
  TIMESTAMP (UTC)         ID             OSNAME            REFSPEC
  2015-01-31 21:08:35     153f577dc4     fedora-atomic     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
* 2014-12-03 01:30:09     ba7ee9475c     fedora-atomic     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
-bash-4.3# reboot

Note that I had to manually update the ref to point to the commit I downloaded in order to get this to work. I'm not sure why this is necessary, but the deploy wouldn't work otherwise.

OK, now I had a system using the same tree I was on when I saw the failure. Let's check to see if it still happens:

-bash-4.3# rpm-ostree status
  TIMESTAMP (UTC)         ID             OSNAME            REFSPEC
* 2015-01-31 21:08:35     153f577dc4     fedora-atomic     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
  2014-12-03 01:30:09     ba7ee9475c     fedora-atomic     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
-bash-4.3#
-bash-4.3# rpm -q docker-io systemd
docker-io-1.4.1-5.fc21.x86_64
systemd-216-17.fc21.x86_64
-bash-4.3#
-bash-4.3# docker run --rm --memory 500M busybox echo "I'm Alive"
FATA[0003] Error response from daemon: Cannot start container d79629bfddc7833497b612e2b6d4cc2542ce9a8c2253d39ace4434bbd385185b: write /sys/fs/cgroup/memory/system.slice/docker-d79629bfddc7833497b612e2b6d4cc2542ce9a8c2253d39ace4434bbd385185b.scope/memory.memsw.limit_in_bytes: invalid argument

Yep! Looks like it happens consistently. This is good, because it means anyone can now use this reproducer to verify the problem on their own. For completeness I'll go ahead and roll the system back to show that the problem goes away when back in the old state:

-bash-4.3# rpm-ostree rollback
Moving 'ba7ee9475c462c9265517ab1e5fb548524c01a71709539bbe744e5fdccf6288b.0' to be first deployment
Transaction complete; bootconfig swap: yes deployment count change: 0
Changed:
  NetworkManager-1:0.9.10.0-13.git20140704.fc21.x86_64
  NetworkManager-glib-1:0.9.10.0-13.git20140704.fc21.x86_64
  ...
  ...
Removed:
  flannel-0.2.0-1.fc21.x86_64
Sucessfully reset deployment order; run "systemctl reboot" to start a reboot
-bash-4.3# reboot

And the final test:

-bash-4.3# rpm-ostree status
  TIMESTAMP (UTC)         ID             OSNAME            REFSPEC
* 2014-12-03 01:30:09     ba7ee9475c     fedora-atomic     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
  2015-01-31 21:08:35     153f577dc4     fedora-atomic     fedora-atomic:fedora-atomic/f21/x86_64/docker-host
-bash-4.3# docker run --rm --memory 500M busybox echo "I'm Alive"
I'm Alive

Bliss! And you can thank Atomic Host for that.

Dusty

quick audit rules for sanity check

Most of the time when I really want to figure out what is going on deep within a piece of software, I break out strace and capture all the gory detail. Unfortunately it isn't always that easy to manipulate and run something from the command line, but I have found that some simple uses of the audit daemon can give you great insight without having to dig too deep.

Example Problem

I have a script, switch.py, that I want to call via a bound key sequence from the i3 window manager. However, I notice that nothing happens when I press the key sequence. Is the script failing, or is it not getting called at all? auditd and auditctl can help us figure this out.
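For reference, the binding in my i3 config looks something like this (a sketch; the actual key combination is illustrative and doesn't matter for this exercise):

bindsym $mod+Shift+s exec /home/dustymabe/.i3/switch.py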

Using audit

To take advantage of system auditing the daemon must be up and running:

# systemctl status auditd.service | grep active
       Active: active (running) since Sun 2015-01-25 13:56:27 EST; 1 day 9h ago

You can then add a watch for read/write/execute/attribute accesses on the file:

# auditctl -w /home/dustymabe/.i3/switch.py -p rwxa -k 'switchtest'
# auditctl -l
-w /home/dustymabe/.i3/switch.py -p rwxa -k switchtest
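One thing to keep in mind: rules added with auditctl like this don't survive a reboot. If you want the watch to be permanent, you can drop the same rule text into a file under /etc/audit/rules.d/ (a sketch; the filename is arbitrary, and this assumes a system that loads rules via augenrules, as Fedora does):

# echo "-w /home/dustymabe/.i3/switch.py -p rwxa -k switchtest" > /etc/audit/rules.d/switchtest.rules
# augenrules --load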

Notice the use of the -k option to add a key to the rule. Any events that match the rule will be tagged with this key and can be easily found. All accesses will be logged and can be viewed later using ausearch and aureport. With the rule in place, I accessed the file as a normal user in another terminal:

$ pwd
/home/dustymabe
$ cat  .i3/switch.py
... contents of file ...
$ ls .i3/switch.py
.i3/switch.py

Then I was able to use a combination of ausearch and aureport to easily see who accessed the file and how it was accessed:

# ausearch -k switchtest --raw | aureport --file

File Report
===============================================
# date time file syscall success exe auid event
===============================================
1. 01/26/15 22:59:26 .i3/switch.py 2 yes /usr/bin/cat 1000 1299
2. 01/26/15 23:00:19 .i3/switch.py 191 no /usr/bin/ls 1000 1300

Awesome.. So with auditing in place, all I have to do now is press the key sequence and see if my script gets called. It turns out it was being called:

# ausearch -k switchtest --raw | aureport --file

File Report
===============================================
# date time file syscall success exe auid event
===============================================
1. 01/26/15 22:59:26 .i3/switch.py 2 yes /usr/bin/cat 1000 1299
2. 01/26/15 23:00:19 .i3/switch.py 191 no /usr/bin/ls 1000 1300
10. 01/26/15 23:38:15 /home/dustymabe/.i3/switch.py 59 yes /usr/bin/python2.7 1000 1326
11. 01/26/15 23:38:15 /home/dustymabe/.i3/switch.py 89 no /usr/bin/python2.7 1000 1327
12. 01/26/15 23:38:15 /home/dustymabe/.i3/switch.py 2 yes /usr/bin/python2.7 1000 1328
13. 01/26/15 23:38:15 /home/dustymabe/.i3/switch.py 2 yes /usr/bin/python2.7 1000 1329
14. 01/26/15 23:38:15 /home/dustymabe/.i3/switch.py 2 yes /usr/bin/python2.7 1000 1330
15. 01/26/15 23:38:15 /home/dustymabe/.i3/switch.py 2 yes /usr/bin/python2.7 1000 1331

So that enabled me to concentrate on my script and find the bug that was lurking within :)
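When you're done debugging, the watch can be removed by passing -W with the same rule specification we added earlier:

# auditctl -W /home/dustymabe/.i3/switch.py -p rwxa -k switchtest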

Have fun auditing!
Dusty

Fedora 21 now available on Digital Ocean

Cross-posted from this Fedora Magazine post.

It's raining Droplets! Fedora 21 has landed in Digital Ocean's cloud hosting. Fedora 21 offers a fantastic cloud image for developers, and it's now easy for Digital Ocean users to spin it up and get started! Here are a couple of tips:

  • As with other Digital Ocean images, you log in with your SSH key as root rather than as the typical fedora user you may be familiar with from other Fedora cloud images.
  • This is the first time Digital Ocean has enabled SELinux by default (yay for security). If you want or need to, you can still easily switch back to permissive mode (see the sketch after this list); Red Hat's Dan Walsh may have a "shame on you" or two for you though.
  • Fedora 21 should be available in all the newest datacenters in each region, but some legacy datacenters aren't supported. If you hit a problem you think is Fedora-specific, drop us an email, ping us in #fedora-cloud on freenode, or visit the Fedora Cloud Trac to see if it is already being worked on.
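For the curious, the switch to permissive mode from the second tip is quick (shown here as a sketch; setenforce only lasts until the next reboot, while the config edit makes it persistent):

# setenforce 0
# sed -i 's/^SELINUX=enforcing/SELINUX=permissive/' /etc/selinux/config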

Happy Developing!
Dusty

P.S. If anyone wants a $10 credit for creating a new account, you can use my referral link.

qemu-img Backing Files: A Poor Man's Snapshot/Rollback

I often like to formulate detailed steps when trying to reproduce a bug or a working setup. VMs are great for this because they can be manipulated easily. To manipulate their disk images I use qemu-img to create new disk images that use other disk images as a backing store. This is what I like to call a "poor man's" way to do snapshots because the snapshotting process is a bit manual, but that is also why I like it: I don't touch the original disk image at all, so I have full confidence I haven't compromised it.

NOTE: I use QEMU/KVM/libvirt, so those are the tools used in this example.

Taking A Snapshot

To take a snapshot, first shut down the VM and then simply create a new disk image that uses the original disk image as a backing store:

$ sudo virsh shutdown F21server
Domain F21server is being shutdown
$ sudo qemu-img create -f qcow2 -b /guests/F21server.img /guests/F21server.qcow2.snap
Formatting '/guests/F21server.qcow2.snap', fmt=qcow2 size=21474836480 backing_file='/guests/F21server.img' encryption=off cluster_size=65536 lazy_refcounts=off

This new disk image is a copy-on-write (COW) snapshot of the original image: any writes go into the new image, while reads of unmodified blocks are served from the original image. A nice side effect is that the new file starts out at (essentially) zero bytes and grows only as modifications are made.
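You can inspect the relationship between the two files at any time with qemu-img info; the --backing-chain flag walks the whole chain of backing files (a quick sketch):

$ sudo qemu-img info --backing-chain /guests/F21server.qcow2.snap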

To get the virtual machine to pick up and start using the new COW disk image we will need to modify the libvirt XML to point it at the new file:

$ sudo virt-xml F21server --edit target=vda --disk driver_type=qcow2,path=/guests/F21server.qcow2.snap --print-diff
--- Original XML
+++ Altered XML
@@ -27,8 +27,8 @@
   <devices>
     <emulator>/usr/bin/qemu-kvm</emulator>
     <disk type="file" device="disk">
-      <driver name="qemu" type="raw"/>
-      <source file="/guests/F21server.img"/>
+      <driver name="qemu" type="qcow2"/>
+      <source file="/guests/F21server.qcow2.snap"/>
       <target dev="vda" bus="virtio"/>
       <address type="pci" domain="0x0000" bus="0x00" slot="0x07" function="0x0"/>
     </disk>
$
$ sudo virt-xml F21server --edit target=vda --disk driver_type=qcow2,path=/guests/F21server.qcow2.snap
Domain 'F21server' defined successfully.

You can now start your VM and make changes as you wish. Be destructive if you like; the original disk image hasn't been touched.

After making a few changes I had around 15M of differences between the original image and the snapshot:

$ du -sh /guests/F21server.img
21G     /guests/F21server.img
$ du -sh /guests/F21server.qcow2.snap
15M     /guests/F21server.qcow2.snap

Going Back

To go back to the point where you started, you must first delete the file that you created (/guests/F21server.qcow2.snap). Then you have two options:

  • Again create a disk image using the original as a backing file.
  • Go back to using the original image directly.

If you want to continue testing and keep returning to your starting point, delete and recreate the COW snapshot disk image:

$ sudo rm /guests/F21server.qcow2.snap
$ sudo qemu-img create -f qcow2 -b /guests/F21server.img /guests/F21server.qcow2.snap
Formatting '/guests/F21server.qcow2.snap', fmt=qcow2 size=21474836480 backing_file='/guests/F21server.img' encryption=off cluster_size=65536 lazy_refcounts=off

If you want to go back to your original setup, you'll also need to change the XML back to what it was before:

$ sudo rm /guests/F21server.qcow2.snap
$ sudo virt-xml F21server --edit target=vda --disk driver_type=raw,path=/guests/F21server.img
Domain 'F21server' defined successfully.

Committing Changes

If you decide that the changes you have made are ones you want to carry forward, you can commit the changes in the COW disk image back into the backing disk image. In the case below I have 15M worth of changes that get committed back into the original image. I then edit the XML accordingly and can start the guest with all the changes baked back into the original disk image:

$ sudo qemu-img info /guests/F21server.qcow2.snap
image: /guests/F21server.qcow2.snap
file format: qcow2
virtual size: 20G (21474836480 bytes)
disk size: 15M
cluster_size: 65536
backing file: /guests/F21server.img
$ sudo qemu-img commit /guests/F21server.qcow2.snap
Image committed.
$ sudo rm /guests/F21server.qcow2.snap
$ sudo virt-xml F21server --edit target=vda --disk driver_type=raw,path=/guests/F21server.img
Domain 'F21server' defined successfully.

Fin

This backing file approach is useful because it's much more convenient than making multiple copies of huge disk image files, but it can be used for much more than just snapshotting and reverting changes. It can also be used, for example, to start 100 virtual machines from a common backing image, saving a lot of space. Go ahead and try it!
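As a quick sketch of that last idea (the clone names here are illustrative), thin clones are just additional overlay images pointing at the same backing file; the one rule is that the backing image must not be modified while the clones depend on it:

$ for i in 1 2 3; do
>   sudo qemu-img create -f qcow2 -b /guests/F21server.img /guests/clone$i.qcow2
> done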

Happy Snapshotting!
Dusty