Docker: Copy Into A Container Volume


I need to copy a few files into my docker container. Should be easy, right? Turns out it's not so trivial. In Docker 1.0.0 and earlier the docker cp command can be used to copy files from a container to the host, but not the other way around...

Most of the time you can work around this by using an ADD statement in the Dockerfile, but I often need to populate data within data-only volume containers before I start the other containers that use that data. To copy data into the volume you can use tar and pipe the contents into a new container that mounts the same volume, like so:
[root@localhost ~]# docker run -d -i -t -v /volumes/wpdata --name wpdata mybusybox sh
416ea2a877267f566ef8b054a836e8b6b2550b347143c4fe8ed2616e11140226
[root@localhost ~]#
[root@localhost ~]# tar -c files/ | docker run -i --rm -w /volumes/wpdata/ --volumes-from wpdata mybusybox tar -xv
files/
files/file8.txt
files/file9.txt
files/file4.txt
files/file7.txt
files/file1.txt
files/file6.txt
files/file2.txt
files/file5.txt
files/file10.txt
files/file3.txt

So, in the example I created a new data-only volume container named wpdata and then ran tar to pipe the contents of a directory to a new container that also used the same volumes as the original container. Not so tough, but not as easy as docker cp. I think docker cp should gain this functionality sometime in the future (issue tracker here).
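If you want to double-check that the files actually landed in the volume, you can spin up one more throwaway container against the same volume and list the directory with the busybox ls applet. A quick sanity check along these lines should work:

[root@localhost ~]# docker run --rm --volumes-from wpdata mybusybox /sbin/ls /volumes/wpdata/files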

Enjoy

Dusty

Creating Your Own Minimal Docker Image in Fedora


Sometimes it can be useful to have a docker image with just the bare essentials. Maybe you want a container with just enough to run your app, or you are using something like data volume containers and want just enough to browse the filesystem. Either way you can create your own minimalist busybox image on Fedora with a pretty simple script.

The script below was inspired partly by Marek Goldmann's post about creating a minimal image for wildfly and partly by the busybox website.

# cd to a temporary directory
tmpdir=$(mktemp -d)
pushd $tmpdir

# Get and extract busybox
yumdownloader busybox
rpm2cpio busybox*rpm | cpio -imd
rm -f busybox*rpm

# Create symbolic links back to busybox
for i in $(./sbin/busybox --list); do
    ln -s /sbin/busybox ./sbin/$i
done

# Create container
tar -c . | docker import - mybusybox

# Go back to old pwd
popd

After running the script there is a new image on your system with the mybusybox tag. You can run it and take a look around like so:
[root@localhost ~]# docker images mybusybox
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
mybusybox           latest              f526db9e0d80        12 minutes ago      1.309 MB
[root@localhost ~]#
[root@localhost ~]# docker run -i -t mybusybox /sbin/busybox sh
# ls -l /sbin/ls
lrwxrwxrwx    1 0        0               13 Jul  8 02:15 /sbin/ls -> /sbin/busybox
#
# ls /
dev  etc  proc  sbin  sys  usr
#
# df -kh .
Filesystem                Size      Used Available Use% Mounted on
/dev/mapper/docker-253:0-394094-addac9507205082fbd49c8f45bbd0316fd6b3efbb373bb1d717a3ccf44b8a97e
                          9.7G     23.8M      9.2G   0% /

Enjoy!

Dusty

Manual Linux Installs with Funky Storage Configurations

Introduction


I often find that my tastes for hard drive configurations on my installed systems are a bit outside of the norm. I like playing with thin LVs, BTRFS snapshots, or whatever new thing is around the corner. The Anaconda UI has been adding support for these fringe cases, but I still find it hard to get Anaconda to do what I want in certain situations.

An example of this happened most recently when I went to reformat and install Fedora 20 on my laptop. Ultimately what I wanted was encrypted root and swap devices and btrfs filesystems on root and boot. One other requirement was that I needed to leave sda4 (a Windows Partition) completely intact. At the end the configuration should look something like:

[root@lintop ~]# lsblk /dev/sda
NAME           MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda              8:0    0 465.8G  0 disk
├─sda1           8:1    0     1G  0 part  /boot
├─sda2           8:2    0     4G  0 part
│ └─cryptoswap 253:1    0     4G  0 crypt [SWAP]
├─sda3           8:3    0 299.2G  0 part
│ └─cryptoroot 253:0    0 299.2G  0 crypt /
└─sda4           8:4    0 161.6G  0 part

After a few failed attempts with Anaconda I decided to do a custom install instead.

Custom Install


I used the Fedora 20 install DVD (and thus the Anaconda environment) to do the install, but I performed all of the steps manually by switching to a different terminal with a shell prompt. First off I used fdisk to partition the disk the way I wanted. The results looked like:
[anaconda root@localhost ~]# fdisk -l /dev/sda
Disk /dev/sda: 465.8 GiB, 500107862016 bytes, 976773168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0xcfe1cf72

Device    Boot     Start       End    Blocks  Id System
/dev/sda1 *         2048   2099199   1048576  83 Linux
/dev/sda2        2099200  10487807   4194304  82 Linux swap / Solaris
/dev/sda3       10487808 637945855 313729024  83 Linux
/dev/sda4      637945856 976773119 169413632   7 HPFS/NTFS/exFAT

Next I set up the encrypted root (/dev/sda3) device and created a btrfs filesystems on both boot (/dev/sda2) and the encrypted root device (/dev/mapper/cryptoroot):

[anaconda root@localhost ~]# cryptsetup luksFormat /dev/sda3
...
[anaconda root@localhost ~]# cryptsetup luksOpen /dev/sda3 cryptoroot
Enter passphrase for /dev/sda3:
[anaconda root@localhost ~]#
[anaconda root@localhost ~]# mkfs.btrfs --force --label=root /dev/mapper/cryptoroot
...
fs created label root on /dev/mapper/cryptoroot
...
[anaconda root@localhost ~]# mkfs.btrfs --force --label=boot --mixed /dev/sda1
...
fs created label boot on /dev/sda1
...

Next, if you want to use the yum CLI you need to install it, because some of its files are left out of the install environment by default. The error you get is shown below, along with how to fix it:

[anaconda root@localhost ~]# yum list
Traceback (most recent call last):
  File "/bin/yum", line 28, in <module>
    import yummain
ImportError: No module named yummain
[anaconda root@localhost ~]# rpm -ivh --nodeps /run/install/repo/Packages/y/yum-3.4.3-106.fc20.noarch.rpm
...

I needed to set up a repo that used the DVD as the source:

[anaconda root@localhost ~]# cat <<EOF > /etc/yum.repos.d/repo.repo
[dvd]
name=dvd
baseurl=file:///run/install/repo
enabled=1
gpgcheck=0
EOF

Now I could mount my root device on /mnt/sysimage and then lay down the basic filesystem tree by installing the filesystem package into it:

[anaconda root@localhost ~]# mount /dev/mapper/cryptoroot /mnt/sysimage/
[anaconda root@localhost ~]# yum install -y --installroot=/mnt/sysimage filesystem
...
Complete!

Now I can mount boot and other filesystems into the /mnt/sysimage tree:

[anaconda root@localhost ~]# mount /dev/sda1 /mnt/sysimage/boot/
[anaconda root@localhost ~]# mount -v -o bind /dev /mnt/sysimage/dev/
mount: /dev bound on /mnt/sysimage/dev.
[anaconda root@localhost ~]# mount -v -o bind /run /mnt/sysimage/run/
mount: /run bound on /mnt/sysimage/run.
[anaconda root@localhost ~]# mount -v -t proc proc /mnt/sysimage/proc/
mount: proc mounted on /mnt/sysimage/proc.
[anaconda root@localhost ~]# mount -v -t sysfs sys /mnt/sysimage/sys/
mount: sys mounted on /mnt/sysimage/sys.

Now we're ready for the actual install. For this install I went with a small set of packages (I'll use yum to add whatever else I want once the system is up).

[anaconda root@localhost ~]# yum install -y --installroot=/mnt/sysimage @core @standard kernel grub2 grub2-tools btrfs-progs
...
Complete!

After the install there are a few housekeeping items to take care of. I started with populating crypttab, populating fstab, changing the root password, and touching /.autorelabel to trigger an SELinux relabel on first boot:

[anaconda root@localhost ~]# chroot /mnt/sysimage/
[anaconda root@localhost /]# cat <<EOF > /etc/crypttab
cryptoswap /dev/sda2 /dev/urandom swap
cryptoroot /dev/sda3 -
EOF
[anaconda root@localhost /]# cat <<EOF > /etc/fstab
LABEL=boot             /boot btrfs defaults 1 2
/dev/mapper/cryptoswap swap  swap  defaults 0 0
/dev/mapper/cryptoroot /     btrfs defaults 1 1
EOF
[anaconda root@localhost /]# passwd --stdin root <<< "password"
Changing password for user root.
passwd: all authentication tokens updated successfully.
[anaconda root@localhost /]# touch /.autorelabel

Next I needed to install grub and make a config file. I set the grub kernel command line arguments and then generated a config. The config needed some fixing up (I am not using EFI on my system, but grub2-mkconfig thought I was because I had booted through EFI off of the install CD).

[anaconda root@localhost /]# echo 'GRUB_CMDLINE_LINUX="ro root=/dev/mapper/cryptoroot"' > /etc/default/grub
[anaconda root@localhost /]# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub.cfg ...
Found linux image: /boot/vmlinuz-3.11.10-301.fc20.x86_64
Found initrd image: /boot/initramfs-3.11.10-301.fc20.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-81c04e9030594ef6a5265a95f58ccf98
Found initrd image: /boot/initramfs-0-rescue-81c04e9030594ef6a5265a95f58ccf98.img
done
[anaconda root@localhost /]# sed -i s/linuxefi/linux/ /boot/grub2/grub.cfg
[anaconda root@localhost /]# sed -i s/initrdefi/initrd/ /boot/grub2/grub.cfg
[anaconda root@localhost /]# grub2-install -d /usr/lib/grub/i386-pc/ /dev/sda
Installation finished. No error reported.

NOTE: grub2-mkconfig didn't find my windows partition until I rebooted into the system and ran it again.

Finally I re-executed dracut to pick up the crypttab, exited the chroot, unmounted the filesystems, and rebooted into my new system:
[anaconda root@localhost /]# dracut --kver 3.11.10-301.fc20.x86_64 --force
[anaconda root@localhost /]# exit
[anaconda root@localhost ~]# umount /mnt/sysimage/{boot,dev,run,sys,proc}
[anaconda root@localhost ~]# reboot

After booting into Fedora I was then able to run grub2-mkconfig again and get it to recognize my (untouched) Windows partition:

[root@localhost /]# grub2-mkconfig -o /boot/grub2/grub.cfg
Generating grub.cfg ...
Found linux image: /boot/vmlinuz-3.11.10-301.fc20.x86_64
Found initrd image: /boot/initramfs-3.11.10-301.fc20.x86_64.img
Found linux image: /boot/vmlinuz-0-rescue-375c7019484a45838666c572d241249a
Found initrd image: /boot/initramfs-0-rescue-375c7019484a45838666c572d241249a.img
Found Windows 7 (loader) on /dev/sda4
done

And that's pretty much it. Using this method you can have virtually any hard drive setup that you desire. Hope someone else can find this useful.

Dusty

P.S. You can start sshd in anaconda by running systemctl start anaconda-sshd.service.

TermRecord: Terminal Screencast in a Self-Contained HTML File

Introduction


Some time ago I wrote a few posts (1, 2) on how to use script to record a terminal session and then scriptreplay to play it back. This functionality can be very useful because it gives you the power to show others exactly what happens when you do insert anything here.

I was happy with this solution for a while, until one day Wolfgang Richter commented on my original post and shared a project he has been working on called TermRecord.

I gave it a spin and have been using it quite a bit. Sharing a terminal recording now becomes much easier: you can simply email the .html file, or you can host it yourself and share links. As long as the people you are sharing with have a browser, they can watch the playback. Thus, it is not tied to a system with a particular piece of software, and clicking a link to view it is very easy to do :)

Basics of TermRecord


Before anything else we need to install TermRecord. Currently TermRecord is available in the python package index (hopefully it will be packaged in some major distributions soon) and can be installed using pip.
[root@localhost ~]# pip install TermRecord
Downloading/unpacking TermRecord
  Downloading TermRecord-1.1.3.tar.gz (49kB): 49kB downloaded
  Running setup.py egg_info for package TermRecord
...
...
Successfully installed TermRecord Jinja2 markupsafe
Cleaning up...
Now you can make a self-contained html file for sharing in a couple of ways.

First, you can use TermRecord to convert already existing timing and log files that were created using the script command, by specifying them as inputs to TermRecord:
[root@localhost ~]# TermRecord -o screencast.html -t screencast.timing -s screencast.log
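If you don't already have a timing and log file pair lying around, they can be generated with script's timing output (the same approach from my earlier posts); roughly something like:

[root@localhost ~]# script -t 2> screencast.timing screencast.log
[root@localhost ~]# # ...do whatever you want to record, then exit the shell...
[root@localhost ~]# exit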

The other option is to create a new recording using TermRecord like so:
[root@localhost ~]# TermRecord -o screencast.html
Script started, file is /tmp/tmp5I4SYq
[root@localhost ~]#
[root@localhost ~]# #This is a screencast.
[root@localhost ~]# exit
exit
Script done, file is /tmp/tmp5I4SYq

And... done. Now you can email or share the html file any way you like. If you would like to see some examples of terminal recordings, you can check out the TermRecord github page, or here is one from my previous post on wordpress/docker.

Cheers,
Dusty

Zero to WordPress on Docker in 5 Minutes

Introduction


Docker is an emerging technology that has garnered a lot of momentum in the past year. I have been busy with a move to NYC and a job change (now officially a Red Hatter), so I am just now getting around to getting my feet wet with Docker.

Last night I sat down and decided to bang out some steps for installing wordpress in a docker container. Eventually I plan to move this site into a container so I figured this would be a good first step.

DockerPress


There are a few bits and pieces that need to be done to configure wordpress. For simplicity I decided to make this wordpress instance use sqlite rather than mysql. With all of that in mind, here is the basic recipe for wordpress:
  • Install apache and php.
  • Download wordpress and extract to appropriate folder.
  • Download the sqlite-integration plugin and extract.
  • Modify a few files...and DONE.
This is easily automated by creating a Dockerfile and using docker. The minimal Dockerfile (with comments) is shown below:
FROM goldmann/f20
MAINTAINER Dusty Mabe

# Install httpd and update openssl
RUN yum install -y httpd openssl unzip php php-pdo

# Download and extract wordpress
RUN curl -o wordpress.tar.gz http://wordpress.org/latest.tar.gz
RUN tar -xzvf wordpress.tar.gz --strip-components=1 --directory /var/www/html/
RUN rm wordpress.tar.gz

# Download plugin to allow WP to use sqlite
# http://wordpress.org/plugins/sqlite-integration/installation/
# - Move sqlite-integration folder to wordpress/wp-content/plugins folder.
# - Copy db.php file in sqlite-integration folder to wordpress/wp-content folder.
# - Rename wordpress/wp-config-sample.php to wordpress/wp-config.php.
#
RUN curl -o sqlite-plugin.zip http://downloads.wordpress.org/plugin/sqlite-integration.1.6.3.zip
RUN unzip sqlite-plugin.zip -d /var/www/html/wp-content/plugins/
RUN rm sqlite-plugin.zip
RUN cp /var/www/html/wp-content/{plugins/sqlite-integration/db.php,}
RUN cp /var/www/html/{wp-config-sample.php,wp-config.php}
#
# Fix permissions on all of the files
RUN chown -R apache /var/www/html/
RUN chgrp -R apache /var/www/html/
#
# Update keys/salts in wp-config for security
RUN RE='put your unique phrase here'; for i in {1..8}; do KEY=$(openssl rand -base64 40); sed -i "0,/$RE/s|$RE|$KEY|" /var/www/html/wp-config.php; done;
#
# Expose port 80 and set httpd as our entrypoint
EXPOSE 80
ENTRYPOINT ["/usr/sbin/httpd"]
CMD ["-D", "FOREGROUND"]

With the power of the Dockerfile you can now build a new image using docker build and then run the new container with the docker run command. An example of these two commands is shown below:
[root@localhost ~]# ls Dockerfile
Dockerfile
[root@localhost ~]# docker build -t "wordpress" .
...
Successfully built 0b388013905e
...
[root@localhost ~]#
[root@localhost ~]# docker run -d -p 8080:80 -t wordpress
6da59c864d35bb0bb6043c09eb8b1128b2c1cb91f7fa456156df4a0a22f271b0

The docker build command will build an image from the Dockerfile and then tag the new image with the "wordpress" tag. The docker run command will run a new container based on the "wordpress" image and bind port 8080 from the host machine to port 80 within the container.
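Before pointing your browser at it, you can optionally sanity-check that the container is up and that apache is answering on the mapped port; something along these lines should do:

[root@localhost ~]# docker ps                       # the wordpress container should show 0.0.0.0:8080->80/tcp
[root@localhost ~]# curl -I http://localhost:8080   # should return HTTP response headers from apache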

Now you can happily point your browser to http://localhost:8080 and see the wordpress 5 minute installation screen:



See a full screencast of the "zero to wordpress" process using docker here.
Download the Dockerfile here.

Cheers!
Dusty

NOTE: This was done on Fedora 20 with docker-io-0.9.1-1.fc20.x86_64.

Fedup 19 to 20 with a Thin LVM Configuration

Introduction


I have been running my home desktop on thin logical volumes for a while now. I have enjoyed the flexibility of this setup and I like taking a snapshot before making any big changes. Recently I decided to update to Fedora 20 from Fedora 19, and I hit some trouble along the way because the Fedora 20 initramfs (images/pxeboot/upgrade.img) that is used by fedup for the upgrade does not have support for thin logical volumes. After running fedup and rebooting you end up with a message on the screen that looks something like this:
[  OK  ] Started Show Plymouth Boot Screen.
[  OK  ] Reached target Paths.
[  OK  ] Reached target Basic System.
[  191.023332] dracut-initqueue[363]: Warning: Could not boot.
[  191.028263] dracut-initqueue[363]: Warning: /dev/mapper/vg_root-thin_root does not exist
[  191.029689] dracut-initqueue[363]: Warning: /dev/vg_root/thin_root does not exist
Starting Dracut Emergency Shell...
Warning: /dev/mapper/vg_root-thin_root does not exist
Warning: /dev/vg_root/thin_root does not exist

Generating "/run/initramfs/rdsosreport.txt"

Entering emergency mode. Exit the shell to continue.

Working Around the Issue


First off, install and run fedup:
[root@localhost ~]# yum update -y fedup fedora-release &>/dev/null
[root@localhost ~]# fedup --network 20 &>/dev/null

After running fedup you would usually be able to reboot and go directly into the upgrade process. In our case we need to add a few helper utilities (thin_dump, thin_check, thin_restore) to the initramfs so that thin LVs will work. This can be done by appending the extra files, as a cpio archive, to the end of the initramfs that was downloaded by fedup. I learned about this technique by peeking at the initramfs_append_files() function within fedup's boot.py. Note also that I had to append a few libraries that the utilities require into the initramfs as well.

[root@localhost ~]# cpio -co >> /boot/initramfs-fedup.img << EOF
/lib64/libexpat.so.1
/lib64/libexpat.so.1.6.0
/lib64/libstdc++.so.6
/lib64/libstdc++.so.6.0.18
/usr/sbin/thin_dump
/usr/sbin/thin_check
/usr/sbin/thin_restore
EOF
4334 blocks
[root@localhost ~]#

And that's it. You are now able to reboot into the upgrade environment and watch the upgrade. If you'd like to watch a (rather lengthy) screencast of the entire process then you can download the screencast.log and the screencast.timing files and follow the instructions here.

Dusty

Nested Virt and Fedora 20 Virt Test Day

Introduction


I decided this year to take part in the Fedora Virtualization Test Day on October 8th. In order to take part I needed a system with Fedora 20 installed so that I could then create VMs on top. Since I like my current setup and didn't have a hard drive lying around that I wanted to wipe, I decided to give nested virtualization a shot.

Most of the documentation I have seen for nested virtualization has come from Kashyap Chamarthy. Relevant posts are here, here, and here. He has done a great job with these tutorials and this post is nothing more than my notes for what I found to work for me.

Steps


With nested virtualization the OS/Hypervisor that touches the physical hardware is known as L0. The first level of virtualized guest is known as L1. The second level of virtualized guest (the guest inside a guest) is known as L2. In my setup I ultimately wanted F19(L0), F20(L1), and F20(L2).

First, in order to pass along the Intel VMX extensions to the guest, I created a modprobe config file that instructs the kvm_intel kernel module to allow nested virtualization support:

[root@L0 ~]# echo "options kvm-intel nested=y" > /etc/modprobe.d/nestvirt.conf

After a reboot I can now confirm the kvm_intel module is configured for nested virt:

[root@L0 ~]# cat /sys/module/kvm_intel/parameters/nested
Y

Next I converted an existing Fedora 20 installation to use "host-passthrough" (see here) so that the L1 guest would see the same processor (with vmx extensions) as my L0 host. To do this I modified the cpu xml tags as follows in the libvirt xml definition:

<cpu mode='host-passthrough'>
</cpu>
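One way to make that change, assuming the guest is defined in libvirt (the domain name f20 below is just a placeholder), is to edit the domain XML directly with virsh:

[root@L0 ~]# virsh edit f20    # replace the existing <cpu> element with the host-passthrough version shown above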

After powering up the guest I can now see that the processor the L1 guest sees is indeed the same as the host's:
[root@L1 ~]# cat /proc/cpuinfo | grep "model name"
model name      : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
model name      : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
model name      : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
model name      : Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz

Next I decided to enable nested virt in the L1 guest by adding the same modprobe.conf file as I did in L0. I did this based on a tip from Kashyap in the #fedora-test-day chat that this tends to give about a 10X performance improvement in the L2 guests.

[root@L1 ~]# echo "options kvm-intel nested=y" > /etc/modprobe.d/nestvirt.conf

After a reboot I could then create and install L2 guests using virt-install and virt-manager. This seemed to work fine except that I would periodically see an unknown NMI in the guest.

[   14.324786] Uhhuh. NMI received for unknown reason 30 on CPU 0.
[   14.325046] Do you have a strange power saving mode enabled?
[   14.325046] Dazed and confused, but trying to continue

I believe the issue I was seeing may be documented in kernel BZ#58941. After asking about it in the chat I was informed that for the best experience with nested virt I should go to a 3.12 kernel. I decided to leave that exercise for another day :)

Have a great day!

Dusty

BTRFS: How big are my snapshots?

Introduction


I have been using BTRFS snapshots for a while now on my laptop to incrementally save the state of my machine before I perform system updates or run some harebrained test. I quickly ran into a problem though: on a smaller filesystem I was running out of space. I wanted to be able to look at each snapshot and easily determine how much space I could recover by deleting it. Surprisingly this information was not readily available. Of course you could determine the total size of each snapshot by using du, but that only tells you how big the entire snapshot is, not how much of the data is exclusive to that snapshot alone.

Enter filesystem quota and qgroups in git commit 89fe5b5f666c247aa3173745fb87c710f3a71a4a. With quota and qgroups (see an overview here) we can now see how big each of those snapshots is, including its exclusive usage.

Steps


The system I am using for this example is Fedora 19 with btrfs-progs-0.20.rc1.20130308git704a08c-1.fc19.x86_64 installed. I have a 2nd disk attached (/dev/sdb) that I will use for the BTRFS filesystem.

First things first, let's create a BTRFS filesystem on sdb, mount the filesystem, and then create a .snapshots directory.

[root@localhost ~]# mkfs.btrfs /dev/sdb

WARNING! - Btrfs v0.20-rc1 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using

fs created label (null) on /dev/sdb
        nodesize 4096 leafsize 4096 sectorsize 4096 size 10.00GB
Btrfs v0.20-rc1
[root@localhost ~]#
[root@localhost ~]# mount /dev/sdb /btrfs
[root@localhost ~]# mkdir /btrfs/.snapshots

Next let's copy some files into the filesystem. I will copy in a 50M file and then create a snapshot (snap1). Then I will copy in a 4151M file and take another snapshot (snap2). Finally, a 279M file and another snapshot (snap3).

[root@localhost ~]# cp /root/50M_File /btrfs/
[root@localhost ~]# btrfs subvolume snapshot /btrfs /btrfs/.snapshots/snap1
Create a snapshot of '/btrfs' in '/btrfs/.snapshots/snap1'
[root@localhost ~]#
[root@localhost ~]# cp /root/4151M_File /btrfs/
[root@localhost ~]# btrfs subvolume snapshot /btrfs /btrfs/.snapshots/snap2
Create a snapshot of '/btrfs' in '/btrfs/.snapshots/snap2'
[root@localhost ~]#
[root@localhost ~]# cp /root/279M_File /btrfs/
[root@localhost ~]# btrfs subvolume snapshot /btrfs /btrfs/.snapshots/snap3
Create a snapshot of '/btrfs' in '/btrfs/.snapshots/snap3'
[root@localhost ~]#
[root@localhost ~]# df -kh /btrfs/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb         10G  4.4G  3.6G  55% /btrfs

Now how much is each one of those snapshots taking up? We can see this information by enabling quota and then printing out the qgroup information:

[root@localhost ~]# btrfs quota enable /btrfs/
[root@localhost ~]#
[root@localhost ~]# btrfs qgroup show /btrfs/
0/5 4698025984 8192
0/257 52432896 4096
0/263 4405821440 12288
0/264 4698025984 8192

The first number on each line represents the subvolume id. The second number represents the amount of space contained within each subvolume (in bytes) and the last number represents the amount of space that is exclusive to that subvolume (in bytes). Now for some reason when I see such large numbers I go brain dead and fail to comprehend how much space is actually being used. I wrote a little perl script to convert the numbers to MB.
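The perl script itself isn't included in this post, but if you just want the conversion, a rough awk equivalent (an approximation, not necessarily the exact script used below) would be:

[root@localhost ~]# btrfs qgroup show /btrfs/ | awk '{printf "%s %dM %dM\n", $1, $2/1048576, $3/1048576}'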

[root@localhost ~]# btrfs qgroup show /btrfs/ | /root/convert
0/5 4480M 0M
0/257 50M 0M
0/263 4201M 0M
0/264 4480M 0M

So that makes sense. The 1st snapshot (denoted by the 2nd line) contains 50M. The 2nd snapshot contains 50M+4151M and the 3rd snapshot contains 50M+4151M+279M. We can also see that at the moment none of them have any exclusive content. This is because all data is shared among them all.

We can fix that by deleting some of the files.

[root@localhost ~]# rm /btrfs/279M_File
rm: remove regular file ‘/btrfs/279M_File’? y
[root@localhost ~]# btrfs qgroup show /btrfs/ | /root/convert
0/5 4201M 0M
0/257 50M 0M
0/263 4201M 0M
0/264 4480M 278M

Now if we delete all of the files and view the qgroup info, what do we see?

[root@localhost ~]# rm -f /btrfs/4151M_File
[root@localhost ~]# rm -f /btrfs/50M_File
[root@localhost ~]# btrfs qgroup show /btrfs/ | /root/convert
0/5 0M 0M
0/257 50M 0M
0/263 4201M 0M
0/264 4480M 278M

We can see from the first line that the files have been removed from the root subvolume, but the exclusive counts didn't go up for snap1 and snap2. Why not?

This is because the files are shared with snap3. If we remove snap3 then we'll see the exclusive number go up for snap2:

[root@localhost ~]# btrfs subvolume delete /btrfs/.snapshots/snap3
Delete subvolume '/btrfs/.snapshots/snap3'
[root@localhost ~]#
[root@localhost ~]# btrfs qgroup show /btrfs/ | /root/convert
0/5 -4480M -278M
0/257 50M 0M
0/263 4201M 4151M
0/264 4480M 278M

As expected the 2nd snapshot now shows 4151M as exclusive. However, unexpectedly the qgroup for the 3rd snapshot still exists and the root subvolume qgroup now shows negative numbers.

Finally, let's delete snap2 and observe that the amount of exclusive space (4151M) is actually released back to the pool of free space:

[root@localhost ~]# df -kh /btrfs/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb         10G  4.2G  3.9G  52% /btrfs
[root@localhost ~]#
[root@localhost ~]# btrfs subvolume delete /btrfs/.snapshots/snap2
Delete subvolume '/btrfs/.snapshots/snap2'
[root@localhost ~]#
[root@localhost ~]# btrfs qgroup show /btrfs/ | /root/convert
0/5 -8682M -4430M
0/257 50M 50M
0/263 4201M 4151M
0/264 4480M 278M
[root@localhost ~]#
[root@localhost ~]# df -kh /btrfs/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb         10G   52M  8.0G   1% /btrfs

So we can see that the space is in fact released and is now counted as free space. Again, the negative numbers and the fact that qgroups still show up for the deleted subvolumes are a bit odd.

Cheers!

Dusty Mabe

Bonus: It seems like there is a patch floating around to enhance the output of qgroup show. Check it out here.

Excellent LVM Tutorial for Beginners or Experts


I ran across a great PDF from this year's Red Hat Summit in Boston. Hosted by Christoph Doerbech and Jonathan Brassow, the lab covers the following topics:
  • What is LVM? What are filesystems? etc..
  • Creating PVs, VGs, LVs.
  • LVM Striping and Mirroring.
  • LVM Raid.
  • LVM Snapshots (and reverting).
  • LVM Sparse Volumes (a snapshot of /dev/zero).
  • LVM Thin LVs and new snapshots.
Check out the PDF here. If that link ceases to work at some point I have it hosted here as well.

Hope everyone can use this as a great learning tool!

Dusty

Convert an Existing System to Use Thin LVs

Introduction


Want to take advantage of the efficiency and improved snapshotting of thin LVs on an existing system? It will take a little work but it is possible. The following steps will show how to convert a CentOS 6.4 basic installation to use thin logical volumes for the root device (containing the root filesystem).

Preparation


To kick things off there are a few preparation steps we need that seem a bit unrelated but will prove useful. First I enabled LVM to issue discards to underlying block devices (if you are interested in why this is needed you can check out my post here).

[root@Cent64 ~]# cat /etc/lvm/lvm.conf | grep issue_discards
    issue_discards = 0
[root@Cent64 ~]# sed -i -e 's/issue_discards = 0/issue_discards = 1/' /etc/lvm/lvm.conf
[root@Cent64 ~]# cat /etc/lvm/lvm.conf | grep issue_discards
    issue_discards = 1

Next, since we are converting the whole system to use thin LVs, we need to enable our initramfs to mount and switch root to a thin LV. By default dracut does not include the utilities that are needed to do this (see BZ#921235). This means we need to tell dracut to add thin_dump, thin_restore, and thin_check (provided by the device-mapper-persistent-data rpm) to the initramfs. We also want to make sure they get added for any future initramfs builds, so we will add an install script within /usr/share/dracut/modules.d/.

[root@Cent64 ~]# mkdir /usr/share/dracut/modules.d/99thinlvm
[root@Cent64 ~]# cat << EOF > /usr/share/dracut/modules.d/99thinlvm/install
> #!/bin/bash
> dracut_install -o thin_dump thin_restore thin_check
> EOF
[root@Cent64 ~]# chmod +x /usr/share/dracut/modules.d/99thinlvm/install
[root@Cent64 ~]# dracut --force
[root@Cent64 ~]# lsinitrd /boot/initramfs-2.6.32-358.el6.x86_64.img | grep thin_
-rwxr-xr-x   1 root     root       351816 Sep  3 23:11 usr/sbin/thin_dump
-rwxr-xr-x   1 root     root       238072 Sep  3 23:11 usr/sbin/thin_check
-rwxr-xr-x   1 root     root       355968 Sep  3 23:11 usr/sbin/thin_restore

OK, so now that we have an adequate initramfs the final step before the conversion is to make sure there is enough free space in the VG to move our data around (in the worst case scenario we will need twice the space we are currently using). On my system I just added a 2nd disk (sdb) and added that disk to the VG:

[root@Cent64 ~]# lsblk
NAME                         MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0                           11:0    1 1024M  0 rom
sdb                            8:16   0   31G  0 disk
sda                            8:0    0   30G  0 disk
├─sda1                         8:1    0  500M  0 part /boot
└─sda2                         8:2    0 29.5G  0 part
  ├─vg_cent64-lv_root (dm-0) 253:0    0 25.6G  0 lvm  /
  └─vg_cent64-lv_swap (dm-1) 253:1    0    4G  0 lvm  [SWAP]
[root@Cent64 ~]#
[root@Cent64 ~]# vgextend vg_cent64 /dev/sdb
  Volume group "vg_cent64" successfully extended
[root@Cent64 ~]#
[root@Cent64 ~]# vgs
  VG        #PV #LV #SN Attr   VSize  VFree
  vg_cent64   2   2   0 wz--n- 60.50g 31.00g

Conversion


Now comes the main event! We need to create a thin LV pool and then move the root LV over to the pool. Since thin pools currently cannot be reduced in size (BZ#812731), I decided to make my thin pool exactly the size of the LV I wanted to put in the pool. Below I show creating the thin pool as well as the thin_root LV that will be our new "thin" root logical volume.

[root@Cent64 ~]# lvs --units=b /dev/vg_cent64/lv_root
  LV      VG        Attr      LSize        Pool Origin Data%  Move Log Cpy%Sync Convert
  lv_root vg_cent64 -wi-ao--- 27455913984B
[root@Cent64 ~]#
[root@Cent64 ~]# lvcreate -T vg_cent64/thinp --size=27455913984B
  Logical volume "thinp" created
[root@Cent64 ~]#
[root@Cent64 ~]# lvcreate -T vg_cent64/thinp -n thin_root -V 27455913984B
  Logical volume "thin_root" created
[root@Cent64 ~]#
[root@Cent64 ~]# lvs
  LV        VG        Attr      LSize  Pool  Origin Data%  Move Log Cpy%Sync Convert
  lv_root   vg_cent64 -wi-ao--- 25.57g
  lv_swap   vg_cent64 -wi-ao---  3.94g
  thin_root vg_cent64 Vwi-a-tz- 25.57g thinp          0.00
  thinp     vg_cent64 twi-a-tz- 25.57g                0.00

Now we need to get all of the data from lv_root into thin_root. My original thought was to just dd all of the content from one to the other, but there is one problem: we are still mounted on lv_root. For safety I would probably recommend booting into rescue mode from a CD and then doing the dd without either filesystem mounted. However, today I just decided to make an LVM snapshot of the root LV, which gives us a consistent view of the block device for the duration of the copy.

[root@Cent64 ~]# lvcreate --snapshot -n snap_root --size=2g vg_cent64/lv_root
  Logical volume "snap_root" created
[root@Cent64 ~]#
[root@Cent64 ~]# dd if=/dev/vg_cent64/snap_root of=/dev/vg_cent64/thin_root
53624832+0 records in
53624832+0 records out
27455913984 bytes (27 GB) copied, 597.854 s, 45.9 MB/s
[root@Cent64 ~]#
[root@Cent64 ~]# lvs
  LV        VG        Attr      LSize  Pool  Origin  Data%  Move Log Cpy%Sync Convert
  lv_root   vg_cent64 owi-aos-- 25.57g
  lv_swap   vg_cent64 -wi-ao---  3.94g
  snap_root vg_cent64 swi-a-s--  2.00g        lv_root   0.07
  thin_root vg_cent64 Vwi-a-tz- 25.57g thinp          100.00
  thinp     vg_cent64 twi-a-tz- 25.57g                100.00
[root@Cent64 ~]#
[root@Cent64 ~]# lvremove /dev/vg_cent64/snap_root
Do you really want to remove active logical volume snap_root? [y/n]: y
  Logical volume "snap_root" successfully removed

So there we have it. All of the data has been copied to the thin_root LV. You can see from the output of lvs that the thin LV and the thin pool are both 100% full. 100% full? really? I thought these were "thin" LVs. :)

Let's recover that space! I'll do this by mounting thin_root and then running fstrim to release the unused blocks back to the pool. First I check the fs and clean up any dirt by running fsck.

[root@Cent64 ~]# fsck /dev/vg_cent64/thin_root
fsck from util-linux-ng 2.17.2
e2fsck 1.41.12 (17-May-2010)
Clearing orphaned inode 1047627 (uid=0, gid=0, mode=0100700, size=0)
Clearing orphaned inode 1182865 (uid=0, gid=0, mode=0100755, size=15296)
Clearing orphaned inode 1182869 (uid=0, gid=0, mode=0100755, size=24744)
Clearing orphaned inode 1444589 (uid=0, gid=0, mode=0100755, size=15256)
...
/dev/mapper/vg_cent64-thin_root: clean, 30776/1676080 files, 340024/6703104 blocks
[root@Cent64 ~]#
[root@Cent64 ~]# mount /dev/vg_cent64/thin_root /mnt/
[root@Cent64 ~]#
[root@Cent64 ~]# fstrim -v /mnt/
/mnt/: 26058436608 bytes were trimmed
[root@Cent64 ~]#
[root@Cent64 ~]# lvs
  LV        VG        Attr      LSize  Pool  Origin Data%  Move Log Cpy%Sync Convert
  lv_root   vg_cent64 -wi-ao--- 25.57g
  lv_swap   vg_cent64 -wi-ao---  3.94g
  thin_root vg_cent64 Vwi-aotz- 25.57g thinp           5.13
  thinp     vg_cent64 twi-a-tz- 25.57g                 5.13

Success! All the way from 100% back down to 5%.

Now let's update the grub.conf and the fstab to use the new thin_root LV.

NOTE: grub.conf is on the filesystem on sda1.
NOTE: fstab is on the filesystem on thin_root.

[root@Cent64 ~]# sed -i -e 's/lv_root/thin_root/g' /boot/grub/grub.conf
[root@Cent64 ~]# sed -i -e 's/lv_root/thin_root/g' /mnt/etc/fstab
[root@Cent64 ~]# umount /mnt/

Time for a reboot!

After the system comes back up we should now be able to delete the original lv_root.

[root@Cent64 ~]# lvremove /dev/vg_cent64/lv_root
Do you really want to remove active logical volume lv_root? [y/n]: y
  Logical volume "lv_root" successfully removed

Now we want to remove that extra disk (/dev/sdb) I added. However, there is a subtle difference between my system now and my system before: there is now a metadata LV (thinp_tmeta) taking up a small amount of space, which prevents everything from fitting completely on the first disk (/dev/sda).

No biggie. We'll just steal that amount of space from lv_swap and then run pvmove to move all of the data back to /dev/sda.

[root@Cent64 ~]# lvs -a --units=b
  LV              VG        Attr      LSize        Pool  Origin Data%  Move Log Cpy%Sync Convert
  lv_swap         vg_cent64 -wi-ao---  4227858432B
  thin_root       vg_cent64 Vwi-aotz- 27455913984B thinp          5.13
  thinp           vg_cent64 twi-a-tz- 27455913984B                5.13
  [thinp_tdata]   vg_cent64 Twi-aot-- 27455913984B
  [thinp_tmeta]   vg_cent64 ewi-aot--    29360128B
[root@Cent64 ~]#
[root@Cent64 ~]# swapoff /dev/vg_cent64/lv_swap
[root@Cent64 ~]#
[root@Cent64 ~]# lvresize --size=-29360128B /dev/vg_cent64/lv_swap
  WARNING: Reducing active logical volume to 3.91 GiB
  THIS MAY DESTROY YOUR DATA (filesystem etc.)
Do you really want to reduce lv_swap? [y/n]: y
  Reducing logical volume lv_swap to 3.91 GiB
  Logical volume lv_swap successfully resized
[root@Cent64 ~]#
[root@Cent64 ~]# mkswap /dev/vg_cent64/lv_swap
mkswap: /dev/vg_cent64/lv_swap: warning: don't erase bootbits sectors
        on whole disk. Use -f to force.
Setting up swapspace version 1, size = 4100092 KiB
no label, UUID=7b023342-a9a9-4676-8bc6-1e60541010e4
[root@Cent64 ~]#
[root@Cent64 ~]# swapon -v /dev/vg_cent64/lv_swap
swapon on /dev/vg_cent64/lv_swap
swapon: /dev/mapper/vg_cent64-lv_swap: found swap signature: version 1, page-size 4, same byte order
swapon: /dev/mapper/vg_cent64-lv_swap: pagesize=4096, swapsize=4198498304, devsize=4198498304

Now we can get rid of sdb by running pvmove and vgreduce.

[root@Cent64 ~]# pvmove /dev/sdb
  /dev/sdb: Moved: 0.1%
  /dev/sdb: Moved: 11.8%
  /dev/sdb: Moved: 21.0%
  /dev/sdb: Moved: 32.0%
  /dev/sdb: Moved: 45.6%
  /dev/sdb: Moved: 56.2%
  /dev/sdb: Moved: 68.7%
  /dev/sdb: Moved: 79.6%
  /dev/sdb: Moved: 90.7%
  /dev/sdb: Moved: 100.0%
[root@Cent64 ~]#
[root@Cent64 ~]# pvs
  PV         VG        Fmt  Attr PSize  PFree
  /dev/sda2  vg_cent64 lvm2 a--  29.51g     0
  /dev/sdb   vg_cent64 lvm2 a--  31.00g 31.00g
[root@Cent64 ~]#
[root@Cent64 ~]# vgreduce vg_cent64 /dev/sdb
  Removed "/dev/sdb" from volume group "vg_cent64"

Boom! You're done!

Dusty