Monthly Archive for September, 2013

BTRFS: How big are my snapshots?

Introduction


I have been using BTRFS snapshots for a while now on my laptop to incrementally save the state of my machine before I perform system updates or run some harebrained test. I quickly ran into a problem, though: on a smaller filesystem I was running out of space. I wanted to be able to look at the snapshots and easily determine how much space I could recover by deleting each one. Surprisingly, this information is not readily available. Of course, you can determine the total size of each snapshot by using du, but that only tells you how big the entire snapshot is, not how much of it is exclusive to that snapshot.
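For example, something along these lines (using the snapshot directory layout from later in this post) reports the total size referenced by each snapshot, but says nothing about what would actually be freed by deleting one:

# Total size of the data referenced by each snapshot; exclusive usage is not shown.
du -sh /btrfs/.snapshots/*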

Enter filesystem quota and qgroups, added in git commit 89fe5b5f666c247aa3173745fb87c710f3a71a4a. With quota and qgroups (see an overview here) we can now see how big each of those snapshots is, including its exclusive usage.

Steps


The system I am using for this example is Fedora 19 with btrfs-progs-0.20.rc1.20130308git704a08c-1.fc19.x86_64 installed. I have a 2nd disk attached (/dev/sdb) that I will use for the BTRFS filesystem.

First things first: let's create a BTRFS filesystem on sdb, mount it, and then create a .snapshots directory.

[root@localhost ~]# mkfs.btrfs /dev/sdb
WARNING! - Btrfs v0.20-rc1 IS EXPERIMENTAL
WARNING! - see http://btrfs.wiki.kernel.org before using
fs created label (null) on /dev/sdb
        nodesize 4096 leafsize 4096 sectorsize 4096 size 10.00GB
Btrfs v0.20-rc1
[root@localhost ~]#
[root@localhost ~]# mount /dev/sdb /btrfs
[root@localhost ~]# mkdir /btrfs/.snapshots

Next, let's copy some files into the filesystem. I will copy in a 50M file and then create a snapshot (snap1). Then I will copy in a 4151M file and take another snapshot (snap2). Finally, I will copy in a 279M file and take a third snapshot (snap3).

[root@localhost ~]# cp /root/50M_File /btrfs/
[root@localhost ~]# btrfs subvolume snapshot /btrfs /btrfs/.snapshots/snap1
Create a snapshot of '/btrfs' in '/btrfs/.snapshots/snap1'
[root@localhost ~]#
[root@localhost ~]# cp /root/4151M_File /btrfs/
[root@localhost ~]# btrfs subvolume snapshot /btrfs /btrfs/.snapshots/snap2
Create a snapshot of '/btrfs' in '/btrfs/.snapshots/snap2'
[root@localhost ~]#
[root@localhost ~]# cp /root/279M_File /btrfs/
[root@localhost ~]# btrfs subvolume snapshot /btrfs /btrfs/.snapshots/snap3
Create a snapshot of '/btrfs' in '/btrfs/.snapshots/snap3'
[root@localhost ~]#
[root@localhost ~]# df -kh /btrfs/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb         10G  4.4G  3.6G  55% /btrfs

Now, how much space is each one of those snapshots taking up? We can see this information by enabling quota and then printing out the qgroup information:

[root@localhost ~]# btrfs quota enable /btrfs/
[root@localhost ~]#
[root@localhost ~]# btrfs qgroup show /btrfs/
0/5   4698025984 8192
0/257 52432896   4096
0/263 4405821440 12288
0/264 4698025984 8192

The first number on each line is the subvolume ID. The second number is the amount of space referenced by that subvolume (in bytes), and the last number is the amount of space that is exclusive to that subvolume (in bytes). Now, for some reason, when I see such large numbers I go brain dead and fail to comprehend how much space is actually being used, so I wrote a little Perl script to convert the numbers to MB.
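The Perl script itself isn't reproduced in this post, but a rough stand-in saved as /root/convert (not the original script, just a bash sketch that assumes the plain three-column qgroup show output above) could look like this:

#!/bin/bash
# Read "qgroupid referenced exclusive" lines on stdin and print the two
# byte counts as whole MB (1 MB = 1048576 bytes), truncating the remainder.
while read -r qgroupid referenced exclusive; do
    printf '%s %sM %sM\n' "$qgroupid" "$((referenced / 1048576))" "$((exclusive / 1048576))"
done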

[root@localhost ~]# btrfs qgroup show /btrfs/ | /root/convert
0/5   4480M 0M
0/257 50M   0M
0/263 4201M 0M
0/264 4480M 0M

So that makes sense. The 1st snapshot (the 2nd line) contains 50M. The 2nd snapshot contains 50M + 4151M = 4201M, and the 3rd snapshot contains 50M + 4151M + 279M = 4480M. We can also see that, at the moment, none of them have any exclusive content. This is because all of the data is shared among the snapshots and the root subvolume.

We can fix that by deleting some of the files.

[root@localhost ~]# rm /btrfs/279M_File
rm: remove regular file ‘/btrfs/279M_File’? y
[root@localhost ~]# btrfs qgroup show /btrfs/ | /root/convert
0/5   4201M 0M
0/257 50M   0M
0/263 4201M 0M
0/264 4480M 278M

Now if we delete all of the files and view the qgroup info, what do we see?

[root@localhost ~]# rm -f /btrfs/4151M_File
[root@localhost ~]# rm -f /btrfs/50M_File
[root@localhost ~]# btrfs qgroup show /btrfs/ | /root/convert
0/5   0M    0M
0/257 50M   0M
0/263 4201M 0M
0/264 4480M 278M

We can see from the first line that the files have been removed from the root subvolume, but why didn't the exclusive counts go up for snap1 and snap2?

This is because the files are still shared with snap3. If we remove snap3, we'll see the exclusive number go up for snap2:

[root@localhost ~]# btrfs subvolume delete /btrfs/.snapshots/snap3
Delete subvolume '/btrfs/.snapshots/snap3'
[root@localhost ~]#
[root@localhost ~]# btrfs qgroup show /btrfs/ | /root/convert
0/5   -4480M -278M
0/257 50M    0M
0/263 4201M  4151M
0/264 4480M  278M

As expected, the 2nd snapshot now shows 4151M as exclusive. Unexpectedly, though, the qgroup for the 3rd snapshot still exists, and the root subvolume's qgroup now shows negative numbers.

Finally, let's delete snap2 and observe that its exclusive space (4151M) is actually released back to the pool of free space:

[root@localhost ~]# df -kh /btrfs/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb         10G  4.2G  3.9G  52% /btrfs
[root@localhost ~]#
[root@localhost ~]# btrfs subvolume delete /btrfs/.snapshots/snap2
Delete subvolume '/btrfs/.snapshots/snap2'
[root@localhost ~]#
[root@localhost ~]# btrfs qgroup show /btrfs/ | /root/convert
0/5   -8682M -4430M
0/257 50M    50M
0/263 4201M  4151M
0/264 4480M  278M
[root@localhost ~]#
[root@localhost ~]# df -kh /btrfs/
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb         10G   52M  8.0G   1% /btrfs

So we can see that the space is in fact released and is now counted as free space. Again, the negative numbers and the fact that qgroups still show up for the deleted subvolumes are a bit odd.

Cheers!

Dusty Mabe

Bonus: It seems like there is a patch floating around to enhance the output of qgroup show. Check it out here.

Excellent LVM Tutorial for Beginners or Experts


I ran across a great PDF from this year's Red Hat Summit in Boston. Hosted by Christoph Doerbech and Jonathan Brassow, the lab covers the following topics:
  • What is LVM? What are filesystems? etc..
  • Creating PVs, VGs, LVs.
  • LVM Striping and Mirroring.
  • LVM Raid.
  • LVM Snapshots (and reverting).
  • LVM Sparse Volumes (a snapshot of /dev/zero).
  • LVM Thin LVs and new snapshots.
Check out the PDF here. If that link ceases to work at some point, I have it hosted here as well.
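For quick reference (and not as a substitute for the lab itself), the first few topics boil down to a handful of commands. The devices and names below are hypothetical:

# Initialize two disks as physical volumes, group them into a volume group,
# carve out a logical volume, put a filesystem on it, and take a snapshot of it.
pvcreate /dev/sdb /dev/sdc
vgcreate vg_demo /dev/sdb /dev/sdc
lvcreate -n lv_data -L 10G vg_demo
mkfs.ext4 /dev/vg_demo/lv_data
lvcreate --snapshot -n lv_data_snap -L 1G /dev/vg_demo/lv_data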

Hope everyone can use this as a great learning tool!

Dusty

Convert an Existing System to Use Thin LVs

Introduction


Want to take advantage of the efficiency and improved snapshotting of thin LVs on an existing system? It will take a little work, but it is possible. The following steps show how to convert a basic CentOS 6.4 installation to use a thin logical volume for the root device (containing the root filesystem).

Preparation


To kick things off, there are a few preparation steps we need that seem a bit unrelated but will prove useful. First, I enabled LVM to issue discards to underlying block devices (if you are interested in why this is needed, you can check out my post here).

[root@Cent64 ~]# cat /etc/lvm/lvm.conf | grep issue_discards
    issue_discards = 0
[root@Cent64 ~]# sed -i -e 's/issue_discards = 0/issue_discards = 1/' /etc/lvm/lvm.conf
[root@Cent64 ~]# cat /etc/lvm/lvm.conf | grep issue_discards
    issue_discards = 1

Next, since we are converting the whole system to use thin LVs, we need the initramfs to be able to mount and switch root to a thin LV. By default, dracut does not include the utilities that are needed to do this (see BZ#921235). This means we need to tell dracut to add thin_dump, thin_restore, and thin_check (provided by the device-mapper-persistent-data rpm) to the initramfs. We also want to make sure they get added for any future initramfs builds, so we will add a module within /usr/share/dracut/modules.d/.

[root@Cent64 ~]# mkdir /usr/share/dracut/modules.d/99thinlvm
[root@Cent64 ~]# cat << EOF > /usr/share/dracut/modules.d/99thinlvm/install
> #!/bin/bash
> dracut_install -o thin_dump thin_restore thin_check
> EOF
[root@Cent64 ~]# chmod +x /usr/share/dracut/modules.d/99thinlvm/install
[root@Cent64 ~]# dracut --force
[root@Cent64 ~]# lsinitrd /boot/initramfs-2.6.32-358.el6.x86_64.img | grep thin_
-rwxr-xr-x   1 root     root       351816 Sep  3 23:11 usr/sbin/thin_dump
-rwxr-xr-x   1 root     root       238072 Sep  3 23:11 usr/sbin/thin_check
-rwxr-xr-x   1 root     root       355968 Sep  3 23:11 usr/sbin/thin_restore

OK, so now that we have an adequate initramfs, the final step before the conversion is to make sure there is enough free space in the VG to move our data around (in the worst-case scenario we will need twice the space we are currently using). On my system I just attached a 2nd disk (sdb) and added it to the VG:

[root@Cent64 ~]# lsblk
NAME                          MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sr0                            11:0    1 1024M  0 rom
sdb                             8:16   0   31G  0 disk
sda                             8:0    0   30G  0 disk
├─sda1                          8:1    0  500M  0 part /boot
└─sda2                          8:2    0 29.5G  0 part
  ├─vg_cent64-lv_root (dm-0)  253:0    0 25.6G  0 lvm  /
  └─vg_cent64-lv_swap (dm-1)  253:1    0    4G  0 lvm  [SWAP]
[root@Cent64 ~]#
[root@Cent64 ~]# vgextend vg_cent64 /dev/sdb
  Volume group "vg_cent64" successfully extended
[root@Cent64 ~]#
[root@Cent64 ~]# vgs
  VG        #PV #LV #SN Attr   VSize  VFree
  vg_cent64   2   2   0 wz--n- 60.50g 31.00g

Conversion


Now comes the main event! We need to create a thin LV pool and then move the root LV over to the pool. Since thin pools currently cannot be reduced in size (BZ#812731), I decided to make my thin pool exactly the size of the LV I wanted to put in it. Below I show creating the thin pool as well as thin_root, which will become our new "thin" root logical volume.

[root@Cent64 ~]# lvs --units=b /dev/vg_cent64/lv_root
  LV      VG        Attr      LSize        Pool Origin Data%  Move Log Cpy%Sync Convert
  lv_root vg_cent64 -wi-ao--- 27455913984B
[root@Cent64 ~]#
[root@Cent64 ~]# lvcreate -T vg_cent64/thinp --size=27455913984B
  Logical volume "thinp" created
[root@Cent64 ~]#
[root@Cent64 ~]# lvcreate -T vg_cent64/thinp -n thin_root -V 27455913984B
  Logical volume "thin_root" created
[root@Cent64 ~]#
[root@Cent64 ~]# lvs
  LV        VG        Attr      LSize  Pool  Origin Data%  Move Log Cpy%Sync Convert
  lv_root   vg_cent64 -wi-ao--- 25.57g
  lv_swap   vg_cent64 -wi-ao---  3.94g
  thin_root vg_cent64 Vwi-a-tz- 25.57g thinp          0.00
  thinp     vg_cent64 twi-a-tz- 25.57g                0.00

Now we need to get all of the data from lv_root into thin_root. My original thought was just to dd all of the content from one to the other, but there is one problem: we are still mounted on lv_root. For safety I would recommend booting into rescue mode from a CD and then doing the dd without either filesystem mounted. However, today I just decided to make an LVM snapshot of the root LV, which gives us a consistent view of the block device for the duration of the copy.

[root@Cent64 ~]# lvcreate --snapshot -n snap_root --size=2g vg_cent64/lv_root
  Logical volume "snap_root" created
[root@Cent64 ~]#
[root@Cent64 ~]# dd if=/dev/vg_cent64/snap_root of=/dev/vg_cent64/thin_root
53624832+0 records in
53624832+0 records out
27455913984 bytes (27 GB) copied, 597.854 s, 45.9 MB/s
[root@Cent64 ~]#
[root@Cent64 ~]# lvs
  LV        VG        Attr      LSize  Pool  Origin  Data%  Move Log Cpy%Sync Convert
  lv_root   vg_cent64 owi-aos-- 25.57g
  lv_swap   vg_cent64 -wi-ao---  3.94g
  snap_root vg_cent64 swi-a-s--  2.00g       lv_root   0.07
  thin_root vg_cent64 Vwi-a-tz- 25.57g thinp         100.00
  thinp     vg_cent64 twi-a-tz- 25.57g               100.00
[root@Cent64 ~]#
[root@Cent64 ~]# lvremove /dev/vg_cent64/snap_root
Do you really want to remove active logical volume snap_root? [y/n]: y
  Logical volume "snap_root" successfully removed

So there we have it: all of the data has been copied to the thin_root LV. You can see from the output of lvs that the thin LV and the thin pool are both 100% full. 100% full? Really? I thought these were "thin" LVs. :)

Let's recover that space! I'll do this by mounting thin_root and then running fstrim to release the unused blocks back to the pool. First, I check the filesystem and clean up any dirt by running fsck.

[root@Cent64 ~]# fsck /dev/vg_cent64/thin_root
fsck from util-linux-ng 2.17.2
e2fsck 1.41.12 (17-May-2010)
Clearing orphaned inode 1047627 (uid=0, gid=0, mode=0100700, size=0)
Clearing orphaned inode 1182865 (uid=0, gid=0, mode=0100755, size=15296)
Clearing orphaned inode 1182869 (uid=0, gid=0, mode=0100755, size=24744)
Clearing orphaned inode 1444589 (uid=0, gid=0, mode=0100755, size=15256)
...
/dev/mapper/vg_cent64-thin_root: clean, 30776/1676080 files, 340024/6703104 blocks
[root@Cent64 ~]#
[root@Cent64 ~]# mount /dev/vg_cent64/thin_root /mnt/
[root@Cent64 ~]#
[root@Cent64 ~]# fstrim -v /mnt/
/mnt/: 26058436608 bytes were trimmed
[root@Cent64 ~]#
[root@Cent64 ~]# lvs
  LV        VG        Attr      LSize  Pool  Origin Data%  Move Log Cpy%Sync Convert
  lv_root   vg_cent64 -wi-ao--- 25.57g
  lv_swap   vg_cent64 -wi-ao---  3.94g
  thin_root vg_cent64 Vwi-aotz- 25.57g thinp          5.13
  thinp     vg_cent64 twi-a-tz- 25.57g                5.13

Success! All the way from 100% back down to 5%.

Now let's update the grub.conf and the fstab to use the new thin_root LV.

NOTE: grub.conf is on the filesystem on sda1.
NOTE: fstab is on the filesystem on thin_root.

[root@Cent64 ~]# sed -i -e 's/lv_root/thin_root/g' /boot/grub/grub.conf
[root@Cent64 ~]# sed -i -e 's/lv_root/thin_root/g' /mnt/etc/fstab
[root@Cent64 ~]# umount /mnt/

Time for a reboot!

After the system comes back up we should now be able to delete the original lv_root.

[root@Cent64 ~]# lvremove /dev/vg_cent64/lv_root
Do you really want to remove active logical volume lv_root? [y/n]: y
  Logical volume "lv_root" successfully removed

Now we want to remove that extra disk (/dev/sdb) I added. However, there is a subtle difference between my system now and my system before: there is now a metadata LV (thinp_tmeta) taking up a small amount of space, which prevents everything from fitting entirely on the first disk (/dev/sda).

No biggie. We'll just steal that amount of space from lv_swap, and then run pvmove to move all of the data back to /dev/sda.

[root@Cent64 ~]# lvs -a --units=b
  LV             VG        Attr      LSize        Pool  Origin Data%  Move Log Cpy%Sync Convert
  lv_swap        vg_cent64 -wi-ao---  4227858432B
  thin_root      vg_cent64 Vwi-aotz- 27455913984B thinp          5.13
  thinp          vg_cent64 twi-a-tz- 27455913984B                5.13
  [thinp_tdata]  vg_cent64 Twi-aot-- 27455913984B
  [thinp_tmeta]  vg_cent64 ewi-aot--    29360128B
[root@Cent64 ~]#
[root@Cent64 ~]# swapoff /dev/vg_cent64/lv_swap
[root@Cent64 ~]#
[root@Cent64 ~]# lvresize --size=-29360128B /dev/vg_cent64/lv_swap
  WARNING: Reducing active logical volume to 3.91 GiB
  THIS MAY DESTROY YOUR DATA (filesystem etc.)
Do you really want to reduce lv_swap? [y/n]: y
  Reducing logical volume lv_swap to 3.91 GiB
  Logical volume lv_swap successfully resized
[root@Cent64 ~]#
[root@Cent64 ~]# mkswap /dev/vg_cent64/lv_swap
mkswap: /dev/vg_cent64/lv_swap: warning: don't erase bootbits sectors on whole disk. Use -f to force.
Setting up swapspace version 1, size = 4100092 KiB
no label, UUID=7b023342-a9a9-4676-8bc6-1e60541010e4
[root@Cent64 ~]#
[root@Cent64 ~]# swapon -v /dev/vg_cent64/lv_swap
swapon on /dev/vg_cent64/lv_swap
swapon: /dev/mapper/vg_cent64-lv_swap: found swap signature: version 1, page-size 4, same byte order
swapon: /dev/mapper/vg_cent64-lv_swap: pagesize=4096, swapsize=4198498304, devsize=4198498304

Now we can get rid of sdb by running pvmove and vgreduce.

[root@Cent64 ~]# pvmove /dev/sdb
  /dev/sdb: Moved: 0.1%
  /dev/sdb: Moved: 11.8%
  /dev/sdb: Moved: 21.0%
  /dev/sdb: Moved: 32.0%
  /dev/sdb: Moved: 45.6%
  /dev/sdb: Moved: 56.2%
  /dev/sdb: Moved: 68.7%
  /dev/sdb: Moved: 79.6%
  /dev/sdb: Moved: 90.7%
  /dev/sdb: Moved: 100.0%
[root@Cent64 ~]#
[root@Cent64 ~]# pvs
  PV         VG        Fmt  Attr PSize  PFree
  /dev/sda2  vg_cent64 lvm2 a--  29.51g     0
  /dev/sdb   vg_cent64 lvm2 a--  31.00g 31.00g
[root@Cent64 ~]#
[root@Cent64 ~]# vgreduce vg_cent64 /dev/sdb
  Removed "/dev/sdb" from volume group "vg_cent64"

Boom! You're done!

Dusty