Hard Drive Monitoring and E-mail Alerts Using smartd

A while back I set up mdadm to monitor my RAID array and send email alerts to notify me of failures. At the same time I also set up smartd (see S.M.A.R.T.) to monitor the hard drives themselves and send me email alerts.

To do this you edit the /etc/smartd.conf file. After I was done, my /etc/smartd.conf file looked like the following:

#
# HERE IS A LIST OF DIRECTIVES FOR THIS CONFIGURATION FILE.
# PLEASE SEE THE smartd.conf MAN PAGE FOR DETAILS
#
#   -d TYPE Set the device type: ata, scsi, marvell, removable, 3ware,N, hpt,L/M/N
#   -T TYPE set the tolerance to one of: normal, permissive
#   -o VAL  Enable/disable automatic offline tests (on/off)
#   -S VAL  Enable/disable attribute autosave (on/off)
#   -n MODE No check. MODE is one of: never, sleep, standby, idle
#   -H      Monitor SMART Health Status, report if failed
#   -l TYPE Monitor SMART log.  Type is one of: error, selftest
#   -f      Monitor for failure of any 'Usage' Attributes
#   -m ADD  Send warning email to ADD for -H, -l error, -l selftest, and -f
#   -M TYPE Modify email warning behavior (see man page)
#   -s REGE Start self-test when type/date matches regular expression (see man page)
#   -p      Report changes in 'Prefailure' Normalized Attributes
#   -u      Report changes in 'Usage' Normalized Attributes
#   -t      Equivalent to -p and -u Directives
#   -r ID   Also report Raw values of Attribute ID with -p, -u or -t
#   -R ID   Track changes in Attribute ID Raw value with -p, -u or -t
#   -i ID   Ignore Attribute ID for -f Directive
#   -I ID   Ignore Attribute ID for -p, -u or -t Directive
#   -C ID   Report if Current Pending Sector count non-zero
#   -U ID   Report if Offline Uncorrectable count non-zero
#   -W D,I,C Monitor Temperature D)ifference, I)nformal limit, C)ritical limit
#   -v N,ST Modifies labeling of Attribute N (see man page)
#   -a      Default: equivalent to -H -f -t -l error -l selftest -C 197 -U 198
#   -F TYPE Use firmware bug workaround. Type is one of: none, samsung
#   -P TYPE Drive-specific presets: use, ignore, show, showall
#    #      Comment: text after a hash sign is ignored
#    \      Line continuation character
# Attribute ID is a decimal integer 1 <= ID <= 255
# except for -C and -U, where ID = 0 turns them off.
# All but -d, -m and -M Directives are only implemented for ATA devices
#
# If the test string DEVICESCAN is the first uncommented text
# then smartd will scan for devices /dev/hd[a-l] and /dev/sd[a-z]
DEVICESCAN -o on -H -l error -l selftest -t -m dustymabe@gmail.com -M test


It is pretty much all comments except for the last line. You can see from the comments what each option on the last line means. To summarize I am telling smartd to:

“Enable automatic offline tests, monitor the health status as well as the error and selftest logs, and track attribute changes on all /dev/hd[a-l] and /dev/sd[a-z] devices that are discovered to have SMART capabilities. Report any errors/failures, plus a test message at startup, to dustymabe@gmail.com.”
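Since the file is almost entirely comments, it can help to print only the lines smartd will act on. A minimal sketch, assuming a POSIX shell and grep; the function name active_directives is made up for illustration and is not part of smartmontools:

```shell
# Print the lines of a smartd.conf-style file that are neither blank nor
# comments, i.e. the directives that are actually in effect.
# active_directives is a hypothetical helper name; the default path is the
# one used in this post, and another file can be passed as the first argument.
active_directives() {
    grep -Ev '^[[:space:]]*(#|$)' "${1:-/etc/smartd.conf}"
}
```

Running active_directives with no arguments against the file above should print just the DEVICESCAN line.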

Now just make sure the smartd service is configured to run by default and your disks should be monitored! Because of the -M test option, smartd sends a test email each time it starts, so you can confirm the setup by checking for that email after smartd starts (make sure to check your spam filter).

Dusty Mabe

Automatically Extend LVM Snapshots

Snapshot logical volumes are a great way to save the state of an LV (a special block device) at a particular point in time. Essentially this lets you snapshot a block device and then revert it at a later date. In other words you can rest easy when that big upgrade comes along :)

This all seems fine and dandy until your snapshot runs out of space! Yep, the size of the snapshot does matter. Snapshot LVs are Copy-On-Write (COW) devices. An old block from the origin LV gets “Copied” to the snapshot LV only when that block is about to be “Written” to in the origin LV. Blocks that are never written to in the origin LV are never copied, so they take up no space in the snapshot.

Thus, you can make a snapshot LV much smaller than the origin LV, and as long as the snapshot never fills up you are fine. If it does fill up, the snapshot becomes invalid and you can no longer use it.

The problem is that it is quite tricky to determine how much space you actually need in your snapshot. If you notice that your snapshot is becoming full you can use lvextend to increase its size, but this is not very desirable as it’s not automated and requires user intervention.

The good news is that recently there was an addition to lvm that allows for autoextension of snapshot LVs! The bugzilla report #427298 tracked the request and it has now been released in lvm2-2.02.84-1. The lvm-devel email from when the patch came through contains some good details on how to use the new functionality.

To summarize, you edit /etc/lvm/lvm.conf and set snapshot_autoextend_threshold to something below 100 (100 is the default value and disables automatic extension). Once a snapshot's usage exceeds that percentage, it gets extended. You also set snapshot_autoextend_percent, which is how much to extend the snapshot LV by, as a percentage of its current size.
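As a rough model of how the two settings interact, here is a sketch of the arithmetic only. It is not LVM's code, and the function name autoextend_size and the numbers in the usage line are made up for illustration:

```shell
# autoextend_size SIZE_MB USED_MB THRESHOLD PERCENT
# Hypothetical helper: prints the snapshot size in MB after one check,
# modelling snapshot_autoextend_threshold and snapshot_autoextend_percent.
autoextend_size() {
    size=$1 used=$2 threshold=$3 percent=$4
    # A threshold of 100 disables automatic extension.
    if [ "$threshold" -ge 100 ]; then
        echo "$size"
        return
    fi
    # Extend only when usage exceeds the threshold percentage of the size.
    if [ $((used * 100)) -gt $((size * threshold)) ]; then
        # Grow by PERCENT of the current size.
        echo $((size + size * percent / 100))
    else
        echo "$size"
    fi
}
```

For example, autoextend_size 1000 701 70 20 prints 1200: a 1000 MB snapshot with 701 MB used is past a 70% threshold, so it grows by 20% of its current size.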

To test this out I edited my /etc/lvm/lvm.conf file to have the following values:

[Read More]

Monitor RAID Arrays and Get E-mail Alerts Using mdadm


In my desktop computer I use a software RAID1 to protect against data loss due to a hard drive failure. I have two hard drives, each with four identically sized partitions. Partition 1 on disk A is mirrored with partition 1 on disk B. Together they create the “multiple-device” device node md1, which can then be treated like any block device. Partitions 2, 3, and 4 on the disks make up md2, md3, and md4 respectively.

You can use mdadm to configure a software RAID in Linux. To see the status of the RAID you can view the contents of the /proc/mdstat file. For my software RAID the contents of the file should look like:

[Read More]
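Checking /proc/mdstat as described above can also be scripted. A minimal sketch, assuming the usual /proc/mdstat layout in which each array's status line carries a bracketed string such as [UU], with an underscore marking a missing member; the function name degraded_arrays is made up for illustration:

```shell
# Print the name of every md array whose member map contains an underscore
# (for example [U_]), which marks a missing or failed member.
# degraded_arrays is a hypothetical helper name; it reads /proc/mdstat by
# default, or the file given as the first argument.
degraded_arrays() {
    awk '
        /^md[^ ]* :/          { name = $1 }     # remember the current array
        /\[[U_]*_[U_]*\]/     { print name }    # member map with a gap
    ' "${1:-/proc/mdstat}"
}
```

No output means every array listed has all of its members.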

Recover Space By Finding Deleted Files That Are Still Held Open.


The other day I was trying to clean out some space on an almost full filesystem that I use to hold some video files. The output from df looked like:

media:~ # df -kh /docs/videos/
Filesystem                  Size  Used Avail Use% Mounted on
/dev/mapper/vgvolume-videos 5.0G  4.2G  526M  90% /docs/videos


I then found the largest file I wanted to delete (a 700M avi video I had recently watched), and removed it. df should now report that I freed up some space right? NOPE!

[Read More]
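One common way to spot files that were deleted but are still held open is to look at the file descriptor links under /proc. This is a minimal sketch, assuming Linux with /proc mounted; it is not necessarily the approach taken in the full post, and the function name deleted_open_files is made up for illustration:

```shell
# List open file descriptors whose target has been deleted.
# deleted_open_files is a hypothetical helper name. With no argument it
# scans every process it is allowed to read; pass a PID to check only one.
deleted_open_files() {
    for fd in /proc/${1:-[0-9]*}/fd/*; do
        # Skip descriptors that vanish or that we may not read.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            # The kernel appends " (deleted)" to the link target of an
            # unlinked file that is still open.
            /*' (deleted)') printf '%s -> %s\n' "$fd" "$target" ;;
        esac
    done
}
```

Each line of output names a /proc/PID/fd entry, so the PID in the path tells you which process is still holding the space. Run it as root to see descriptors belonging to other users.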

Create a screencast of a terminal session using scriptreplay.


I recently ran into an issue where I needed to demo a project without actually being present for the demo. I thought about recording (into some video format) a screencast of my terminal window and then having my audience play it at the time of the demo. This would have worked just fine, but, as I was browsing the internet searching for exactly how to record a screencast of this nature, I ran across a blog post talking about how to play back terminal sessions using the output of the script program. This piqued my interest for several reasons:

[Read More]

Hi Planet! - SSH: Disable checking host key against known_hosts file.

Hi Everyone! Since this is my first post it is going to be short and sweet. :)

I work on a daily basis with Linux servers that must be installed, configured, re-installed, configured, etc. Over and over, develop and test. Our primary means of communication with these servers is through ssh. Every time a server is re-installed it generates a new ssh host key, and thus you will always get a “Man in the Middle Attack” warning from SSH like:

[Read More]