A while back I set up mdadm
to monitor my RAID array and send email
alerts to notify me of failures. At the same time I also set up smartd
(see S.M.A.R.T. ) to monitor
the hard drives themselves and to send me email alerts.
To do this you edit the /etc/smartd.conf
file. After I was done my
/etc/smartd.conf
file looked like the following:
#
# HERE IS A LIST OF DIRECTIVES FOR THIS CONFIGURATION FILE.
# PLEASE SEE THE smartd.conf MAN PAGE FOR DETAILS
#
# -d TYPE Set the device type: ata, scsi, marvell, removable, 3ware,N, hpt,L/M/N
# -T TYPE set the tolerance to one of: normal, permissive
# -o VAL Enable/disable automatic offline tests (on/off)
# -S VAL Enable/disable attribute autosave (on/off)
# -n MODE No check. MODE is one of: never, sleep, standby, idle
# -H Monitor SMART Health Status, report if failed
# -l TYPE Monitor SMART log. Type is one of: error, selftest
# -f Monitor for failure of any 'Usage' Attributes
# -m ADD Send warning email to ADD for -H, -l error, -l selftest, and -f
# -M TYPE Modify email warning behavior (see man page)
# -s REGE Start self-test when type/date matches regular expression (see man page)
# -p Report changes in 'Prefailure' Normalized Attributes
# -u Report changes in 'Usage' Normalized Attributes
# -t Equivalent to -p and -u Directives
# -r ID Also report Raw values of Attribute ID with -p, -u or -t
# -R ID Track changes in Attribute ID Raw value with -p, -u or -t
# -i ID Ignore Attribute ID for -f Directive
# -I ID Ignore Attribute ID for -p, -u or -t Directive
# -C ID Report if Current Pending Sector count non-zero
# -U ID Report if Offline Uncorrectable count non-zero
# -W D,I,C Monitor Temperature D)ifference, I)nformal limit, C)ritical limit
# -v N,ST Modifies labeling of Attribute N (see man page)
# -a Default: equivalent to -H -f -t -l error -l selftest -C 197 -U 198
# -F TYPE Use firmware bug workaround. Type is one of: none, samsung
# -P TYPE Drive-specific presets: use, ignore, show, showall
# # Comment: text after a hash sign is ignored
# \ Line continuation character
# Attribute ID is a decimal integer 1 <= ID <= 255
# except for -C and -U, where ID = 0 turns them off.
# All but -d, -m and -M Directives are only implemented for ATA devices
#
# If the test string DEVICESCAN is the first uncommented text
# then smartd will scan for devices /dev/hd[a-l] and /dev/sd[a-z]
DEVICESCAN -o on -H -l error -l selftest -t -m dustymabe@gmail.com -M test
It is pretty much all comments except for the last line. You can see
from the comments what each option on the last line means. To summarize
I am telling smartd
to:
“Monitor the health status as well as the error and selftest logs of
all /dev/hd[a-l] and /dev/sd[a-z] devices that are discovered to
have SMART capabilities. Report any errors/failures as well as startup
test messages to dustymabe@gmail.com.”
Now just make sure the smartd
service is configured to run by default
and your disks should be monitored! You can check this by looking to see
if you get an email when smartd
starts (make sure to check your spam
filter).
Dusty Mabe