A while back I set up
mdadmto monitor my RAID array and send email alerts to notify me of failures. At the same time I also set up
smartd(see S.M.A.R.T. ) to monitor the hard drives themselves and to send me email alerts.
To do this you edit the
/etc/smartd.conffile. After I was done my
/etc/smartd.conffile looked like the following:
# # HERE IS A LIST OF DIRECTIVES FOR THIS CONFIGURATION FILE. # PLEASE SEE THE smartd.conf MAN PAGE FOR DETAILS # # -d TYPE Set the device type: ata, scsi, marvell, removable, 3ware,N, hpt,L/M/N # -T TYPE set the tolerance to one of: normal, permissive # -o VAL Enable/disable automatic offline tests (on/off) # -S VAL Enable/disable attribute autosave (on/off) # -n MODE No check. MODE is one of: never, sleep, standby, idle # -H Monitor SMART Health Status, report if failed # -l TYPE Monitor SMART log. Type is one of: error, selftest # -f Monitor for failure of any 'Usage' Attributes # -m ADD Send warning email to ADD for -H, -l error, -l selftest, and -f # -M TYPE Modify email warning behavior (see man page) # -s REGE Start self-test when type/date matches regular expression (see man page) # -p Report changes in 'Prefailure' Normalized Attributes # -u Report changes in 'Usage' Normalized Attributes # -t Equivalent to -p and -u Directives # -r ID Also report Raw values of Attribute ID with -p, -u or -t # -R ID Track changes in Attribute ID Raw value with -p, -u or -t # -i ID Ignore Attribute ID for -f Directive # -I ID Ignore Attribute ID for -p, -u or -t Directive # -C ID Report if Current Pending Sector count non-zero # -U ID Report if Offline Uncorrectable count non-zero # -W D,I,C Monitor Temperature D)ifference, I)nformal limit, C)ritical limit # -v N,ST Modifies labeling of Attribute N (see man page) # -a Default: equivalent to -H -f -t -l error -l selftest -C 197 -U 198 # -F TYPE Use firmware bug workaround. Type is one of: none, samsung # -P TYPE Drive-specific presets: use, ignore, show, showall # # Comment: text after a hash sign is ignored # \ Line continuation character # Attribute ID is a decimal integer 1 <= ID <= 255 # except for -C and -U, where ID = 0 turns them off. # All but -d, -m and -M Directives are only implemented for ATA devices # # If the test string DEVICESCAN is the first uncommented text # then smartd will scan for devices /dev/hd[a-l] and /dev/sd[a-z] DEVICESCAN -o on -H -l error -l selftest -t -m email@example.com -M test
It is pretty much all comments except for the last line. You can see from the comments what each option on the last line means. To summarize I am telling
"Monitor the health status as well as the error and selftest logs of all /dev/hd[a-l] and /dev/sd[a-z] devices that are discovered to have SMART capabilities. Report any errors/failures as well as startup test messages to firstname.lastname@example.org."
Now just make sure the
smartdservice is configured to run by default and your disks should be monitored! You can check this by looking to see if you get an email when
smartdstarts (make sure to check your spam filter).