Jan 06, 2008

SMART smartd Configuration, openSUSE 10.2

I stumbled across smartd in YaST/System/System Services (runlevel) and turned it on. In addition to monitoring the long-term health trend of the disk, SMART also provides interesting real-time information about the hard drive: total power-on hours, number of power cycles, current temperature, etc. I liked smartmontools so much that I also installed it on my Windows XP machines, along with HDD Health.

I configured /etc/smartd.conf after reading the smartd.conf and smartctl man pages, the smartd log entries in the messages file, and comments within the file itself. Current setup:

  /dev/sda -d sat \
  -a -o off -S on \
  -s (O/../.././07|S/../.[27]/./08|L/.[02468]/15/./08) \
  -m root@localhost -M test

Notes on the configuration parameters:

  1. -d sat – it took a lot of testing to determine that this should be sat (SCSI to ATA Translation) and not ata. The various man pages are vague, with one saying “follow the hints appearing in the log file.” Well, the hint in my log file said “use -d ata or -d sat” (sigh). Both seemed to work, so I began with sat since the libata library is present. Subsequent tests with ata produced command failure messages in the log, so sat is correct for this drive.
  2. -a – turn on the default recommended set of monitoring functions.
  3. -S on – “S” is variously described as “turning on SMART” or “turning on Attribute Autosave”. I believe this is a modal parameter within the disk drive itself, and starting smartd turns it on by default. The manual recommends adding it to the configuration to ensure it doesn’t get shut off. smartd echoes “enabled SMART Attribute Autosave” to messages, reassuring me that SMART is running.
  4. -o off – “o” controls “SMART Automatic Offline Testing” (data collection really, not testing). The man page says this command is obsolete and that the collection intervals are chosen by the disk manufacturer. The “O” offline test (discussed next) causes the same data to be collected.
  5. -s REGEXP – schedule various offline tests. My schedule: O (offline immediate), every day, 7am; S (Short), every 5 days (the 2nd, 7th, 12th, 17th, 22nd and 27th), 8am; L (Long), the 15th of every even-numbered month, 8am. I haven’t discovered what the differences are between the short and long tests (for my disk the short test takes 2 minutes vs. 98 minutes for the long test).
  6. -m, -M – email address for important messages. “-M test” sends an email on startup to confirm everything is working.
  7. -i N – (command line only, not conf file) set the interval for polling the disk. It seems like this should be controllable from the conf file, but since it isn’t, I’m staying with the default of 30 minutes.
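
To sanity-check the -s schedule in item 5, here is a minimal Python sketch of how I understand the matching to work: per the smartd.conf man page, smartd builds a 12-character string T/MM/DD/d/HH at each poll and runs test T if the whole string matches the regular expression. The function name due_tests is my own invention, and Python's re module stands in for the POSIX extended regex that smartd actually uses, so treat this as an approximation that happens to agree for simple patterns like mine.

```python
import re
from datetime import datetime

# Test-type letters: L (long), S (short), O (offline immediate) are used in the
# schedule above; C (conveyance) is listed in the man page but unused here.
TEST_TYPES = "LSCO"

def due_tests(schedule_regex, when):
    """Return the test-type letters whose T/MM/DD/d/HH string matches the regex.

    Hypothetical helper for illustration; it is not part of smartmontools.
    """
    pattern = re.compile(schedule_regex)
    due = []
    for test_type in TEST_TYPES:
        # d is the day of the week, 1 (Monday) to 7 (Sunday), same as isoweekday().
        stamp = "%s/%02d/%02d/%d/%02d" % (
            test_type, when.month, when.day, when.isoweekday(), when.hour)
        # The whole 12-character string must match, hence fullmatch().
        if pattern.fullmatch(stamp):
            due.append(test_type)
    return due

# The schedule from the smartd.conf shown earlier in this post.
SCHEDULE = r"(O/../.././07|S/../.[27]/./08|L/.[02468]/15/./08)"

print(due_tests(SCHEDULE, datetime(2008, 1, 7, 7)))    # a day at 07:00
print(due_tests(SCHEDULE, datetime(2008, 1, 7, 8)))    # the 7th at 08:00
print(due_tests(SCHEDULE, datetime(2008, 2, 15, 8)))   # Feb 15 at 08:00
```

Looping over a month of datetimes with this helper is a quick way to confirm that a new pattern fires on the days intended before putting it into smartd.conf.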

Notes:

On Linux, “smartctl -c” always showed a failure in the offline data collection status:

(0x05) Offline data collection activity was aborted by an interrupting command from host.
Auto Offline Data Collection: Disabled.
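
Since I kept rechecking this status by hand, a small Python sketch for pulling the two interesting values out of the text may be useful. It parses the two lines exactly as quoted above; the real “smartctl -c” output wraps and indents these lines differently, so the patterns are an assumption and may need adjusting. The function name parse_offline_status is made up for this example.

```python
import re

def parse_offline_status(text):
    """Extract the status byte and the auto-collection state from smartctl -c text.

    Hypothetical helper; the patterns assume the wording quoted in this post.
    """
    # The status byte is printed in parentheses, e.g. "(0x05)".
    code = re.search(r"\((0x[0-9a-fA-F]{2})\)", text)
    # The auto-collection line ends in "Enabled." or "Disabled."
    auto = re.search(r"Auto Offline Data Collection:\s*(\w+)", text)
    return {
        "status_code": int(code.group(1), 16) if code else None,
        "auto_collection": auto.group(1) if auto else None,
    }

# The two lines quoted above, used here as sample input.
SAMPLE = (
    "(0x05) Offline data collection activity was aborted by an "
    "interrupting command from host.\n"
    "Auto Offline Data Collection: Disabled.\n"
)

print(parse_offline_status(SAMPLE))
```

To run it against a live disk, the text could come from subprocess.run(["smartctl", "-c", "/dev/sda"], capture_output=True, text=True).stdout, run as root, with the device name adjusted to match the system.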

Examining the messages file shows auto-collection being enabled and the scheduled offline collection being started. Everything seems normal, but auto-collection becomes disabled at some point. On two Windows XP PCs, data collection activity completes normally and Auto Offline Data Collection is enabled. I don’t know how auto-collection was enabled on these two machines; perhaps HDD Health did it.

Is the Linux collection failure connected with auto-collection becoming disabled? First attempt at a solution: I switched from “-o on” to “-o off” after realizing that my scheduled offline test collects the same data. I also changed the short test schedule to one per week in case it was causing the interruption. The problem has gone away and smartctl now reports “data collection activity was completed without error”.

References:

  • Wikipedia – best description of the SMART attributes that I found (other than the spec itself which is exhausting). The external link “Out SMART your hard drive” helped explain why the raw data was different between my Maxtor and Seagate hard drives.
  • smartmontools – home page for the SMART tools. They are installed by default in openSUSE Linux, and available for download for Windows XP.
  • HDD Health – a graphical user interface that presents current SMART information and failure alerts for Windows PCs. Still need smartmontools to trigger the offline tests.

posted in opensuse, SUSE, SysAdmin by Bozzie

2 Comments to "SMART smartd Configuration, openSUSE 10.2"

  1. Ed Shea wrote:

    Nice find Tom!

    Are you recommending SMART tools for win desktop applications? The data back-up discipline among desktop users is poor, will this utility give reliable enough warnings of a failure to allow for a back-up of data before an imminent failure occurs?

  2. Konstantin Khomoutov wrote:

    Thank you for your post.
    Recently I faced a soft RAID crash on one of my systems and looked for some more proactive setup. Your article is exactly what I needed.

 