Category:HDD Monitoring with rrdtool

From QNAPedia

Introduction

This HowTo explains how to set up continuous monitoring of all your hard disks. It

  • uses smartctl from smartmontools
  • writes the current status to a round-robin database (rrdtool) every 30 minutes
  • generates three charts for each S.M.A.R.T. attribute, showing the last week, month and year

[Image: Smartrrd.jpg]

Install Packages

  • if not done yet, install Optware IPKG via the QNAP web administration interface (under "App Center")

Alternative 1:

  • launch Optware via the App Center (this opens "The ipkg web frontend")
  • to update the package catalogue, select "Sync packages" -> yes, then press Submit
  • filter for "smartmontools", press Submit, then click "install"
  • filter for "rrdtool", press Submit, then click "install"

Alternative 2:

Log in to your QNAP via SSH and run:

# ipkg install smartmontools
# ipkg install rrdtool
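To confirm that both tools ended up on the PATH of your shell, a small check like the following can help. It is a sketch: check_tools is a hypothetical helper, and the assumption that Optware installs its binaries under /opt/bin and /opt/sbin may not hold for every setup.

```shell
#!/bin/bash
# Sketch: report which of the required tools can be found.
# Assumption: Optware installs into /opt/bin and /opt/sbin, which an
# SSH or cron shell may not have on its PATH.
PATH="/opt/bin:/opt/sbin:$PATH"

# check_tools (hypothetical helper): prints found/missing per tool,
# returns 1 if at least one tool is missing
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

check_tools smartctl rrdtool || echo "Install the missing packages with ipkg first."
```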

Prepare Directories

# mkdir /mnt/HDA_ROOT/smartrrd
# mkdir /mnt/HDA_ROOT/smartrrd/rrd
# mkdir /share/Web/smartrrd

Install and Adapt the Script

Copy the following script to /mnt/HDA_ROOT/smartrrd/smartctl_all_drives.sh

#!/bin/bash
# This script needs bash (arrays, [[ =~ ]], BASH_REMATCH, {1..255}).
# If bash lives elsewhere on your system (e.g. installed via ipkg), adjust the line above.

# smartctl and rrdtool from Optware live under /opt, which cron may not have on its PATH
PATH="/opt/bin:/opt/sbin:$PATH"

script_dir=$(dirname "${BASH_SOURCE[0]}")
script_runtime=$(date '+%s')

http_path="/share/Web/smartrrd"

# rrdtool does not create missing directories, so make sure they exist
mkdir -p "$script_dir/rrd" "$http_path"

# 1   5                       29       38    44    50     57        67       76          88
#  +4  +24                      +9       +6    +6    +7     +10       +9       +12
# ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
#   1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always       -       72984072
smart_regex="^(.{4})(.{24})(.{9})(.{6})(.{6})(.{7})(.{10})(.{9})(.{12})(.+)$"

# Chart titles: smart_attributes[ID]='NNN Name'
. "$script_dir/smartctl_all_drives.conf"

declare -a ATTRIBUTES

# Split the smartctl output into lines only (not on spaces)
IFS='
'

# Get data for all drives from smartmontools and store it in the array ATTRIBUTES,
# indexed by attribute ID. This makes it possible to write the values of all
# drives to the *.rrd file in a single update later on.

for disk in /dev/sd[a-d]
do
  for oneline in $(smartctl -d ata -A "$disk" | grep 'Always\|Offline')
  do
    # Skip lines that do not fit the fixed-width layout above
    [[ $oneline =~ $smart_regex ]] || continue

    smart_DISK=${disk:(-3)}
    smart_ID=${BASH_REMATCH[1]// /}

    smart_ATTRIBUTE_NAME=${BASH_REMATCH[2]// /}
    smart_FLAG=${BASH_REMATCH[3]// /}
    smart_VALUE=${BASH_REMATCH[4]// /}
    smart_WORST=${BASH_REMATCH[5]// /}
    smart_THRESH=${BASH_REMATCH[6]// /}
    smart_TYPE=${BASH_REMATCH[7]// /}
    smart_UPDATED=${BASH_REMATCH[8]// /}
    smart_WHEN_FAILED=${BASH_REMATCH[9]// /}
    smart_RAW_VALUE=${BASH_REMATCH[10]%'('*} # remove a trailing "(...)" annotation
    smart_RAW_VALUE=${smart_RAW_VALUE// /}

    # Append "disk#raw_value " to the entry of this attribute ID
    ATTRIBUTES[$smart_ID]+="$smart_DISK#$smart_RAW_VALUE "

  done
done

IFS=' '

# Scan the array ATTRIBUTES and write the values of all drives to the
# corresponding *.rrd file. Create the database first if necessary
# (e.g. on the first run). S.M.A.R.T. attribute IDs range from 1 to 255.

for i in {1..255}
do
  if [[ ${ATTRIBUTES[$i]} ]]; then
    smart_ID3=$(printf "%03d" "$i")

    rrd_ds=""
    rrd_value=""

    # Turn "sda#123 sdb#456 " into "sda:sdb" and "123:456"
    for disk_rawvalue in ${ATTRIBUTES[$i]}
    do
      rrd_ds+=${disk_rawvalue%'#'*}:
      rrd_value+=${disk_rawvalue#*'#'}:
    done

    rrd_ds=${rrd_ds%:}
    rrd_value=${rrd_value%:}

    # Create the RRD if it does not exist yet
    if [[ ! -f $script_dir/rrd/$smart_ID3.rrd ]]; then
      rrdtool create "$script_dir/rrd/$smart_ID3.rrd" \
        --step 1800 \
        DS:sda:GAUGE:3600:0:U \
        DS:sdb:GAUGE:3600:0:U \
        DS:sdc:GAUGE:3600:0:U \
        DS:sdd:GAUGE:3600:0:U \
        RRA:MAX:0.5:1:336 \
        RRA:MAX:0.5:2:744 \
        RRA:MAX:0.5:48:365

        # RRA:MAX:0.5:1:336  -> every step (30 min), 336 = 2x24x7 rows (one week at 30 min resolution)
        # RRA:MAX:0.5:2:744  -> max of every 2 steps (1 h), 744 = 24x31 rows (one month at 1 h resolution)
        # RRA:MAX:0.5:48:365 -> max of every 48 steps (1 day), 365 rows (one year at 1 day resolution)
    fi

    # -t (--template) maps the values to the data sources of the drives that reported this attribute
    rrdtool update "$script_dir/rrd/$smart_ID3.rrd" -t "$rrd_ds" "$script_runtime:$rrd_value"

  fi
done

# Create charts for all existing *.rrd files
# Note: the digit after LINE is the line width in pixels, not a drive index,
# so all drives use LINE1 and are told apart by color.

for filename in "$script_dir"/rrd/*.rrd
do
  [[ -f $filename ]] || continue            # no *.rrd files yet
  smart_ID3=$(basename "$filename" .rrd)    # e.g. "005"
  smart_ID=$((10#$smart_ID3))               # "005" -> 5 (force base 10, not octal)
  # Fall back to the numeric ID if the .conf file has no name for this attribute
  title=${smart_attributes[$smart_ID]:-$smart_ID3}

  rrdtool graph "$http_path/${smart_ID3}_week.png" -a PNG --title="$title" \
    --vertical-label "RAW_VALUE" --start end-1w --end $script_runtime \
    DEF:a=$filename:sda:MAX \
    DEF:b=$filename:sdb:MAX \
    DEF:c=$filename:sdc:MAX \
    DEF:d=$filename:sdd:MAX \
    LINE1:a#FF0000:"/dev/sda" GPRINT:a:LAST:"%6.lf %s" \
    LINE1:b#800000:"/dev/sdb" GPRINT:b:LAST:"%6.lf %s\n" \
    LINE1:c#00FF00:"/dev/sdc" GPRINT:c:LAST:"%6.lf %s" \
    LINE1:d#0000FF:"/dev/sdd" GPRINT:d:LAST:"%6.lf %s"

  # "1month" instead of "1m", which is ambiguous (minutes or months) in rrdtool time offsets
  rrdtool graph "$http_path/${smart_ID3}_month.png" -a PNG --title="$title" \
    --vertical-label "RAW_VALUE" --start end-1month --end $script_runtime \
    DEF:a=$filename:sda:MAX \
    DEF:b=$filename:sdb:MAX \
    DEF:c=$filename:sdc:MAX \
    DEF:d=$filename:sdd:MAX \
    LINE1:a#FF0000:"/dev/sda" GPRINT:a:LAST:"%6.lf %s" \
    LINE1:b#800000:"/dev/sdb" GPRINT:b:LAST:"%6.lf %s\n" \
    LINE1:c#00FF00:"/dev/sdc" GPRINT:c:LAST:"%6.lf %s" \
    LINE1:d#0000FF:"/dev/sdd" GPRINT:d:LAST:"%6.lf %s"

  rrdtool graph "$http_path/${smart_ID3}_year.png" -a PNG --title="$title" \
    --vertical-label "RAW_VALUE" --start end-1y --end $script_runtime \
    DEF:a=$filename:sda:MAX \
    DEF:b=$filename:sdb:MAX \
    DEF:c=$filename:sdc:MAX \
    DEF:d=$filename:sdd:MAX \
    LINE1:a#FF0000:"/dev/sda" GPRINT:a:LAST:"%6.lf %s" \
    LINE1:b#800000:"/dev/sdb" GPRINT:b:LAST:"%6.lf %s\n" \
    LINE1:c#00FF00:"/dev/sdc" GPRINT:c:LAST:"%6.lf %s" \
    LINE1:d#0000FF:"/dev/sdd" GPRINT:d:LAST:"%6.lf %s"

done

# Recreate index.html with one row (week/month/year) per attribute

echo "" > "$http_path/index.html"

for i in {1..255}
do
  if [[ ${ATTRIBUTES[$i]} ]]; then

    smart_ID3=$(printf "%03d" "$i")
    echo "<img src=\"${smart_ID3}_week.png\"><img src=\"${smart_ID3}_month.png\"><img src=\"${smart_ID3}_year.png\"><br>" \
      >> "$http_path/index.html"
  fi
done
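To see how the fixed-width regular expression splits a line of "smartctl -A" output, you can try it on sample lines outside the script. This sketch reuses smart_regex from the script; parse_line is a hypothetical helper, and the two input lines are illustrative samples laid out like the example in the script's header comment, not readings from a real drive.

```shell
#!/bin/bash
# Sketch: apply the script's fixed-width regex to sample "smartctl -A" lines
# and print ID, attribute name and cleaned raw value.
smart_regex="^(.{4})(.{24})(.{9})(.{6})(.{6})(.{7})(.{10})(.{9})(.{12})(.+)$"

# parse_line (hypothetical helper): returns 1 for lines that do not fit the layout
parse_line() {
  local line=$1 id name raw
  [[ $line =~ $smart_regex ]] || return 1
  id=${BASH_REMATCH[1]// /}
  name=${BASH_REMATCH[2]// /}
  raw=${BASH_REMATCH[10]%'('*}   # drop a trailing "(...)" annotation
  raw=${raw// /}
  echo "$id $name $raw"
}

parse_line '  1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always       -       72984072'
parse_line '194 Temperature_Celsius     0x0022   034   045   000    Old_age   Always       -       34 (Min/Max 20/45)'
```

This should print "1 Raw_Read_Error_Rate 72984072" and "194 Temperature_Celsius 34". If your smartctl version uses different column widths, the match fails and such lines are skipped, so this is a quick way to check the regex against your own output.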

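The history each RRA keeps follows from step x steps per row x rows. The sketch below only does that arithmetic in bash for the three RRAs the script creates; rra_coverage is a hypothetical helper, and the xff value 0.5 is simply copied into the label.

```shell
#!/bin/bash
# Sketch: how much history each RRA keeps.
# coverage (s) = step (s) * steps per consolidated row * number of rows
step=1800   # matches --step 1800 (30 minutes)

# rra_coverage (hypothetical helper): args are steps per row and rows
rra_coverage() {
  local steps_per_row=$1 rows=$2
  local seconds=$(( step * steps_per_row * rows ))
  echo "RRA:MAX:0.5:${steps_per_row}:${rows} -> one row every $(( step * steps_per_row / 60 )) min, $(( seconds / 86400 )) days"
}

rra_coverage 1 336    # week chart
rra_coverage 2 744    # month chart
rra_coverage 48 365   # year chart
```

The three calls print 30 min / 7 days, 60 min / 31 days and 1440 min / 365 days, which matches the week, month and year charts.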
The script is designed for the four drives sda, sdb, sdc and sdd.

Several places in the script have to be adapted if you have more or fewer drives, or drives with device names other than sda to sdd.
I posted this script here in the hope that somebody will make it more flexible later ;-)

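  • for disk in /dev/sd[a-d] -> change according to the drives that "fdisk -l" lists
  • DS:sda:GAUGE:3600:0:U -> add/remove lines for additional drives; existing *.rrd files keep the data sources they were created with, so the simplest way to pick up the change is to delete them (this discards their history) and let the script recreate them
  • DEF:a=$filename:sda:MAX \ -> add/remove lines for additional drives
  • LINE1:a#FF0000:"/dev/sda" GPRINT:a:LAST:"%6.lf %s" \ -> add/remove lines for additional drives in all 3 charts (week/month/year) and give each drive its own color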
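One way to make the script more flexible is to generate the per-drive arguments from a single list of drive names, so the DS, DEF, LINE and GPRINT lines no longer have to be edited by hand. This is only a sketch: the drives and colors arrays, build_ds_args and build_graph_args are hypothetical names, it uses the drive name as the DEF variable name instead of a/b/c/d, and the drive list still has to match what "fdisk -l" reports on your system.

```shell
#!/bin/bash
# Sketch (hypothetical helper names): derive the per-drive rrdtool arguments
# from one list instead of editing each DS/DEF/LINE/GPRINT line by hand.
drives=(sda sdb sdc sdd)   # adjust to the drives "fdisk -l" lists on your system
# One color per drive, in the same order; add more if you have more than eight drives
colors=(FF0000 800000 00FF00 0000FF 00FFFF FF00FF 808000 000080)

# One DS definition per drive, as used by "rrdtool create"
build_ds_args() {
  local d
  for d in "${drives[@]}"; do
    echo "DS:$d:GAUGE:3600:0:U"
  done
}

# DEF, LINE and GPRINT arguments for "rrdtool graph"; the drive name doubles as vname
build_graph_args() {
  local rrd_file=$1 i d
  for i in "${!drives[@]}"; do
    d=${drives[$i]}
    echo "DEF:$d=$rrd_file:$d:MAX"
    echo "LINE1:$d#${colors[$i]}:/dev/$d"
    echo "GPRINT:$d:LAST:%6.lf %s"
  done
}

# Usage (mapfile needs bash 4 or later):
#   mapfile -t ds_args < <(build_ds_args)
#   rrdtool create "$rrd_file" --step 1800 "${ds_args[@]}" RRA:MAX:0.5:1:336 ...
#   mapfile -t graph_args < <(build_graph_args "$rrd_file")
#   rrdtool graph "$png_file" -a PNG --start end-1w "${graph_args[@]}"
build_ds_args
```

Reading each generated line into its own array element keeps legend strings such as "%6.lf %s" as single arguments when they are passed on to rrdtool.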

Install Script Config File

Save the following file as /mnt/HDA_ROOT/smartrrd/smartctl_all_drives.conf.
The script sources it and uses the array to create meaningful chart titles.

smart_attributes[1]='001 Raw_Read_Error_Rate'
smart_attributes[2]='002 Throughput_Performance'
smart_attributes[3]='003 Spin_Up_Time'
smart_attributes[4]='004 Start_Stop_Count'
smart_attributes[5]='005 Reallocated_Sector_Ct'
smart_attributes[7]='007 Seek_Error_Rate'
smart_attributes[8]='008 Seek_Time_Performance'
smart_attributes[9]='009 Power_On_Hours'
smart_attributes[10]='010 Spin_Retry_Count'
smart_attributes[11]='011 Calibration_Retry_Count'
smart_attributes[12]='012 Power_Cycle_Count'
smart_attributes[181]='181 Program_Fail_Cnt_Total'
smart_attributes[183]='183 Runtime_Bad_Block'
smart_attributes[184]='184 End-to-End_Error'
smart_attributes[187]='187 Reported_Uncorrect'
smart_attributes[188]='188 Command_Timeout'
smart_attributes[189]='189 High_Fly_Writes'
smart_attributes[190]='190 Airflow_Temperature_Cel'
#smart_attributes[190]='190 ??'
smart_attributes[191]='191 G-Sense_Error_Rate'
smart_attributes[192]='192 Power-Off_Retract_Count'
smart_attributes[193]='193 Load_Cycle_Count'
smart_attributes[194]='194 Temperature_Celsius'
smart_attributes[195]='195 Hardware_ECC_Recovered'
smart_attributes[196]='196 Reallocated_Event_Count'
smart_attributes[197]='197 Current_Pending_Sector'
smart_attributes[198]='198 Offline_Uncorrectable'
smart_attributes[199]='199 UDMA_CRC_Error_Count'
smart_attributes[200]='200 Multi_Zone_Error_Rate'
#smart_attributes[200]='200 ???'
smart_attributes[223]='223 Load_Retry_Count'
smart_attributes[225]='225 Load_Cycle_Count'
smart_attributes[240]='240 Head_Flying_Hours'
#smart_attributes[240]='240 ???'
smart_attributes[241]='241 Total_LBAs_Written'
smart_attributes[242]='242 Total_LBAs_Read'

If attributes are missing here, please edit this wiki page and add them above. You can identify the attribute names with

smartctl -d ata -A /dev/sda

Unfortunately, some IDs have different meanings depending on the drive vendor, e.g. 190, 200, 230, 231, 232, 233 and 240 (see: http://en.wikipedia.org/wiki/S.M.A.R.T.).
If your drives use the names that are commented out, adapt the .conf file accordingly.
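To find the attribute IDs your drives report that have no entry in the .conf file yet, you can feed the smartctl output through a small filter. This is a sketch: find_missing is a hypothetical helper, and it assumes the .conf file has already been sourced so that smart_attributes is populated. It prints ready-to-paste lines in the same format as the .conf file.

```shell
#!/bin/bash
# Sketch: list attribute IDs reported by smartctl that have no name in
# smartctl_all_drives.conf. Reads "smartctl -A" output on stdin.
declare -a smart_attributes
# In practice, source the real file instead:
#   . /mnt/HDA_ROOT/smartrrd/smartctl_all_drives.conf
smart_attributes[1]='001 Raw_Read_Error_Rate'

# find_missing (hypothetical helper)
find_missing() {
  local id name rest
  while read -r id name rest; do
    # attribute rows start with a numeric ID; skip headers and other text
    [[ $id =~ ^[0-9]+$ ]] || continue
    if [[ -z ${smart_attributes[$id]} ]]; then
      printf "smart_attributes[%d]='%03d %s'\n" "$id" "$id" "$name"
    fi
  done
}

# Usage: smartctl -d ata -A /dev/sda | find_missing
```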

Set Up Crontab

# vi /etc/config/crontab

Add the following line:
*/30 * * * * /mnt/HDA_ROOT/smartrrd/smartctl_all_drives.sh

# crontab /etc/config/crontab
# /etc/init.d/crond.sh restart
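If you script this setup step, appending the line only when it is not already present avoids duplicate cron entries on repeated runs. A sketch under that assumption: add_cron_job is a hypothetical helper, and you still need to run the crontab and crond.sh commands shown above afterwards.

```shell
#!/bin/bash
# Sketch: add a cron job line to a crontab file unless it is already there.
# add_cron_job (hypothetical helper): args are the crontab file and the job line
add_cron_job() {
  local crontab_file=$1 job=$2
  # -x: match the whole line, -F: fixed string (the job contains "*" and "/")
  if grep -qxF "$job" "$crontab_file" 2>/dev/null; then
    echo "already present"
  else
    echo "$job" >> "$crontab_file"
    echo "added"
  fi
}

# Usage:
#   add_cron_job /etc/config/crontab '*/30 * * * * /mnt/HDA_ROOT/smartrrd/smartctl_all_drives.sh'
```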

After 30 minutes, there should be files in /mnt/HDA_ROOT/smartrrd/rrd as well as in /share/Web/smartrrd.

On my system, I tested smartctl_all_drives.sh at the command line and got an error because the rrd directory did not exist. Also, make smartctl_all_drives.sh executable with chmod +x; smartctl_all_drives.conf is only sourced by the script, so it needs to be readable but not executable.

Creating the rrd directory manually made things work:

[/mnt/HDA_ROOT/smartrrd] # ./smartctl_all_drives.sh
ERROR: creating './rrd/001.rrd': No such file or directory
ERROR: opening './rrd/001.rrd': No such file or directory
..
[/mnt/HDA_ROOT/smartrrd] # mkdir rrd
[/mnt/HDA_ROOT/smartrrd] # ./smartctl_all_drives.sh
497x207
497x207
...
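To check the result without browsing the directories by hand, you can count the generated files. A sketch: count_files is a hypothetical helper, and the default paths are the ones used in this HowTo. Since the script draws three charts (week, month, year) per *.rrd file, the PNG count should be three times the RRD count.

```shell
#!/bin/bash
# Sketch: count the files produced by smartctl_all_drives.sh.
# count_files (hypothetical helper): args are a directory and a glob pattern
count_files() {
  local dir=$1 pattern=$2 n=0 f
  for f in "$dir"/$pattern; do
    # an unmatched glob stays literal, so only count real files
    [[ -f $f ]] && n=$((n + 1))
  done
  echo "$n"
}

# Defaults are the paths used in this HowTo; override with the first two arguments
rrd_dir=${1:-/mnt/HDA_ROOT/smartrrd/rrd}
web_dir=${2:-/share/Web/smartrrd}

echo "rrd files:  $(count_files "$rrd_dir" '*.rrd')"
echo "png charts: $(count_files "$web_dir" '*.png') (expected: 3 per rrd file)"
```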

Open Monitoring Website

Make sure the Web Server service is enabled (Control Panel, Applications, Web Server).

Now you can open the monitoring site, which should be available at one of the following addresses, depending on your web server settings:

http://<QNAP>/smartrrd 
https://<QNAP>/smartrrd
https://<QNAP>:8081/smartrrd

Enjoy
