Category:HDD Monitoring with rrdtool

From QNAPedia
Revision as of 17:19, 26 October 2015 by Glenn (talk | contribs) (catchg)

Introduction

This HowTo explains how to set up continuous monitoring of all your hard disks. The setup

  • reads the S.M.A.R.T. attributes with smartctl (part of smartmontools)
  • writes the current status to a round-robin database every 30 minutes using rrdtool
  • generates 3 charts for each S.M.A.R.T. attribute, showing the last week / last month / last year

(Screenshot: Smartrrd.jpg)

Install Packages

  • if not yet done, install Optware IPKG via the QNAP Web Administration site (under "App Center")

Alternative 1:

  • launch Optware via the App Center (will open "The ipkg web frontend")
  • to update the catalogue, select "Sync packages" -> yes, then press Submit
  • filter to "smartmontools" and press Submit then click "install"
  • filter to "rrdtool" and press Submit then click "install"

Alternative 2:

Log into your QNAP with SSH.

# ipkg install smartmontools
# ipkg install rrdtool
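
Before continuing, it can help to confirm that both tools are reachable from the shell. The following is a minimal sketch and not part of the original HowTo; the function name missing_tools is made up for illustration, and it relies only on the standard "command -v" lookup.

```shell
# Hypothetical helper: print the names of commands that are NOT found on PATH.
missing_tools() {
  for tool in "$@"; do
    # "command -v" exits non-zero when the command cannot be found
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
  return 0
}

missing=$(missing_tools smartctl rrdtool)
if [ -n "$missing" ]; then
  echo "Not found on PATH:" $missing
else
  echo "smartctl and rrdtool are both available"
fi
```

If one of the tools is reported as missing right after installation, the package may have been installed into a directory that is not on the PATH of your SSH session.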

Prepare Directories

# mkdir /mnt/HDA_ROOT/smartrrd
# mkdir /mnt/HDA_ROOT/smartrrd/rrd
# mkdir /share/Web/smartrrd

Install and Adapt the Script

Copy the following script to /mnt/HDA_ROOT/smartrrd/smartctl_all_drives.sh

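#!/bin/sh
# Note: this script relies on Bash features (arrays, [[ =~ ]], BASH_SOURCE),
# so it only works if /bin/sh is Bash on your system.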

script_dir=$(dirname "${BASH_SOURCE[0]}")
script_runtime=$(date '+%s')

http_path="/share/Web/smartrrd"

# 1   5                       29       38    44    50     57        67       76          88
#  +4  +24                      +9       +6    +6    +7     +10       +9       +12
# ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
#   1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always       -       72984072
smart_regex="^(.{4})(.{24})(.{9})(.{6})(.{6})(.{7})(.{10})(.{9})(.{12})(.+)$"

. $script_dir/smartctl_all_drives.conf

declare -a ATTRIBUTES

IFS='
'

# Get data for all drives from smartmontools and store it in the array ATTRIBUTES.
# This makes it possible to write the values of all drives to the *.rrd file in one update.

for disk in /dev/sd[a-d]
do
  for oneline in $(smartctl -d ata -A $disk | grep 'Always\|Offline')
  do
    [[ $oneline =~ $smart_regex ]]

    smart_DISK=${disk:(-3)}
    smart_ID=${BASH_REMATCH[1]// /}
    smart_ID3=$(printf "%03d" $smart_ID)

    smart_ATTRIBUTE_NAME=${BASH_REMATCH[2]// /}
    smart_FLAG=${BASH_REMATCH[3]// /}
    smart_VALUE=${BASH_REMATCH[4]// /}
    smart_WORST=${BASH_REMATCH[5]// /}
    smart_THRESH=${BASH_REMATCH[6]// /}
    smart_TYPE=${BASH_REMATCH[7]// /}
    smart_UPDATED=${BASH_REMATCH[8]// /}
    smart_WHEN_FAILED=${BASH_REMATCH[9]// /}
    smart_RAW_VALUE=${BASH_REMATCH[10]%(*} # remove trailing "(..." string manipulation
    smart_RAW_VALUE=${smart_RAW_VALUE// /}

    # populate attributes array
    ATTRIBUTES[$smart_ID]+="$smart_DISK#$smart_RAW_VALUE "

  done
done

IFS=' '

# Scan array ATTRIBUTES for values and if existing, write all values to *.rrd
# If necessary (e.g. when run for the first time), create the database

for i in {1..256}
do
  if [[ ${ATTRIBUTES[$i]} ]]; then
    smart_ID3=$(printf "%03d" $i)

    rrd_ds=""
    rrd_value=""

    for disk_rawvalue in ${ATTRIBUTES[$i]}
    do
      rrd_ds+=${disk_rawvalue%'#'*}:
      rrd_value+=${disk_rawvalue#*'#'}:
    done

    rrd_ds=${rrd_ds%:}
    rrd_value=${rrd_value%:}

    # create the RRD if it does not exist yet
    if [[ ! -f $script_dir/rrd/$smart_ID3.rrd ]]; then
      rrdtool create "$script_dir/rrd/$smart_ID3.rrd" \
        --step 1800 \
        DS:sda:GAUGE:3600:0:U \
        DS:sdb:GAUGE:3600:0:U \
        DS:sdc:GAUGE:3600:0:U \
        DS:sdd:GAUGE:3600:0:U \
        RRA:MAX:0.5:1:336 \
        RRA:MAX:0.5:2:744 \
        RRA:MAX:0.5:48:365

        # RRA:MAX:0.5:1:336  -> every 30min for 2x24x7 times (one week in 30min interval)
        # RRA:MAX:0.5:2:744  -> every second 30min for 24x31 times (one month in 1h interval)
        # RRA:MAX:0.5:48:365 -> every 48th 30min for 365 times (one year in 1day interval)
    fi

    rrdtool update "$script_dir/rrd/$smart_ID3.rrd" -t $rrd_ds $script_runtime:$rrd_value

  fi
done

# Create charts for all existing *.rrd files

for filename in $script_dir/rrd/*.rrd
do
  smart_ID3=${filename%'.'*}
  smart_ID3=${smart_ID3#*'/'rrd'/'}
  smart_ID=$(echo $smart_ID3 | sed 's/^0*//')

  rrdtool graph "$http_path/${smart_ID3}_week.png" -a PNG --title="${smart_attributes[$smart_ID]}" \
    --vertical-label "RAW_VALUE" --start end-1w --end $script_runtime \
    DEF:a=$filename:sda:MAX \
    DEF:b=$filename:sdb:MAX \
    DEF:c=$filename:sdc:MAX \
    DEF:d=$filename:sdd:MAX \
    LINE1:a#FF0000:"/dev/sda" GPRINT:a:LAST:"%6.lf %s" \
    LINE2:b#800000:"/dev/sdb" GPRINT:b:LAST:"%6.lf %s\n" \
    LINE3:c#00FF00:"/dev/sdc" GPRINT:c:LAST:"%6.lf %s" \
    LINE4:d#0000FF:"/dev/sdd" GPRINT:d:LAST:"%6.lf %s"

  rrdtool graph "$http_path/${smart_ID3}_month.png" -a PNG --title="${smart_attributes[$smart_ID]}" \
    --vertical-label "RAW_VALUE" --start end-1m --end $script_runtime \
    DEF:a=$filename:sda:MAX \
    DEF:b=$filename:sdb:MAX \
    DEF:c=$filename:sdc:MAX \
    DEF:d=$filename:sdd:MAX \
    LINE1:a#FF0000:"/dev/sda" GPRINT:a:LAST:"%6.lf %s" \
    LINE2:b#800000:"/dev/sdb" GPRINT:b:LAST:"%6.lf %s\n" \
    LINE3:c#00FF00:"/dev/sdc" GPRINT:c:LAST:"%6.lf %s" \
    LINE4:d#0000FF:"/dev/sdd" GPRINT:d:LAST:"%6.lf %s"

  rrdtool graph "$http_path/${smart_ID3}_year.png" -a PNG --title="${smart_attributes[$smart_ID]}" \
    --vertical-label "RAW_VALUE" --start end-1y --end $script_runtime \
    DEF:a=$filename:sda:MAX \
    DEF:b=$filename:sdb:MAX \
    DEF:c=$filename:sdc:MAX \
    DEF:d=$filename:sdd:MAX \
    LINE1:a#FF0000:"/dev/sda" GPRINT:a:LAST:"%6.lf %s" \
    LINE2:b#800000:"/dev/sdb" GPRINT:b:LAST:"%6.lf %s\n" \
    LINE3:c#00FF00:"/dev/sdc" GPRINT:c:LAST:"%6.lf %s" \
    LINE4:d#0000FF:"/dev/sdd" GPRINT:d:LAST:"%6.lf %s"

done

# Recreate index.html

echo "" > $http_path/index.html

for i in {1..256}
do
  if [[ ${ATTRIBUTES[$i]} ]]; then

    smart_ID3=$(printf "%03d" $i)
    echo "<img src=\"${smart_ID3}_week.png\"><img src=\"${smart_ID3}_month.png\"><img src=\"${smart_ID3}_year.png\"><br>" \
      >> $http_path/index.html
  fi
done
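
To see how the fixed-width pattern in the script splits a smartctl line, the following sketch applies the same regular expression to the sample line from the script's comment. The helper name parse_smart_line is made up for illustration and is not part of the original script; like the script itself, it needs Bash because it uses [[ =~ ]] and BASH_REMATCH.

```shell
# Same fixed-width pattern as in the script: nine columns of known width, then the raw value.
smart_regex="^(.{4})(.{24})(.{9})(.{6})(.{6})(.{7})(.{10})(.{9})(.{12})(.+)$"

# Hypothetical helper: print "ID NAME RAW_VALUE" for one attribute line.
# Returns a non-zero status if the line does not match the column layout.
parse_smart_line() {
  local line=$1 raw
  [[ $line =~ $smart_regex ]] || return 1
  raw=${BASH_REMATCH[10]%(*}   # drop a trailing "(...)" annotation, if any
  echo "${BASH_REMATCH[1]// /} ${BASH_REMATCH[2]// /} ${raw// /}"
}

# Sample line taken from the comment in the script
sample='  1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always       -       72984072'
parse_smart_line "$sample"   # prints: 1 Raw_Read_Error_Rate 72984072
```

A line that is shorter than the expected column layout does not match; in that case the helper prints nothing and returns a non-zero status.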

The script is designed for the 4 drives sda, sdb, sdc and sdd.

Several places in the script have to be adapted if you have more or fewer drives, or drives with different device identifiers (the script assumes names like sda).
I posted this script here in the hope that somebody will make it more flexible later ;-)

  • for disk in /dev/sd[a-d] -> change according to what "fdisk -l" reports about the installed drives
  • DS:sda:GAUGE:3600:0:U -> add or remove one DS line per drive
  • DEF:a=$filename:sda:MAX \ -> add or remove one DEF line per drive
  • LINE1:a#FF0000:"/dev/sda" GPRINT:a:LAST:"%6.lf %s" \ -> add or remove one line per drive in all 3 charts (week/month/year) and pick a different color for each drive. Note that the number after LINE is the line width in rrdtool, not a drive index, so LINE4 draws a thicker line than LINE1.
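
One possible way to make these places more flexible is to derive the rrdtool arguments from a single list of drive names. The following is only a sketch under that assumption and not a drop-in replacement for the script: the function names drive_color, build_ds_args and build_graph_args are made up for illustration, the drive name is reused as the variable name in the DEF statements, and the colors beyond the first four drives are arbitrary.

```shell
# Hypothetical helper: pick a line color for the n-th drive (extend the list as needed)
drive_color() {
  case $1 in
    1) echo FF0000 ;;
    2) echo 800000 ;;
    3) echo 00FF00 ;;
    4) echo 0000FF ;;
    *) echo 000000 ;;
  esac
}

# Print one DS definition per drive, for use with "rrdtool create"
build_ds_args() {
  for d in "$@"; do
    printf 'DS:%s:GAUGE:3600:0:U\n' "$d"
  done
}

# Print the DEF/LINE/GPRINT arguments per drive, for use with "rrdtool graph".
# First parameter: path of the .rrd file; remaining parameters: drive names.
build_graph_args() {
  rrd_file=$1; shift
  n=0
  for d in "$@"; do
    n=$((n + 1))
    printf 'DEF:%s=%s:%s:MAX\n' "$d" "$rrd_file" "$d"
    printf 'LINE1:%s#%s:/dev/%s\n' "$d" "$(drive_color "$n")" "$d"
    printf 'GPRINT:%s:LAST:%%6.lf %%s\n' "$d"
  done
}

build_ds_args sda sdb
build_graph_args rrd/001.rrd sda sdb
```

Each argument is printed on its own line, so the output has to be split on newlines only (for example by setting IFS to a newline before expanding the command substitution unquoted); otherwise the space inside the GPRINT format would break the argument apart. An .rrd file that was created earlier keeps its original data sources, so it would probably have to be recreated after the drive list changes.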

Install Script Config File

Save the following file to /mnt/HDA_ROOT/smartrrd/smartctl_all_drives.conf
The array is used to create meaningful chart titles.

smart_attributes[1]='001 Raw_Read_Error_Rate'
smart_attributes[2]='002 Throughput_Performance'
smart_attributes[3]='003 Spin_Up_Time'
smart_attributes[4]='004 Start_Stop_Count'
smart_attributes[5]='005 Reallocated_Sector_Ct'
smart_attributes[7]='007 Seek_Error_Rate'
smart_attributes[8]='008 Seek_Time_Performance'
smart_attributes[9]='009 Power_On_Hours'
smart_attributes[10]='010 Spin_Retry_Count'
smart_attributes[11]='011 Calibration_Retry_Count'
smart_attributes[12]='012 Power_Cycle_Count'
smart_attributes[181]='181 Program_Fail_Cnt_Total'
smart_attributes[183]='183 Runtime_Bad_Block'
smart_attributes[184]='184 End-to-End_Error'
smart_attributes[187]='187 Reported_Uncorrect'
smart_attributes[188]='188 Command_Timeout'
smart_attributes[189]='189 High_Fly_Writes'
smart_attributes[190]='190 Airflow_Temperature_Cel'
#smart_attributes[190]='190 ??'
smart_attributes[191]='191 G-Sense_Error_Rate'
smart_attributes[192]='192 Power-Off_Retract_Count'
smart_attributes[193]='193 Load_Cycle_Count'
smart_attributes[194]='194 Temperature_Celsius'
smart_attributes[195]='195 Hardware_ECC_Recovered'
smart_attributes[196]='196 Reallocated_Event_Count'
smart_attributes[197]='197 Current_Pending_Sector'
smart_attributes[198]='198 Offline_Uncorrectable'
smart_attributes[199]='199 UDMA_CRC_Error_Count'
smart_attributes[200]='200 Multi_Zone_Error_Rate'
#smart_attributes[200]='200 ???'
smart_attributes[223]='223 Load_Retry_Count'
smart_attributes[225]='225 Load_Cycle_Count'
smart_attributes[240]='240 Head_Flying_Hours'
#smart_attributes[240]='240 ???'
smart_attributes[241]='241 Total_LBAs_Written'
smart_attributes[242]='242 Total_LBAs_Read'

If an attribute reported by your drives is missing here, please edit this wiki page and add it above. You can identify the attribute names using

smartctl -d ata -A /dev/sda

Unfortunately, some IDs have multiple meanings, for example 190, 200, 230, 231, 232, 233 and 240 (see: http://en.wikipedia.org/wiki/S.M.A.R.T.).
If your drives use one of the strings that are commented out, adapt the .conf file accordingly.
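
Instead of typing the entries by hand, the lines for the .conf file could be generated from the smartctl output. The following is a minimal sketch that assumes the attribute table has the ID in the first column and the name in the second, as in the sample line shown in the script; the function name smart_conf_lines is made up for illustration.

```shell
# Hypothetical helper: read "smartctl -A" output on stdin and print matching .conf lines
smart_conf_lines() {
  while read -r id name rest; do
    case $id in
      ''|*[!0-9]*) continue ;;   # skip headers and any line whose first field is not a number
    esac
    [ -n "$name" ] || continue
    printf "smart_attributes[%d]='%03d %s'\n" "$id" "$id" "$name"
  done
}

# Demonstration with the header and the sample line from the script's comment
smart_conf_lines <<'EOF'
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always       -       72984072
EOF
# prints: smart_attributes[1]='001 Raw_Read_Error_Rate'
```

On the NAS the input would come from smartctl itself, for example by piping the output of the smartctl command for one drive into smart_conf_lines and comparing the result with the existing .conf file.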

Setup crontab

# vi /etc/config/crontab

Add the following line:
*/30 * * * * /mnt/HDA_ROOT/smartrrd/smartctl_all_drives.sh

# crontab /etc/config/crontab
# /etc/init.d/crond.sh restart
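
If you prefer not to edit the file by hand, the entry could also be appended from the shell in a way that avoids duplicates when the step is repeated. The following is a minimal sketch; the function name add_cron_line is made up for illustration, the demonstration works on a temporary file instead of the real /etc/config/crontab, and it assumes that mktemp and a grep supporting -q, -x and -F are available.

```shell
# Hypothetical helper: append a line to a crontab file unless an identical line already exists
add_cron_line() {
  file=$1
  line=$2
  # -x: match whole lines only, -F: treat the line as a fixed string (it contains * and /)
  if ! grep -qxF -- "$line" "$file" 2>/dev/null; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Demonstration on a temporary file rather than the real crontab
tmp_crontab=$(mktemp)
add_cron_line "$tmp_crontab" '*/30 * * * * /mnt/HDA_ROOT/smartrrd/smartctl_all_drives.sh'
add_cron_line "$tmp_crontab" '*/30 * * * * /mnt/HDA_ROOT/smartrrd/smartctl_all_drives.sh'
cat "$tmp_crontab"    # the entry appears only once
rm -f "$tmp_crontab"
```

After changing the real file, the crontab still has to be reloaded and crond restarted as shown above.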

After 30 minutes there should be files in the directory /mnt/HDA_ROOT/smartrrd/rrd as well as in /share/Web/smartrrd

On my system, running smartctl_all_drives.sh at the command line produced the errors below because the rrd subdirectory did not exist. Creating it manually made things work. Also make the script executable with chmod +x smartctl_all_drives.sh (the .conf file is only sourced by the script, so it does not need the executable bit).

[/mnt/HDA_ROOT/smartrrd] # ./smartctl_all_drives.sh
ERROR: creating './rrd/001.rrd': No such file or directory
ERROR: opening './rrd/001.rrd': No such file or directory
..
[/mnt/HDA_ROOT/smartrrd] # mkdir rrd
[/mnt/HDA_ROOT/smartrrd] # ./smartctl_all_drives.sh
497x207
497x207
...

Open Monitoring Website

Make sure the Web Server service is enabled (Control Panel, Applications, Web Server).

Now you can open the monitoring site, which should be available under one of the following URLs:

http://<QNAP>/smartrrd 
https://<QNAP>/smartrrd
https://<QNAP>:8081/smartrrd

Enjoy
