Category:HDD Monitoring with rrdtool
Latest revision as of 17:19, 26 October 2015
Introduction
This HowTo explains how to set up continuous monitoring of all your hard disks. It
- uses smartctl (from the smartmontools package)
- writes the current status to a round-robin database every 30 minutes using rrdtool
- generates three charts for each S.M.A.R.T. attribute, showing the last week / last month / last year
Install Packages
- if not yet done, install Optware IPKG via the QNAP Web Administration site (under "App Center")
Alternative 1:
- launch Optware via the App Center (will open "The ipkg web frontend")
- to update the catalogue, select "Sync packages" -> yes, then press Submit
- filter to "smartmontools" and press Submit then click "install"
- filter to "rrdtool" and press Submit then click "install"
Alternative 2:
Log into your QNAP with SSH.
# ipkg install smartmontools
# ipkg install rrdtool
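Before going on, it can be worth checking that both commands are actually reachable from your shell. The following is only a minimal sketch: the helper name missing_tools is made up for this example, and it assumes the packages put their binaries somewhere on your PATH (Optware packages usually end up under /opt/bin, which may need to be added).

```shell
#!/bin/sh
# Hypothetical helper: report which of the given commands are not found on PATH.
# Prints one missing command name per line; prints nothing if all are present.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# The two commands the monitoring script calls
missing=$(missing_tools smartctl rrdtool)
if [ -n "$missing" ]; then
    echo "not found on PATH:" $missing
else
    echo "smartctl and rrdtool are both available"
fi
```

If one of them is reported as missing although the package is installed, the cron job will most likely not find it either.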
Prepare Directories
# mkdir /mnt/HDA_ROOT/smartrrd
# mkdir /share/Web/smartrrd
Install and Adapt the Script
Copy the following script to /mnt/HDA_ROOT/smartrrd/smartctl_all_drives.sh
#!/bin/sh
script_dir=$(dirname "${BASH_SOURCE[0]}")
script_runtime=$(date '+%s')
http_path="/share/Web/smartrrd"

# 1   5                       29       38    44    50     57        67       76          88
# +4  +24                     +9       +6    +6    +7     +10       +9       +12
# ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
#   1 Raw_Read_Error_Rate     0x000f   114   099   006    Pre-fail  Always       -       72984072
smart_regex="^(.{4})(.{24})(.{9})(.{6})(.{6})(.{7})(.{10})(.{9})(.{12})(.+)$"

. $script_dir/smartctl_all_drives.conf

declare -a ATTRIBUTES
IFS='
'
# Get data for all drives from smartmontools and store it in an array ATTRIBUTES
# Later on this will allow to write values from all drives at once to the *.rrd file
for disk in /dev/sd[a-d]
do
    for oneline in $(smartctl -d ata -A $disk | grep 'Always\|Offline')
    do
        [[ $oneline =~ $smart_regex ]]
        smart_DISK=${disk:(-3)}
        smart_ID=${BASH_REMATCH[1]// /}
        smart_ID3=$(printf "%03d" $smart_ID)
        smart_ATTRIBUTE_NAME=${BASH_REMATCH[2]// /}
        smart_FLAG=${BASH_REMATCH[3]// /}
        smart_VALUE=${BASH_REMATCH[4]// /}
        smart_WORST=${BASH_REMATCH[5]// /}
        smart_THRESH=${BASH_REMATCH[6]// /}
        smart_TYPE=${BASH_REMATCH[7]// /}
        smart_UPDATED=${BASH_REMATCH[8]// /}
        smart_WHEN_FAILED=${BASH_REMATCH[9]// /}
        smart_RAW_VALUE=${BASH_REMATCH[10]%(*}   # remove trailing "(..." string manipulation
        smart_RAW_VALUE=${smart_RAW_VALUE// /}
        # populate attributes array
        ATTRIBUTES[$smart_ID]+="$smart_DISK#$smart_RAW_VALUE "
    done
done

IFS=' '
# Scan array ATTRIBUTES for values and if existing, write all values to *.rrd
# If necessary (e.g. when run for the first time), create the database
for i in {1..256}
do
    if [[ ${ATTRIBUTES[$i]} ]]; then
        smart_ID3=$(printf "%03d" $i)
        rrd_ds=""
        rrd_value=""
        for disk_rawvalue in ${ATTRIBUTES[$i]}
        do
            rrd_ds+=${disk_rawvalue%'#'*}:
            rrd_value+=${disk_rawvalue#*'#'}:
        done
        rrd_ds=${rrd_ds%:}
        rrd_value=${rrd_value%:}
        # create RRD if not yet exist
        if [[ ! -f $script_dir/rrd/$smart_ID3.rrd ]]; then
            rrdtool create "$script_dir/rrd/$smart_ID3.rrd" \
                --step 1800 \
                DS:sda:GAUGE:3600:0:U \
                DS:sdb:GAUGE:3600:0:U \
                DS:sdc:GAUGE:3600:0:U \
                DS:sdd:GAUGE:3600:0:U \
                RRA:MAX:0.5:1:336 \
                RRA:MAX:0.5:2:744 \
                RRA:MAX:0.5:48:365
            # RRA:MAX:0.5:1:336  -> every 30min for 2x24x7 times (one week in 30min interval)
            # RRA:MAX:0.5:2:744  -> every second 30min for 24x31 times (one month in 1h interval)
            # RRA:MAX:0.5:48:365 -> every 48th 30min for 365 times (one year in 1day interval)
        fi
        rrdtool update "$script_dir/rrd/$smart_ID3.rrd" -t $rrd_ds $script_runtime:$rrd_value
    fi
done

# Create charts for all existing *.rrd file
for filename in $script_dir/rrd/*.rrd
do
    smart_ID3=${filename%'.'*}
    smart_ID3=${smart_ID3#*'/'rrd'/'}
    smart_ID=$(echo $smart_ID3 | sed 's/^0*//')
    rrdtool graph "$http_path/${smart_ID3}_week.png" -a PNG --title="${smart_attributes[$smart_ID]}" \
        --vertical-label "RAW_VALUE" --start end-1w --end $script_runtime \
        DEF:a=$filename:sda:MAX \
        DEF:b=$filename:sdb:MAX \
        DEF:c=$filename:sdc:MAX \
        DEF:d=$filename:sdd:MAX \
        LINE1:a#FF0000:"/dev/sda" GPRINT:a:LAST:"%6.lf %s" \
        LINE2:b#800000:"/dev/sdb" GPRINT:b:LAST:"%6.lf %s\n" \
        LINE3:c#00FF00:"/dev/sdc" GPRINT:c:LAST:"%6.lf %s" \
        LINE4:d#0000FF:"/dev/sdd" GPRINT:d:LAST:"%6.lf %s"
    rrdtool graph "$http_path/${smart_ID3}_month.png" -a PNG --title="${smart_attributes[$smart_ID]}" \
        --vertical-label "RAW_VALUE" --start end-1m --end $script_runtime \
        DEF:a=$filename:sda:MAX \
        DEF:b=$filename:sdb:MAX \
        DEF:c=$filename:sdc:MAX \
        DEF:d=$filename:sdd:MAX \
        LINE1:a#FF0000:"/dev/sda" GPRINT:a:LAST:"%6.lf %s" \
        LINE2:b#800000:"/dev/sdb" GPRINT:b:LAST:"%6.lf %s\n" \
        LINE3:c#00FF00:"/dev/sdc" GPRINT:c:LAST:"%6.lf %s" \
        LINE4:d#0000FF:"/dev/sdd" GPRINT:d:LAST:"%6.lf %s"
    rrdtool graph "$http_path/${smart_ID3}_year.png" -a PNG --title="${smart_attributes[$smart_ID]}" \
        --vertical-label "RAW_VALUE" --start end-1y --end $script_runtime \
        DEF:a=$filename:sda:MAX \
        DEF:b=$filename:sdb:MAX \
        DEF:c=$filename:sdc:MAX \
        DEF:d=$filename:sdd:MAX \
        LINE1:a#FF0000:"/dev/sda" GPRINT:a:LAST:"%6.lf %s" \
        LINE2:b#800000:"/dev/sdb" GPRINT:b:LAST:"%6.lf %s\n" \
        LINE3:c#00FF00:"/dev/sdc" GPRINT:c:LAST:"%6.lf %s" \
        LINE4:d#0000FF:"/dev/sdd" GPRINT:d:LAST:"%6.lf %s"
done

# Recreate index.html
echo "" > $http_path/index.html
for i in {1..256}
do
    if [[ ${ATTRIBUTES[$i]} ]]; then
        smart_ID3=$(printf "%03d" $i)
        echo "<img src=\"${smart_ID3}_week.png\"><img src=\"${smart_ID3}_month.png\"><img src=\"${smart_ID3}_year.png\"><br>" \
            >> $http_path/index.html
    fi
done
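The script splits each smartctl line at fixed column positions (the offsets 1, 5, 29, ... 88 from the comment in front of smart_regex). To try that split in isolation, here is a rough sketch that uses cut instead of the bash regex, so it is not the script's own code. The helper names field and parse_smart_line are made up, and the sample line is rebuilt with printf from the column widths given in the script comment.

```shell
#!/bin/sh
# Hypothetical helper: cut one fixed-width field (character range $2)
# out of line $1 and drop the padding spaces.
field() {
    printf '%s\n' "$1" | cut -c"$2" | tr -d ' '
}

# Hypothetical helper: print "ID ATTRIBUTE_NAME RAW_VALUE" for one attribute line.
parse_smart_line() {
    id=$(field "$1" 1-4)
    name=$(field "$1" 5-28)
    raw=$(field "$1" 88-)
    raw=${raw%"("*}     # drop a trailing "(...)" annotation, as the script does
    echo "$id $name $raw"
}

# Sample line rebuilt from the column widths +4 +24 +9 +6 +6 +7 +10 +9 +12
sample=$(printf '%3s %-23s %-8s %-5s %-5s %-6s %-9s %-8s %-11s %s' \
    1 Raw_Read_Error_Rate 0x000f 114 099 006 Pre-fail Always - 72984072)

parse_smart_line "$sample"
```

If your smartctl version prints different column widths, the character ranges (and the regex in the script) would have to be changed in the same way.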
The script is designed for the 4 drives sda, sdb, sdc, sdd.
Several places in the script have to be adapted if you have more or fewer drives, or if your drives use device identifiers other than sda to sdd.
I posted this script here in the hope that somebody would make it more flexible later :-)
- for disk in /dev/sd[a-d] -> change according to what "fdisk -l" says about installed drives
- DS:sda:GAUGE:3600:0:U -> add/remove additional drives
- DEF:a=$filename:sda:MAX \ -> add/remove additional drives
- LINE1:a#FF0000:"/dev/sda" GPRINT:a:LAST:"%6.lf %s" \ -> add/remove additional drives in all 3 charts (week/month/year), also change the color
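One possible way to make these places more flexible is to generate the per-drive arguments from a list of drive names. This is only a sketch of the idea, not a tested replacement for the script: the function names rrd_ds_args and rrd_graph_args, the variable names v1, v2, ... and the extra colors are assumptions made for this example.

```shell
#!/bin/sh
# Hypothetical helper: print one DS definition per drive name
# for "rrdtool create" (e.g. rrd_ds_args sda sdb).
rrd_ds_args() {
    for d in "$@"; do
        echo "DS:$d:GAUGE:3600:0:U"
    done
}

# Hypothetical helper: print the DEF/LINE/GPRINT arguments for "rrdtool graph",
# one argument per output line. $1 is the .rrd file, the rest are drive names.
rrd_graph_args() {
    file=$1; shift
    colors="FF0000 800000 00FF00 0000FF FF00FF 00FFFF"
    n=0
    for d in "$@"; do
        n=$((n + 1))
        color=$(echo "$colors" | cut -d' ' -f"$n")
        echo "DEF:v$n=$file:$d:MAX"
        echo "LINE1:v$n#${color:-000000}:/dev/$d"   # black once the color list runs out
        echo "GPRINT:v$n:LAST:%6.lf %s"
    done
}

# Example for a two-drive system
rrd_ds_args sda sdb
rrd_graph_args /mnt/HDA_ROOT/smartrrd/rrd/194.rrd sda sdb
```

Each output line is meant to be passed to rrdtool as one argument, so the GPRINT lines with their embedded space would need to be split on newlines only. Keep in mind that already existing .rrd files keep the data sources they were created with.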
Install Script Config File
Save the following file to /mnt/HDA_ROOT/smartrrd/smartctl_all_drives.conf
The array is used to create meaningful chart titles.
smart_attributes[1]='001 Raw_Read_Error_Rate'
smart_attributes[2]='002 Throughput_Performance'
smart_attributes[3]='003 Spin_Up_Time'
smart_attributes[4]='004 Start_Stop_Count'
smart_attributes[5]='005 Reallocated_Sector_Ct'
smart_attributes[7]='007 Seek_Error_Rate'
smart_attributes[8]='008 Seek_Time_Performance'
smart_attributes[9]='009 Power_On_Hours'
smart_attributes[10]='010 Spin_Retry_Count'
smart_attributes[11]='011 Calibration_Retry_Count'
smart_attributes[12]='012 Power_Cycle_Count'
smart_attributes[181]='181 Program_Fail_Cnt_Total'
smart_attributes[183]='183 Runtime_Bad_Block'
smart_attributes[184]='184 End-to-End_Error'
smart_attributes[187]='187 Reported_Uncorrect'
smart_attributes[188]='188 Command_Timeout'
smart_attributes[189]='189 High_Fly_Writes'
smart_attributes[190]='190 Airflow_Temperature_Cel'
#smart_attributes[190]='190 ??'
smart_attributes[191]='191 G-Sense_Error_Rate'
smart_attributes[192]='192 Power-Off_Retract_Count'
smart_attributes[193]='193 Load_Cycle_Count'
smart_attributes[194]='194 Temperature_Celsius'
smart_attributes[195]='195 Hardware_ECC_Recovered'
smart_attributes[196]='196 Reallocated_Event_Count'
smart_attributes[197]='197 Current_Pending_Sector'
smart_attributes[198]='198 Offline_Uncorrectable'
smart_attributes[199]='199 UDMA_CRC_Error_Count'
smart_attributes[200]='200 Multi_Zone_Error_Rate'
#smart_attributes[200]='200 ???'
smart_attributes[223]='223 Load_Retry_Count'
smart_attributes[225]='225 Load_Cycle_Count'
smart_attributes[240]='240 Head_Flying_Hours'
#smart_attributes[240]='240 ???'
smart_attributes[241]='241 Total_LBAs_Written'
smart_attributes[242]='242 Total_LBAs_Read'
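If attributes reported by your drives are missing here, please edit this wiki page and add them above. You can identify the attribute names with the following command (substitute the device name of one of your own drives):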
smartctl -d ata -A /dev/hda
Unfortunately some IDs have multiple meanings depending on the drive, for example 190, 200, 230, 231, 232, 233, 240 (see: http://en.wikipedia.org/wiki/S.M.A.R.T.)
If your drives use one of these IDs with a different meaning, adapt the .conf file accordingly, for example by filling in the entries that are commented out.
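If you do not want to type the entries by hand, the .conf lines can in principle be generated from the smartctl output itself. The following is a minimal sketch under that assumption: the function name smart_conf_lines is made up, and the input shown is just the header plus the sample line from the script comment.

```shell
#!/bin/sh
# Hypothetical helper: turn smartctl -A attribute lines (read from stdin)
# into entries for smartctl_all_drives.conf.
# Only lines whose first field is a plain number are treated as attributes.
smart_conf_lines() {
    while read -r id name rest; do
        case $id in
            ''|*[!0-9]*) continue ;;
        esac
        printf "smart_attributes[%d]='%03d %s'\n" "$id" "$id" "$name"
    done
}

# Example input: the header line and the sample attribute line
printf '%s\n' \
    'ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE' \
    '  1 Raw_Read_Error_Rate 0x000f 114 099 006 Pre-fail Always - 72984072' |
    smart_conf_lines
```

On a real system the input would come from smartctl instead of printf, for example by piping the output of the command shown above into smart_conf_lines.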
Setup crontab
# vi /etc/config/crontab
add the following line:
*/30 * * * * /mnt/HDA_ROOT/smartrrd/smartctl_all_drives.sh
# crontab /etc/config/crontab
# /etc/init.d/crond.sh restart
After 30 minutes there should be files in the directory /mnt/HDA_ROOT/smartrrd/rrd as well as in /share/Web/smartrrd
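The 30-minute cron interval matches the --step 1800 used when the script creates the databases. As a quick sanity check of how much history each RRA from the script then holds, the arithmetic can be written out like this (the helper name rra_seconds is made up for this example):

```shell
#!/bin/sh
# Seconds covered by one RRA: step * steps-per-row * rows
rra_seconds() {
    echo $(( $1 * $2 * $3 ))
}

step=1800   # --step from the script, equal to the 30-minute cron interval
for rra in "1 336" "2 744" "48 365"; do
    set -- $rra
    secs=$(rra_seconds "$step" "$1" "$2")
    echo "RRA:MAX:0.5:$1:$2 covers $((secs / 86400)) days"
done
```

This gives 7, 31 and 365 days, which corresponds to the week, month and year charts. If you change the cron interval, the --step value and the RRA definitions would have to be changed along with it.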
Note from another user: the script does not create the rrd subdirectory itself. On my system, running smartctl_all_drives.sh at the command line failed with the errors shown below until I created the rrd directory manually. Also make the script executable with chmod +x smartctl_all_drives.sh (the .conf file is only sourced by the script, so it just has to be readable).
[/mnt/HDA_ROOT/smartrrd] # ./smartctl_all_drives.sh
ERROR: creating './rrd/001.rrd': No such file or directory
ERROR: opening './rrd/001.rrd': No such file or directory
..
[/mnt/HDA_ROOT/smartrrd] # mkdir rrd
[/mnt/HDA_ROOT/smartrrd] # ./smartctl_all_drives.sh
497x207
497x207
...
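Instead of creating the directory by hand, the script could make sure it exists before calling rrdtool. This is a small sketch of that guard, not part of the original script; the function name ensure_rrd_dir is made up, and the demonstration runs in a temporary directory.

```shell
#!/bin/sh
# Hypothetical helper: create the rrd subdirectory below the given
# script directory if it is missing (mkdir -p does nothing if it exists).
ensure_rrd_dir() {
    mkdir -p "$1/rrd"
}

# Demonstration in a throwaway directory;
# in the script the argument would be "$script_dir"
demo_dir=$(mktemp -d)
ensure_rrd_dir "$demo_dir"
ensure_rrd_dir "$demo_dir"    # a second call is harmless
[ -d "$demo_dir/rrd" ] && echo "rrd directory present"
rm -rf "$demo_dir"
```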
Open Monitoring Website
Make sure the Web Server service is enabled (Control Panel, Applications, Web Server).
Now you can open the monitoring site which should be available somewhere under
http://<QNAP>/smartrrd
https://<QNAP>/smartrrd
https://<QNAP>:8081/smartrrd
Enjoy