
checksmart: a bash script for running S.M.A.R.T. queries

Modern hard disks (ATA and SCSI) support a very useful feature called S.M.A.R.T. (Self-Monitoring, Analysis and Reporting Technology).

The acronym largely explains itself: the technology runs self-diagnostic tests on the disk's electromechanical components (or only the electronic ones, in the case of SSDs). The user can view the test results at whatever level of detail is needed, and can also consult the history of the most recent tests and the log of errors detected during the disk's normal operating hours.


Tools that can talk to this feature abound, and they exist for every operating system (see here for further details). Among them, smartmontools deserves a special mention, especially for running S.M.A.R.T. queries in Unix-like environments. Using that tool, I wrote a bash script (checksmart) meant to be called from crontab, so that more or less thorough checks can be scheduled at regular intervals. The results can then be sent to the user's mailbox with the optional --email flag.
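As a rough illustration of the scheduling idea, root's crontab (edited with crontab -e as root) might contain entries like the ones below. The install path /usr/local/bin/checksmart, the schedule and the address are assumptions made up for this example. The PATH line is there because cron's default PATH often omits the sbin directories where smartctl is commonly installed.

```
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

# m  h  dom mon dow  command
0    7  *   *   *    /usr/local/bin/checksmart --brief --email admin@example.com
30   2  *   *   0    /usr/local/bin/checksmart --short --email admin@example.com
0    3  1   *   *    /usr/local/bin/checksmart --long --email admin@example.com
```

Since a long test can take hours, it is worth scheduling it for a time when the machine is otherwise idle.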

Here is the script in full:

#!/bin/bash

bin=`/usr/bin/which smartctl`

logfile=/var/log/checksmart/checksmart.log

ROOT_UID=0

if [ "$UID" -ne "$ROOT_UID" ];then

        data=`date +'%F %H:%M:%S'`
        echo "$data - Error: you need to be root to run this script"
        exit 1

fi

if [[ ! -d /var/log/checksmart ]];then

        mkdir -p /var/log/checksmart
fi

if [[ ! -f /var/log/checksmart/checksmart.log ]];then

        touch /var/log/checksmart/checksmart.log
fi

# Number of mounted /dev/sd* partitions reported by df (the loops below run once per partition)
diskno=`df -h | awk '{print $1}' | grep sd | wc -l`

type=$1

subtype=$2

address=$3

if [[ ! -z $address ]];then

        regex="^[a-z0-9!#\$%&'*+/=?^_\`{|}~-]+(\.[a-z0-9!#$%&'*+/=?^_\`{|}~-]+)*@([a-z0-9]([a-z0-9-]*[a-z0-9])?\.)+[a-z0-9]([a-z0-9-]*[a-z0-9])?\$"

        checkmail() {
                if [[ ! $address =~ $regex ]];then
                        data=`date +'%F %H:%M:%S'`
                        echo "$data - Error: invalid email address" | tee -a $logfile
                        exit 1;
                fi
        }

        checkmail;
fi

pre_running_controls() {

if [[ ! -z $bin ]];then
        if [[ $diskno -gt 0 ]];then
                data=`date +'%F %H:%M:%S'`
                echo "$data - Executing pre-running controls..." | tee -a $logfile
                sleep 2
                data=`date +'%F %H:%M:%S'`
                echo "$data - Checking for S.M.A.R.T. capabilities..." | tee -a $logfile
                for (( d=1; d<=$diskno; d++ ))
                do
                        disk=`df -h | awk '{print $1}' | grep sd | sed s'/.$//' | sed -n "$d p" | uniq`
                        smart_capable=`$bin -a $disk | grep "SMART support is: Available"`
                        data=`date +'%F %H:%M:%S'`
                        echo "$data - 1. Checking if S.M.A.R.T. support for disk $disk is available" | tee -a $logfile
                        if [[ ! -z $smart_capable ]];then
                                data=`date +'%F %H:%M:%S'`
                                echo "$data - S.M.A.R.T. support for $disk is available" | tee -a $logfile
                        else
                                data=`date +'%F %H:%M:%S'`
                                echo "$data - Error: S.M.A.R.T. support for $disk is NOT available, exiting..." | tee -a $logfile
                                exit 1;
                        fi

                        smart_enabled=`$bin -a $disk | grep "SMART support is: Enabled"`
                        data=`date +'%F %H:%M:%S'`
                        echo "$data - 2. Checking if S.M.A.R.T. support for disk $disk is enabled" | tee -a $logfile
                        if [[ ! -z $smart_enabled ]];then
                                data=`date +'%F %H:%M:%S'`
                                echo "$data - S.M.A.R.T. support for $disk is enabled" | tee -a $logfile
                        else
                                data=`date +'%F %H:%M:%S'`
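                                echo "$data - Error: S.M.A.R.T. support for $disk disk is NOT enabled, trying to enable it..." | tee -a $logfile
                                # "smartctl -s on" does not print the "SMART support is:" line, so query the status again afterwards
                                $bin -s on $disk > /dev/null
                                smart_enabled=`$bin -i $disk | grep "SMART support is: Enabled"`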
                                if [[ ! -z $smart_enabled ]];then
                                        data=`date +'%F %H:%M:%S'`
                                        echo "$data - S.M.A.R.T. support for $disk disk is NOW enabled" | tee -a $logfile
                                else
                                        data=`date +'%F %H:%M:%S'`
                                        echo "$data - Error: Unable to turn on S.M.A.R.T. support for disk $disk, exiting ..." | tee -a $logfile
                                        exit 1;
                                fi
                        fi
                        smart_self=`$bin -a $disk | grep "No self-tests have been logged"`
                        data=`date +'%F %H:%M:%S'`
                        echo "$data - 3. Checking if S.M.A.R.T. self-tests have been logged" | tee -a $logfile
                        if [[ ! -z $smart_self ]];then
                                data=`date +'%F %H:%M:%S'`
                                echo "$data - No S.M.A.R.T. self-tests have been logged yet, trying to run a short one..." | tee -a $logfile
                                # smartctl prints no self-test log here; rely on its exit status to detect a failed start
                                if ! $bin -t short $disk > /dev/null;then
                                        data=`date +'%F %H:%M:%S'`
                                        echo "$data - Error: Unable to run a short self-test for disk $disk, exiting..." | tee -a $logfile
                                        exit 1;
                                else
                                        data=`date +'%F %H:%M:%S'`
                                        echo "$data - Short S.M.A.R.T. self-test started for disk $disk" | tee -a $logfile
                                fi
                        else
                                data=`date +'%F %H:%M:%S'`
                                echo "$data - S.M.A.R.T. self-tests have been logged for disk $disk" | tee -a $logfile
                        fi
                done
        else
                data=`date +'%F %H:%M:%S'`
                echo "$data - Error: no ATA disks found" | tee -a $logfile
                exit 1;
        fi
else
        data=`date +'%F %H:%M:%S'`
        echo "$data - Error: no smartctl binary found - please install smartmontools" | tee -a $logfile
        exit 1;
fi
}

get_time_required() {

if [[ ! -z $bin ]];then
        if [[ $diskno -gt 0 ]];then
                for (( d=1; d<=$diskno; d++ ))
                do
                        data=`date +'%F %H:%M:%S'`
                        echo "$data - Getting time required by each test for each disk" | tee -a $logfile
                        disk=`df -h | awk '{print $1}' | grep sd | sed s'/.$//' | sed -n "$d p" | uniq`
                        time_short=`$bin -c $disk | grep -A 1 'Short' | grep minutes | awk '{print $5}' | sed -e 's/)//g'`
                        if [[ -z $time_short ]];then
                                echo "$data - Error: unable to get the time required by the short test for disk $disk" | tee -a $logfile
                        else
                                time_short_seconds=$((time_short*60 + 2))
                        fi
                        time_long=`$bin -c $disk | grep -A 1 'Extended' | grep minutes | awk '{print $5}' | sed -e 's/)//g'`
                        if [[ -z $time_long ]];then
                                echo "$data - Error: unable to get the time required by the long test for disk $disk" | tee -a $logfile
                        else
                                time_long_seconds=$((time_long * 60 + 2))
                        fi
                        time_conveyance=`$bin -c $disk | grep -A 1 'Conveyance' | grep minutes | awk '{print $5}' | sed -e 's/)//g'`
                        if [[ -z $time_conveyance ]];then
                                echo "$data - Error: unable to get the time required by the conveyance test for disk $disk" | tee -a $logfile
                        else
                                time_conveyance_seconds=$((time_conveyance * 60 + 2))
                        fi

                done
        else

                data=`date +'%F %H:%M:%S'`
                echo "$data - Error: no ATA disks found" | tee -a $logfile
                exit 1;

        fi
else

        data=`date +'%F %H:%M:%S'`
        echo "$data - Error: no smartctl binary found - please install smartmontools" | tee -a $logfile
        exit 1;

fi
}

if [[ ! -z $bin ]];then
        if [[ $diskno -gt 0 ]];then
                # Run the pre-running controls first, then collect the test durations
                pre_running_controls;
                get_time_required;
                if [[ ! -z $type ]];then
                        if [[ $type == "--brief" ]];then
                                for (( d=1; d<=$diskno; d++ ))
                                do
                                        disk=`df -h | awk '{print $1}' | grep sd | sed s'/.$//' | sed -n "$d p" | uniq`
                                        data=`date +'%F %H:%M:%S'`
                                        echo "$data - checking S.M.A.R.T entities for $disk disk" | tee -a $logfile
                                        if [[ $subtype == "--email" ]];then
                                                if [[ ! -z $address ]];then
                                                        data=`date +'%F %H:%M:%S'`
                                                        echo $data >> $logfile
                                                        # Collect the report in a temporary file instead of "result" in the current directory
                                                        result=`mktemp`
                                                        $bin -a $disk | grep -E "SMART overall-health self-assessment test result:|Reallocated_Sector|Spin_Retry_Count|Runtime_Bad_Block|End-to-End_Error|Reported_Uncorrect|Command_Timeout|Current_Pending_Sector|Offline_Uncorrectable" > $result
                                                        mail -iv -s "brief S.M.A.R.T report for disk $disk" $address < $result
                                                        cat $result >> $logfile
                                                        rm -f $result
                                                else
                                                        echo "Usage: checksmart < --brief| --short| --long| --conveyance| --summary | --help>  [ --email <address>]" | tee -a $logfile
                                                        exit 1;
                                                fi
                                        elif [[ -n $subtype && $subtype != "--email" ]];then
                                                data=`date +'%F %H:%M:%S'`
                                                echo "$data - Error: unknown subtype $subtype" | tee -a $logfile
                                                echo "Usage: checksmart < --brief| --short| --long| --conveyance| --summary | --help> [ --email <address>]" | tee -a $logfile
                                                exit 1;
                                        else
                                                $bin -a $disk | grep -E "SMART overall-health self-assessment test result:|Reallocated_Sector|Spin_Retry_Count|Runtime_Bad_Block|End-to-End_Error|Reported_Uncorrect|Command_Timeout|Current_Pending_Sector|Offline_Uncorrectable" | tee -a $logfile
                                        fi
                                done
                                exit 0;
                        elif [[ $type == "--short" ]];then
                                for (( d=1; d<=$diskno; d++ ))
                                do
                                        disk=`df -h | awk '{print $1}' | grep sd | sed s'/.$//' | sed -n "$d p" | uniq`
                                        data=`date +'%F %H:%M:%S'`
                                        echo "$data - checking S.M.A.R.T entities for $disk disk" | tee -a $logfile
                                        if [[ $subtype == "--email" ]];then
                                                if [[ ! -z $address ]];then
                                                        data=`date +'%F %H:%M:%S'`
                                                        echo $data >> $logfile
                                                        $bin -t short $disk
                                                        if [[ ! -z $time_short_seconds ]];then
                                                                sleep $time_short_seconds
                                                        else
                                                                sleep 120
                                                        fi
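                                                        # Collect the report in a temporary file instead of "result" in the current directory
                                                        result=`mktemp`
                                                        $bin -a $disk > $result
                                                        mail -iv -s "short test S.M.A.R.T report for disk $disk" $address < $result
                                                        cat $result >> $logfile
                                                        rm -f $result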
                                                else
                                                        echo "$data - Error: wrong email address format" | tee -a $logfile
                                                        echo "Usage: checksmart < --brief| --short| --long| --conveyance| --summary| --help> [ --email <address>]" | tee -a $logfile
                                                        exit 1;
                                                fi
                                        elif [[ -n $subtype && $subtype != "--email" ]];then
                                                data=`date +'%F %H:%M:%S'`
                                                echo "$data - Error: unknown subtype $subtype" | tee -a $logfile
                                                echo "Usage: checksmart < --brief| --short| --long| --conveyance| --summary| --help> [ --email <address>]" | tee -a $logfile
                                                exit 1;
                                        else
                                                $bin -t short $disk | tee -a $logfile
                                                if [[ ! -z $time_short_seconds ]];then
                                                        sleep $time_short_seconds
                                                else
                                                        sleep 120
                                                fi
                                                $bin -a $disk | tee -a $logfile
                                        fi
                                done
                                exit 0;
                        elif [[ $type == "--long" ]];then
                                for (( d=1; d<=$diskno; d++ ))
                                do
                                        disk=`df -h | awk '{print $1}' | grep sd | sed s'/.$//' | sed -n "$d p" | uniq`
                                        data=`date +'%F %H:%M:%S'`
                                        echo "$data - checking S.M.A.R.T entities for $disk disk" | tee -a $logfile
                                        if [[ $subtype == "--email" ]];then
                                                if [[ ! -z $address ]];then
                                                        data=`date +'%F %H:%M:%S'`
                                                        echo $data >> $logfile
                                                        $bin -t long $disk
                                                        if [[ ! -z $time_long_seconds ]];then
                                                                sleep $time_long_seconds
                                                        else
                                                                sleep 3600
                                                        fi
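                                                        # Collect the report in a temporary file instead of "result" in the current directory
                                                        result=`mktemp`
                                                        $bin -a $disk > $result
                                                        mail -iv -s "long test S.M.A.R.T report for disk $disk" $address < $result
                                                        cat $result >> $logfile
                                                        rm -f $result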
                                                else
                                                        echo "$data - Error: wrong email address format" | tee -a $logfile
                                                        echo "Usage: checksmart < --brief| --short| --long| --conveyance| --summary| --help>  [ --email <address>]" | tee -a $logfile
                                                        exit 1;
                                                fi
                                        elif [[ -n $subtype && $subtype != "--email" ]];then
                                                data=`date +'%F %H:%M:%S'`
                                                echo "$data - Error: unknown subtype $subtype" | tee -a $logfile
                                                echo "Usage: checksmart < --brief| --short| --long| --conveyance| --summary| --help> [ --email <address>]" | tee -a $logfile
                                                exit 1;
                                        else
                                                $bin -t long $disk | tee -a $logfile
                                                if [[ ! -z $time_long_seconds ]];then
                                                        sleep $time_long_seconds
                                                else
                                                        sleep 3600
                                                fi
                                                $bin -a $disk | tee -a $logfile
                                        fi
                                done
                                exit 0;
                        elif [[ $type == "--conveyance" ]];then
                                for (( d=1; d<=$diskno; d++ ))
                                do
                                        disk=`df -h | awk '{print $1}' | grep sd | sed s'/.$//' | sed -n "$d p" | uniq`
                                        data=`date +'%F %H:%M:%S'`
                                        echo "$data - checking S.M.A.R.T entities for $disk disk" | tee -a $logfile
                                        if [[ $subtype == "--email" ]];then
                                                if [[ ! -z $address ]];then
                                                        data=`date +'%F %H:%M:%S'`
                                                        echo $data >> $logfile
                                                        $bin -t conveyance $disk
                                                        if [[ ! -z $time_conveyance_seconds ]];then
                                                                sleep $time_conveyance_seconds
                                                        else
                                                                sleep 240
                                                        fi
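                                                        # Collect the report in a temporary file instead of "result" in the current directory
                                                        result=`mktemp`
                                                        $bin -a $disk > $result
                                                        mail -iv -s "conveyance test S.M.A.R.T report for disk $disk" $address < $result
                                                        cat $result >> $logfile
                                                        rm -f $result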
                                                else
                                                        echo "$data - Error: wrong email address format" | tee -a $logfile
                                                        echo "Usage: checksmart < --brief| --short| --long| --conveyance| --summary| --help> [ --email <address>]" | tee -a $logfile
                                                        exit 1;
                                                fi
                                        elif [[ -n $subtype && $subtype != "--email" ]];then
                                                data=`date +'%F %H:%M:%S'`
                                                echo "$data - Error: unknown subtype $subtype" | tee -a $logfile
                                                echo "Usage: checksmart < --brief| --short| --long| --conveyance| --summary| --help> [ --email <address>]" | tee -a $logfile
                                                exit 1;
                                        else
                                                $bin -t conveyance $disk | tee -a $logfile
                                                if [[ ! -z $time_conveyance_seconds ]];then
                                                        sleep $time_conveyance_seconds
                                                else
                                                        sleep 240
                                                fi
                                                $bin -a $disk | tee -a $logfile
                                        fi
                                done
                                exit 0;
                        elif [[ $type == "--summary" ]];then
                                for (( d=1; d<=$diskno; d++ ))
                                do
                                        disk=`df -h | awk '{print $1}' | grep sd | sed s'/.$//' | sed -n "$d p" | uniq`
                                        data=`date +'%F %H:%M:%S'`
                                        echo "$data - checking S.M.A.R.T entities for $disk disk" | tee -a $logfile
                                        if [[ $subtype == "--email" ]];then
                                                if [[ ! -z $address ]];then
                                                        data=`date +'%F %H:%M:%S'`
                                                        echo $data >> $logfile
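                                                        # Collect the report in a temporary file instead of "result" in the current directory
                                                        result=`mktemp`
                                                        $bin -l selftest $disk > $result
                                                        mail -iv -s "summary test S.M.A.R.T. report for disk $disk" $address < $result
                                                        cat $result >> $logfile
                                                        rm -f $result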
                                                else
                                                        echo "$data - Error: wrong email address format" | tee -a $logfile
                                                        echo "Usage: checksmart < --brief| --short| --long| --conveyance| --summary | --help> [ --email <address>]"
                                                        exit 1;
                                                fi
                                        elif [[ -n $subtype && $subtype != "--email" ]];then
                                                data=`date +'%F %H:%M:%S'`
                                                echo "$data - Error: unknown subtype $subtype" | tee -a $logfile
                                                echo "Usage: checksmart < --brief| --short| --long| --conveyance| --summary | --help> [ --email <address>]"
                                                exit 1;
                                        else
                                                $bin -l selftest $disk | tee -a $logfile
                                        fi
                                done
                                exit 0;
                        elif [[ $type == "--help" ]];then
                                echo "Usage: checksmart < --brief| --short| --long| --conveyance| --summary| --help> [ --email <address>]"
                                echo "       --brief will display only critical pre-failure S.M.A.R.T. indicators"
                                echo "       --short will run a short S.M.A.R.T. test"
                                echo "       --long will run a long S.M.A.R.T. test"
                                echo "       --conveyance will run a conveyance S.M.A.R.T. test"
                                echo "       --summary will display only the self test history report"
                                echo "       --help will display this help"
                                echo "       --email <address> will send the S.M.A.R.T. report via email to the specified address"
                                echo "Example: checksmart --brief --email youraddress@yourdomain.com"
                                exit 0;
                        else
                                data=`date +'%F %H:%M:%S'`
                                echo "$data - Error: unknown type $type" | tee -a $logfile
                                echo "Usage: checksmart < --brief| --short| --long| --conveyance|  --summary| --help> [ --email <address>]"
                                exit 1;
                        fi
                else
                        data=`date +'%F %H:%M:%S'`
                        echo "$data - Error: type not defined" | tee -a $logfile
                        echo "Usage: checksmart < --brief| --short| --long| --conveyance|  --summary| --help> [ --email <address>]"
                        exit 1;
                fi
        else
                data=`date +'%F %H:%M:%S'`
                echo "$data - Error: no ATA disks found" | tee -a $logfile
                exit 1;
        fi
else
        data=`date +'%F %H:%M:%S'`
        echo "$data - Error: no smartctl binary found - please install smartmontools" | tee -a $logfile
        exit 1;
fi

First, I check that the disks support S.M.A.R.T., that it is enabled, and that some self-diagnostic tests have already been run. Together, these three conditions make up the pre-running controls (implemented in the function of the same name).
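The first two checks boil down to looking for two specific lines in smartctl's output. Here is a minimal sketch of that classification; the function name smart_state and the sample text are made up for illustration, and the function reads from stdin so it can be tried without a real disk.

```shell
# Classify the S.M.A.R.T. state from text in the format printed by `smartctl -i`.
# Hypothetical helper: it is not part of checksmart.
smart_state() {
    info=$(cat)
    if ! printf '%s\n' "$info" | grep -q "SMART support is: Available"; then
        echo "unavailable"
    elif ! printf '%s\n' "$info" | grep -q "SMART support is: Enabled"; then
        echo "disabled"
    else
        echo "enabled"
    fi
}

# Illustrative sample, standing in for real smartctl output.
sample='SMART support is: Available - device has SMART capability.
SMART support is: Enabled'

printf '%s\n' "$sample" | smart_state
```

On a real system the same function could be fed from smartctl directly, for example smartctl -i /dev/sda | smart_state.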

Next, the get_time_required() function retrieves the time each test (short, long and conveyance) needs to run. These values are fed to the sleep command, giving the disk time to finish the test so that smartmontools can process the results and my script can return them to the user.
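To make the conversion concrete, the sketch below turns a "recommended polling time" into a number of seconds, adding the same 2-second margin the script uses. The function name polling_seconds and the sample text are assumptions for this example; it uses a single awk program rather than the script's grep/awk/sed chain.

```shell
# Print how long to sleep (in seconds) for a given self-test, based on text in
# the format printed by `smartctl -c`. Prints nothing if the test is not listed.
# $1: test label as it appears in the output ("Short", "Extended", "Conveyance")
# Hypothetical helper: it is not part of checksmart.
polling_seconds() {
    awk -v label="$1" '
        $0 ~ ("^" label " self-test routine") { found = 1; next }
        found && /minutes/ { gsub(/[^0-9]/, ""); print $0 * 60 + 2; exit }
        { found = 0 }
    '
}

# Illustrative sample, standing in for real smartctl output.
sample='Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 255) minutes.'

short=$(printf '%s\n' "$sample" | polling_seconds Short)
conveyance=$(printf '%s\n' "$sample" | polling_seconds Conveyance)

# Fall back to a fixed delay when the drive does not report a polling time.
echo "short: ${short:-120}s, conveyance: ${conveyance:-240}s"
```

The fallback values mirror the script's own defaults (120 seconds for the short test, 240 for the conveyance test).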

Once the pre-running controls have completed and the run times are known, the actual tests are launched. Specifically, the script supports 5 operating modes:

1) --brief: returns only the overall status and the values of the attributes that may indicate an imminent disk failure (see this page for further details): Reallocated Sectors Count, Spin Retry Count, Runtime Bad Block, End to End Error, Reported Uncorrect, Command Timeout, Current Pending Sector and Offline Uncorrectable. One caveat: because S.M.A.R.T. attributes are not standardized across vendors, some of these entries may be missing from the results the script returns; in that case you can adapt it to the attributes your disk supports.

2) --short: runs a short test.

3) --long: runs a long test (much more thorough).

4) --conveyance: checks the disk for damage that may have occurred while it was being transported. Not all disks support this option.

5) --summary: equivalent to smartctl -l selftest; shows only the history of previous tests.
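Since the attribute list for --brief may need adapting, one way to find out which watched attributes a disk does not report is sketched below. The function name missing_attributes, the shortened watch list and the sample lines are assumptions for illustration only.

```shell
# Attribute names the report should contain; adjust the list to your drive.
watched="Reallocated_Sector Spin_Retry_Count Current_Pending_Sector Offline_Uncorrectable"

# Print every watched attribute that does not occur in the text read on stdin
# (text in the format printed by `smartctl -A`).
# Hypothetical helper: it is not part of checksmart.
missing_attributes() {
    report=$(cat)
    for name in $watched; do
        printf '%s\n' "$report" | grep -q "$name" || echo "$name"
    done
}

# Illustrative sample, standing in for real smartctl output.
sample='  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0'

printf '%s\n' "$sample" | missing_attributes
```

With this sample the function prints Spin_Retry_Count and Offline_Uncorrectable, the two names that would be worth removing or replacing in the grep pattern for that disk.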

As mentioned earlier, the --email parameter is optional; when it is used, the script checks that the address the report will be sent to is well formed.
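The idea behind the address check can be tried in isolation with the sketch below. The pattern is a deliberately simplified stand-in for the script's longer expression, and the function name is_valid_email is made up for this example. Like the script's check, it only accepts lowercase addresses.

```shell
# Return success if $1 looks like an email address.
# The pattern is a simplified stand-in, not the script's full expression.
# Hypothetical helper: it is not part of checksmart.
is_valid_email() {
    printf '%s\n' "$1" | grep -Eq '^[a-z0-9._%+-]+@([a-z0-9-]+\.)+[a-z]{2,}$'
}

for candidate in "admin@example.com" "admin@example" "not an address"; do
    if is_valid_email "$candidate"; then
        echo "valid:   $candidate"
    else
        echo "invalid: $candidate"
    fi
done
```

A format check of this kind only catches typos; it cannot tell whether the mailbox actually exists.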

Finally, to make the script's key steps easier to audit, each one is recorded in a dedicated log file (/var/log/checksmart/checksmart.log).
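The script repeats the same timestamp-and-tee pair for every message. As a possible simplification, the pattern could be wrapped in a small helper like the one sketched here. The function name log is an assumption, and the demo writes to a temporary file rather than the real log path.

```shell
# Demo log file; the script itself writes to /var/log/checksmart/checksmart.log.
logfile=$(mktemp)

# Print a timestamped message to the terminal and append it to the log file.
# Hypothetical helper: it is not part of checksmart.
log() {
    printf '%s - %s\n' "$(date +'%F %H:%M:%S')" "$*" | tee -a "$logfile"
}

log "Executing pre-running controls..."
log "Checking for S.M.A.R.T. capabilities..."
```

Each call produces one line in the same "date time - message" format the script already uses, so existing log files would stay consistent.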

That's all. Until next time.