Nagios: script bash per monitorare lo stato dei volumi RAID

Partendo dalle considerazioni fatte in questo post, ho deciso di mettere a punto uno script bash da integrare a Nagios, in modo da monitorare lo status dei volumi RAID (e dei dischi fisici annessi) a prescindere dal metodo utilizzato per l’implementazione di tale tecnologia (hardware, fake oppure software).

nagiosDi seguito riporto il suddetto script nella sua interezza:

#!/bin/bash

type=$1

subtype=$2

element=$3

usage="check_raid <--software|--fake|--hardware> [--megaraid|--mpt] [--volume|--physical|--battery]"

if [[ ! -z "$type" && "$type" =~ "software" ]];then
        okswraid=0;
        koswraid=0;
        volumes=`cat /proc/mdstat | grep md | grep active | grep -v inactive | awk '{print $1}' | wc -l`
        if [[ ! -z $volumes ]];then
                for (( v=1; v<=$volumes; v++ ))
                do
                        volume=`cat /proc/mdstat | grep md | grep active | grep -v inactive | awk '{print $1}' | sed -n "$v p"`
                        raidtype=`cat /proc/mdstat | grep md | grep active | grep -v inactive | awk '{print $4}' | sed -n "$v p"`
                        diskno=`cat /proc/mdstat | grep '[[0-9]\/[0-9]]' | awk '{print $3}' | sed -n "$v p"`
                        disksok=`echo $diskno | sed 's/\[//g' | cut -d '/' -f1`
                        diskstotal=`echo $diskno | sed 's/\]//g' | cut -d '/' -f2`
                        if [[ "$disksok" -eq "$diskstotal" ]];then
                                echo "OK: Software RAID volume $volume configured in $raidtype is OK, with $diskno disks UP"
                                ((okswraid++))
                        elif [[ "$disksok" -lt "$diskstotal" ]];then
                                echo "CRITICAL: Software RAID volume $volume configured in $raidtype is CRITICAL, with $diskno disks UP"
                                ((koswraid++))
                        fi
                done

                if [[ $koswraid -eq 0 ]];then
                        exit 0;
                else
                        exit 2;
                fi
        else
                echo "UNKNOWN: No software RAID configured"
                exit 3;
        fi

elif [[ ! -z "$type" && "$type" =~ "fake" ]];then
        bin=`/usr/bin/which dmraid`
        if [[ ! -z $bin ]];then
                result=`$bin -s`
                disksno=`$bin -r | grep -v no | wc -l`
                disksok=`$bin -r | grep ok | wc -l`
                if [[ ! -z "$result" && "$result" =~ "ok" ]];then
                        echo "OK: RAID Status is OK, with $disksok/$disksno disks OK"
                        exit 0;
                elif [[ ! -z "$result" && "$result" =~ "no raid" ]];then
                        echo "UNKNOWN: no fake RAID configured"
                        exit 3;
                else
                        echo "CRITICAL: RAID Status is KO, with $disksok/$disksno disks OK"
                        exit 2;
                fi
        else
                echo "UNKNOWN: no dmraid binary found - please install dmraid"
                exit 3;
        fi

elif [[ ! -z "$type" && "$type" =~ "hardware" ]];then
        okraid=0;
        oksmart=0;
        koraid=0;
        kosmart=0;
        if [[ ! -z "$subtype" && "$subtype" =~ "--megaraid" ]];then
                bin=`/usr/bin/which MegaCli64`
                if [[ ! -z $bin ]];then
                        if [[ ! -z "$element" && "$element" =~ "--volume" ]];then
                                result=`$bin -LDinfo -Lall -aALL | grep State | awk '{print $3}'`
                                if [[ ! -z "$result" && $result =~ "Optimal" ]];then
                                        echo "OK: RAID Volume state is $result"
                                        exit 0;
                                else
                                        echo "CRITICAL: RAID Volume state is $result"
                                        exit 2;
                                fi
                        elif [[ ! -z "$element" && "$element" =~ "--physical" ]];then
                                diskno=`$bin -PDList -aALL | grep "S.M.A.R.T alert" | wc -l`
                                for (( d=1; d<=$diskno; d++ ))
                                do
                                        result=`$bin -PDList -aALL | grep "Firmware state" | sed -n "$d p" | awk '{print $3}' | sed 's/,//g'`
                                        if [[ ! -z "$result" && $result =~ "Online" ]];then
                                                echo "RAID Status for Physical Disk number $d is OK"
                                                ((okraid++));
                                        else
                                                echo "RAID Status for Physical Disks number $d is KO"
                                                ((koraid++));
                                        fi
                                done
                                for (( d=1; d<=$diskno; d++ ))
                                do
                                        result=`$bin -PDList -aALL | grep "S.M.A.R.T alert" | sed -n "$d p" | awk '{print $8}'`
                                        if [[ ! -z "$result" && $result =~ "No" ]];then
                                                echo "S.M.A.R.T Status for Physical Disk number $d is OK"
                                                ((oksmart++));
                                        else
                                                echo "S.M.A.R.T. Status for Physical Disks number $d is KO"
                                                ((kosmart++));
                                        fi
                                done
                                if [[ $koraid -eq 0 && $kosmart -eq 0 ]];then
                                        echo "OK: RAID and S.M.A.R.T Status for all Physical Disks is OK"
                                        exit 0;
                                elif [[ $koraid -eq 0 && $kosmart -ne 0 ]];then
                                        echo "CRITICAL: S.M.A.R.T Status for some Physical Disks is KO"
                                        exit 2;
                                elif [[ $koraid -ne 0 && "$kosmart" -eq 0 ]];then
                                        echo "CRITICAL: RAID Status for some Physical Disks is KO"
                                        exit 2;
                                elif [[ $koraid -ne 0 && $kosmart -ne 0 ]];then
                                        echo "CRITICAL: RAID and S.M.A.R.T Status for some Physical Disks is KO"
                                        exit 2;
                                fi
                        elif [[ ! -z "$element" && "$element" =~ "--battery" ]];then
                                result=`$bin -AdpBbuCmd -aAll | grep "Battery State" | awk '{print $3}'`
                                if [[ ! -z "$result" && $result =~ "OK" ]];then
                                        echo "OK: RAID Controller Battery state is OK"
                                        exit 0;
                                else
                                        echo "CRITICAL: RAID Controller Battery state is $result"
                                        exit 2;
                                fi
                        else
                                echo "UNKNOWN: please specify the element to check"
                                echo $usage;
                                exit 3;
                        fi
                else
                        echo "UNKNOWN: No MegaCli64 binary found - please install MegaCli64"
                        exit 3;
                fi

        elif [[ ! -z "$subtype" && "$subtype" =~ "mpt" ]];then
                modprobe mptctl
                bin=`/usr/bin/which mpt-status`
                bin2=`/usr/bin/which lspci`
                bin3=`/usr/bin/which daemonize`
                if [[ ! -z $bin ]];then
                        if [[ ! -z $bin2 ]];then
                                controller_status=`lspci | grep MPT`
                                if [[ ! -z $controller_status ]];then
                                        if [[ ! -z $bin3 ]];then
                                                controller=`$bin -p | grep id | awk '{print $3}' | sed 's/id=//g' | sed 's/,//g'`
                                                if [[ ! -z $controller ]];then
                                                        result=`$bin -i $controller | grep OPTIMAL`
                                                        if [[ ! -z "$result" ]];then
                                                                echo "OK: RAID Status is OPTIMAL"
                                                                exit 0;
                                                        else
                                                                echo "CRITICAL: RAID Status is DEGRADED"
                                                                exit 2;
                                                        fi
                                                else
                                                        echo "UNKNOWN: MPT Controller found but no RAID configured";
                                                        exit 3;
                                                fi
                                        else
                                                echo "UNKNOWN: No daemonize binary found - please install daemonize";
                                                exit 3;
                                        fi
                                else
                                        echo "UNKNOWN: Unable to find RAID Controller";
                                        exit 3;
                                fi
                        else
                                echo "UNKNOWN: No lspci binary found - please install lspci";
                                exit 3;
                        fi
                else
                        echo "UNKNOWN: No mpt-status binary found - please install mpt-status"
                        exit 3;
                fi

        else
                echo "UNKNOWN: please specify the RAID Controller type"
                echo $usage
                exit 3;
        fi
else
        echo "UNKNOWN: please specify the RAID type"
        echo $usage
        exit 3;
fi
exit 0

Lo usage parla chiaro: il primo argomento identifica, per l’appunto, la tecnologia RAID utilizzata sul sistema target. Il secondo ed il terzo argomento, invece, dovranno essere specificati solo nel caso in cui si abbia a che fare con un RAID di tipo hardware. Nella fattispecie, essi rappresentano, rispettivamente, la tipologia di chipset utilizzata dal controller e l’oggetto di interesse della nostra query, ovvero il volume, i dischi fisici oppure la batteria (tale parametro ha senso solo se il chipset è di tipo LSI MegaRAID).

Configurazione di Nagios

Come al solito, il primo step consiste nel definire un comando che utilizzi lo script (in gergo plugin) riportato in precedenza:

# 'check_local_raid' command definition
define command{
        command_name    check_local_raid
        command_line    $USER1$/check_raid $ARG1$ $ARG2$ $ARG3$
        }

tali direttive andranno opportunamente inserite all’interno del file /etc/nagios/objects/commands.cfg.

Successivamente si potrà procedere con la definizione del servizio che si occuperà del monitoraggio vero e proprio, da aggiungere alla configurazione dell’host target, in questo caso /etc/nagios/object/locahost.cfg:

define service{
        use                             local-service         ; Name of service template to use
        host_name                       localhost
        service_description             RAID Status
        check_command                   check_local_raid!--software
        }

A questo punto non ci rimane che ricaricare la configurazione di Nagios per rendere effettive le suddette modifiche:

[root@linuxbox ~]# service nagios reload

ed abbiamo finito.

Alla prossima.

Nagios: script bash per monitorare lo stato dei volumi RAIDultima modifica: 2017-02-20T16:27:38+00:00da nazarenolatella
Reposta per primo quest’articolo

Lascia un commento