Quantcast
Channel: SE Stuff and the like... » iostat
Viewing all articles
Browse latest Browse all 3

Check_iostat.pl version 0.9.7

$
0
0

Previous version and detaild information
http://sysengineers.wordpress.com/2010/02/05/check_iostat-for-nagios-version-0-9-5/

Application Goal:
Check disk IO using a simple perl script and possibly the counters allready available in the linux /proc/ directory. The program must be able to run under Nagios and should return a ‘Nagios’ correct syntax to the prompt. This way the plugin should be usable for Linux users to monitor their Disk IO within Nagios. The program should also report back performance information in the ‘Nagios’ compatible syntax so that graphing is available in the ‘Nagios’ / Cacti / Centreon add-ons.

Changes
Fixed lacking devision in validated summed results
Added sprintf functions rounding the validated numbers with 2 dimentions
Fixed some declaration problems
Added -debug switch to enable the debugging options
Coded arround the used math::Complex functions and removed the module
Improved the debugging output (readability)
Small fixes in de syntax

Installation
1. Copy the code below to your clipboard (ctr+c or the copy-button inside the code field)
2. Logon to the linuxbox with the nagios or nrpe client.
3. browse to the plugin-dir usually cd /usr/local/nagios/libexec/
4. Create a new file vi ./check_iostat.pl
5. Press insert and paste the code inside (when using putty paste is done with a rightmouse click)
6. save the file (esc > : > rq > enter, when using vi)
7. give the file execute rights using chmod +x ./check_iostat
8. make nagios the owner chown nagios:nagios ./check_iostat
9. Test it using updatedb& ./check_iostat -d sd -dbu -dbuw 70 -dbuc 88 -kbs -kbsw 10000 -kbsc 50000 -p

Next configure a nagios command or use nrpe and have fun :)

#!/usr/bin/perl
# Written by Chris Gralike @ AMIS.
# Perl based check command to fetch and report the
# TPS (transactions per second) and IO wait times.
# Plugin uses iostat for opperation.
# Verion 0.9.7
#
# Changes post 0.9.7 >> 28-05-2010
# Line 298...303 Prevent devision by zerro else exit because no data was collected  - Bug reported by Epiq.
# Line 380       Correction of a type that prevented pref data of r/s w/s from being printed. - Bug reported by Epiq.
# Line 40        Added dm as possible device input used by the linux LVM. - Suggested by Epiq.
# Line 269       Extended the device if/pragmatch validation to match more devices - Added by Jean Ventura.
#
###########################
use Switch;
use warnings;

my($numArgs,
   $debug,	# Print debugging information.
   $DevType,	# Used to match a certain devicetype from the resulting IOstat rows.
   $IOBIN,	# Is used to store a path to the iostat binairy for execution.
   $Samples,	# Used to store the initial Samples returned by iostat.
   @SampleRows, # Used to store the rows generated by the splitted samples.
   $firstseen,  # Used to keep track of the found devices (IOstat might return a set of devices i.e sda, sdb, sdc etc.)
   $Items,      # Used in the foreach to store the row being parsed.
   @cols,       # Used to store the columns in a row after an split.
   $dev,	# Used to create a symbolic link to dynamicly create a var.
   $rqm,
   $val,
   $devtypes,
   $rws, $rws_warn, $rws_crit,
   $kbs, $kbs_warn, $kbs_crit,
   $awt, $awt_warn, $awt_crit,
   $svc, $svc_warn, $svc_crit,
   $devices, $itd,$v1,$v2,$v3,
   $v4,$v5,$v6,$v7,$v8,$v9,$v10,$v11,
   $dbu, $dbu_warn, $dbu_crit
   );

# Preparing to collect the dangerious user input.
# Here is a list of known device types. Please add any device you would like to monitor..
$devtypes=";sd;hd;dm;";
$numArgs = $#ARGV + 1;
$critical_global = 0;
$warning_global = 0;

if($numArgs gt '0'){
	for($i=0;$i<$numArgs;$i++){
	# Process our command line arguments and do some basic testing.
	# Could be make human save in the future.
	switch ($ARGV[$i]) {
		# Enable debugging.
		case '-debug'{
				$debug = 1;
			     }
		# Handle device type
		case '-d'    {
				$val=$ARGV[$i+1];
		 		if( (index($devtypes, $val)) gt '-1'){
					$DevType=$val;
					$i++;
		  		}else{
					print "Ivalid Disktype found. Typo?\n"; exit 1;
		  		}
			     }
		# Do we need to check rqm?
		case '-rqm'  { $rqm='1'; }
		# What is the warning treshold?
		case '-rqmw' {
				$val=$ARGV[$i+1];
				# Is the value nummeric?
				if($val=~m/[0-9]*/){
					$rqm_warn= int $val;
					$i++;
				}else{
				# Possible type?
					print "Non Numeric value used in rqmw, typo? \n"; exit 1;
				}
			     }
		case '-rqmc' {
				$val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $rqm_crit= int $val;
					$i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in rqmc, typo? \n"; exit 1;
                                }
			     }
		# Do we need to check rws?
		case '-rws'  { $rws='1'; }
		case '-rwsw' {
				$val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $rws_warn= int $val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in rwsw, typo? \n"; exit 1;
                                }
			     }
		case '-rwsc' {
				$val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $rws_crit= int $val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in rwsc, typo? \n"; exit 1;
                                }
			     }
		# Do we need to check kbs?
		case '-kbs'  { $kbs='1'; }
		case '-kbsw' {
				$val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $kbs_warn= int $val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in kbsw, typo? \n"; exit 1;
                                }
			     }
		case '-kbsc' {
				$val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $kbs_crit= int $val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in kbsc, typo? \n"; exit 1;
                                }
			     }
		# Do we need to check awt?
		case '-awt'  { $awt='1'; }
		case '-awtw' {
				$val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $awt_warn= int $val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in awtw, typo? \n"; exit 1;
                                }
			     }
		case '-awtc' {
				$val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $awt_crit= int $val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in awtc, typo? \n"; exit 1;
                                }
			     }
		# Do we need to check svc?
		case '-svc'  { $svc='1'; }
		case '-svcw' {
				$val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $svc_warn= int $val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in svcw, typo? \n"; exit 1;
                                }
			     }
		case '-svcc' {
				$val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $svc_crit= int $val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in svcc, typo? \n"; exit 1;
                                }
			     }
		# Do we need to check dbu?
		case '-dbu'  { $dbu='1'; }
		case '-dbuw' {
				$val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $dbu_warn= int $val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in dbuw, typo? \n"; exit 1;
                                }
			     }
		case '-dbuc' {
				$val=$ARGV[$i+1];
                                # Is the value nummeric?
                                if($val=~m/[0-9]*/){
                                        $dbu_crit= int $val;
                                        $i++;
                                }else{
                                # Possible type?
                                        print "Non Numeric value used in dbuc, typo? \n"; exit 1;
                                }
			     }
		# performance data. Might make the string human unreadable.. ow well, no loss there <img src="https://s-ssl.wordpress.com/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley">
		case '-p'    { $prf='1'; }
		# Print full messages per device overview.
		#case '-m'    { $fms='1'; }
		# Most used help switches.
		case '--help'{ USAGE(); }
		case '-h'    { USAGE(); }
	}
	}
	# Check if the basic requirements are met.
	if(!($DevType) ||  !(($rqm || $rws || $kbs || $awt || $svc || $dbu))){
		print "Minimal requiremens, device and checktype are not met\n";
		USAGE();
	}
}else{
	# No input was given. Show the Usage();
	USAGE();
}

# Locate the IOBin binairy needed to fetch the stats.
chomp($IOBIN=`which iostat`);

if( !(-f $IOBIN) || !(-x $IOBIN)){
	print "A working iostat command is needed for this script to work \n";
	print "Also make sure the sysstat service is running! /etc/init.d/sysstat \n";
	exit 1;
}

if($DevType){
	# IF IOStat is found, lets collect some data.
	chomp($Samples=`$IOBIN -d -x -k 1 5 | grep $DevType`);
}else{
	print "Please select a valid devicetype \n";
	exit 1;
}

# Break the samples up in lines so we can evaluate them.
# The first set of samples are avg. values counted from boot.
# We need to discard them and collect the remaining samples.
@SampleRows=split(/\n/, $Samples);

# Firstseen is used to track de device names and to skip the first itteration of iostat that contains
# avg stats counted from system boot time, stats we cant use here sadly <img src="https://s-ssl.wordpress.com/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley">
$firstseen='';
$CT=0;
$Devices=0;

# Print Debugging information
if($debug){
	    print "### Data collected ###\n";
	    print "dev|rrqm|wrqm|r/s|w/s|rKB/s|wKB/s|rq-sz|qu-sz|await|svctm|util%|\n";
}

foreach $Items (@SampleRows){
	 # Break the latter up in usable columns.
        @cols=split(/\s+/,$Items);
	 # We only want a certain device type. So lets match what we know.
	 # Disks usualy have a prefix for example (scsi = sd) its set using
	 # the DevType var.
         if($cols[0]=~m/$DevType-[0-9]$/ || $cols[0]=~m/$DevType[a-z]$/ || $cols[0]=~m/$DevType[a-z][0-9]$/){

		$dev=$cols[0];
		if((rindex $firstseen, $dev) gt '-1'){
			#Declare new $$
			#my @$dev;
			# Store the collected data in the correct (dynamic) vars.
			$$dev[1]+=$cols[1];	# rrqm/s   Read Requests Merged per Second.
			$$dev[2]+=$cols[2];	# wrqm/s   Write Requests Merged per Second.
			$$dev[3]+=$cols[3];	# r/s      Number of read requests issued per Second
			$$dev[4]+=$cols[4];	# w/s      Number of write requests issued per Second
			$$dev[5]+=$cols[5];	# rKB/s    Number of Kilobytes read per Second.
			$$dev[6]+=$cols[6];	# wKB/s    Number of Kilobytes written per Second.
			$$dev[7]+=$cols[7];	# Avgrq-sz Avarage size (in sectors) of the issued requests.
			$$dev[8]+=$cols[8];	# Avgqu-sz Avarage Queue length of the requests issued.
			$$dev[9]+=$cols[9];	# Await	  Avarage wait time in ms for IO requests to be served.
			$$dev[10]+=$cols[10];	# svctm    Avarage service time in ms for IO requests that where issued.
			$$dev[11]+=$cols[11];   # %util    Precentage of CPU time during IO requests (bandwidth util), saturation at 90~100%
			$CT++; # Add a new itteration to the count
			# Print some debugging vars if requested. to show the data is collected.
			if($debug){
				    print "$dev|$cols[1]|$cols[2]|$cols[3]|$cols[4]|$cols[5]|$cols[6]|$cols[7]|$cols[8]|$cols[9]|$cols[10]|$cols[11]|\n";
			}
		}else{
			$Devices++;
			$firstseen.="$dev;";
		}
	}
}

# Prevent $itd (itterations / disk) from becomming zerro and exit when no devices are found.
# on line 299.
if($Devices > 0){
	$itd = ($CT / $Devices);
}else{
	print "No performance data was captured. Please check if the device name is correct\n"; exit 1;
}

# Print debugging information
if($debug){
	print "###Devices Counted###\n";
	print "Number of devices : $Devices\n";
	print "Number of itterations per device : $itd\n";
	print "Total Number of Itterations : $CT\n";
}
# Lets collect the device information from the firstseen var
# and start processing it for some perf check/data
# Lets also recycle some previously used vars for this.
@cols=split(/;/,$firstseen);
foreach $Items (@cols){
	# Items now contains the devicenames needed to access the data again.
	# We now need to check them against some basic tresholds
	# Print a nice table with the calculated values when we are in debug.

  	# Print debugging information
	if($debug){
		   print "### Counted Values ###\n";
		   print "$Items|$$Items[1]|$$Items[2]|$$Items[3]|$$Items[4]|$$Items[5]|$$Items[6]|$$Items[7]|$$Items[8]|$$Items[9]|$$Items[10]|$$Items[11]|\n";
        }

	#What do we want to check against a treshold?
	#First check the selection if any.
	if($rqm || $rws || $kbs || $awt || $svc || $dbu){

		#Set the counts to zerro
		$critical_state='0';
		$warning_state='0';
		$ok_state='0';
		# Devide
		$round="%.2f";
		$v1 = sprintf($round, ($$Items[1] / $itd));
        $v2 = sprintf($round, ($$Items[2] / $itd));
		$v3 = sprintf($round, ($$Items[3] / $itd));
        $v4 = sprintf($round, ($$Items[4] / $itd));
		$v5 = sprintf($round, ($$Items[5] / $itd));
        $v6 = sprintf($round, ($$Items[6] / $itd));
		$v7 = sprintf($round, ($$Items[7] / $itd));
        $v8 = sprintf($round, ($$Items[8] / $itd));
		$v9 = sprintf($round, ($$Items[9] / $itd));
        $v10 = sprintf($round, ($$Items[10] / $itd));
        $v11 = sprintf($round, ($$Items[11] / $itd));

		# Requests Merged per second.
		if($rqm){
			# Critical
			if(($v1 >= $rqm_crit) || ($v2 >= $rqm_crit)){
				$critical_state+='1';
			# Warning?
			}elsif(($v1 >= $rqm_warn) || ($v2 >= $rqm_warn)){
				$warning_state+='1';
			# Ok
			}else{
				$ok_state+='1';
			}
			# Add the counters to the performance vars
			$perf.="$Items-rrqm/s=$v1; $Items-wrqm/s=$v2;";
		}
		# Reads / Writes per second.
		if($rws){
			if(($v3 >= $rws_crit) || ($v4 >= $rws_crit)){
	                    	$critical_state+='1';
             		# Warning?
               	 	}elsif(($v3 >= $rws_warn) || ($v4 >= $rws_warn)){
                        	$warning_state+='1';
                	# Ok
                	}else{
                        	$ok_state+='1';
                	}
			# Add the counters to the performance var.
			$perf.="$Items-r/s=$v3; $Items-w/s=$v4; ";
		}
		# KB Read/Writes per second.
		if($kbs){
			if(($v5 >= $kbs_crit) || ($v6 >= $kbs_crit)){
	                        $critical_state+='1';
	                # Warning?
       	         	}elsif(($v5 >= $kbs_warn) || ($v6 >= $kbs_warn)){
                        	$warning_state+='1';
                	# Ok
                	}else{
                       	 	$ok_state+='1';
                	}
			$perf.="$Items-rKB/s=$v5; $Items-wKB/s=$v6; ";
		}
		# Avarage wait time
		if($awt){
			if(($v9 >= $awt_crit)){
                        	$critical_state+='1';
                	# Warning?
                	}elsif(($v9 >= $awt_warn)){
                        	$warning_state+='1';
                	# Ok
                	}else{
                        	$ok_state+='1';
                	}
			$perf.="$Items-await=$v9; ";
		}
		# Avarage service time issuing time
		if($svc){
			if($v10 >= $svc_crit){
       	                	$critical_state+='1';
                	# Warning?
                	}elsif($v10 >= $svc_warn){
                        	$warning_state+='1';
                	# Ok
                	}else{
                        	$ok_state+='1';
                	}
			$perf.="$Items-svctm=$v10; "
		}
		# Disk bandwidth Utilization
		if($dbu){
			if($v11 >= $dbu_crit){
                	        $critical_state+='1';
              	  	# Warning?
               	 	}elsif($v11 >= $dbu_warn){
                        	$warning_state+='1';
                	# Ok
                	}else{
                        	$ok_state+='1';
                	}
			$perf.="$Items-util=$v11%; ";
		}
	}else{
		print "At least select a value to measure..\n";
		exit 1;
	}
	# Print Debugging information about the validated values.
	if($debug){ print "### validated Devisions ###\n";
                    print "$Items|$v1|$v2|$v3|$v4|$v5|$v6|$v7|$v8|$v9|$v10|$v11|\n";
	}

	# Create a messages var.
	$mgs.="$Items=O:$ok_state,W:$warning_state,C:$critical_state; ";

	# Track the global state (1 crit 1 warn)
	if(($critical_state gt '0') || ( $critical_global gt '0')){
		$critical_global+='1';
	}
	if(($warning_state gt '0') || ( $warning_global gt '0')){
		$warning_global+='1';
	}
}

# Compose a nice nagios output.
if($critical_global >= '1'){
	print "CRITICAL:";
	$exit=2;
}elsif($warning_global >= '1'){
	print "WARNING:";
	$exit=1;
}else{
	print "OK:";
	$exit=0;
}
# Print the remainder, the most important data was processed.
print $mgs; if($prf){ print "|$perf"; } print "\n";
exit $exit;

###Subroutines
sub USAGE{
	print "
                Usage : $0 -d [Dev] [options]

		-p Print performance data about the measured samples.

		-d {grep string used on IOstat}
		examples;
                 sd     #All scsi devices.
                 hd     #All Cdrom devices.
		 sda	#Only device sda

                [Available Measurement Options]
                -rqm -rqmw val -rqmc val        # read/write merged             [#]
                -rws -rwsw val -rwsc val        # read/write per second.        [s]
                -kbs -kbsw val -kbsc val        # KBs read/written per second.  [s]
                -awt -awtw val -awtc val        # Avarage IO wait time.         [ms]
                -svc -svcw val -svcc val        # Avarage service IO wait time. [s]
                -dbu -dbuw val -dbuc val        # Disk utilization              [%]\n";
        exit 1;
}


Tagged: 0.9.7, avgqu-sz, avgrq-sz, await, centreon, changes, check, check_io, check_iostat, Chekc, Chekc_IOSSTAT, disk, disk io, disk wait, Disk;Utilization, diskIO, for, how, io, iostat, Linux, measure, measurement, nagios, OEL, Oracle Enterprise, Oracle Enterprise Linux, perf, performance, performance data, performance stats, plugin, r/s, Red Hat, Redhat, rKB/s, rrqm/s, Script, service, stat, svctm, Time, times, to, util, validate, values, Version, w/s, wait, wKB/s, wrqm/s

Viewing all articles
Browse latest Browse all 3

Latest Images

Trending Articles





Latest Images