ASIC/FPGA Design and Verification Out Source Services
A Perl script to monitor the operation of a Linux OS.
-
A few days ago, this site went down for a few hours. The cause was a set of bandwidth-consuming commands that I mistakenly ran remotely.
The "dying" process was slow. Swap usage was probably growing steadily, and the load average became high.
-
Had I rebooted the system, the downtime would have been short. Better still, had a script been monitoring some of the system parameters, the reboot could have been done automatically.
-
Such a Perl script is under development. At present it monitors swap usage, the swap-in and swap-out rates in KB/sec, the number of active processes on the system, and the load average.
-
The code of the script is shown below:
#!/usr/bin/perl
open(FPW, ">>/home/pini/junk/Pini/sys_mon.txt") || die("open fail: $!\n");
$flg=1;
while($flg == 1) {
$dateS=`date +%y%m%d_%H%M%S`; chomp($dateS);
#swap size
$cmd="free | grep Swap";
$r=`$cmd`;
chomp($r);
#drop the label, then split the remaining columns on whitespace
$r =~ s/^Swap:\s*//;
@a=split(/\s+/, $r);
#total used free
#percentage of swap in use; 0 when no swap is configured (avoids dividing by zero)
$used_p=($a[0] > 0) ? int(($a[1]*100)/$a[0]) : 0;
if($used_p > 75) {
chomp($used_p);
$flg=0;#quit and re-boot
#also check swap in and out KB/sec
#the first vmstat report is an average since boot, so take a second sample
$r=`vmstat 1 2`;
@a=split(/\n/, $r);
@aa=split(' ', $a[-1]);
$si=$aa[6];
$so=$aa[7];
while( length($si) < 4 ) {$si=" " . $si;}
while( length($so) < 4 ) {$so=" " . $so;}
#common print
print FPW ("$dateS swap $used_p si=$si so=$so\n");
}
#load average
$cmd="cat /proc/loadavg";
$r=`$cmd`;
chomp($r);
@a=split(/ /, $r);
#1min 5min 15min
if($a[0] > 10 && $a[1] > 5) {
chomp($a[0]); chomp($a[1]);
print FPW ("$dateS load average $a[0] $a[1]\n");
$flg=0;#quit and re-boot
}
#number of active processes
$cmd="ps -aef | wc -l";
$r=`$cmd`;
chomp($r);
if($r > 800) {
print FPW ("$dateS number of active processes on the system $r\n");
$flg=0;#quit and re-boot
}
sleep(120) if($flg == 1);#skip the wait once a limit has been exceeded
}#while
close(FPW);
#system("shutdown -r now"); #re-boot, still disabled while the script is under development
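
The swap and load average checks could also read the kernel's /proc files directly instead of parsing the output of free and cat. The following is only a minimal sketch of that idea, assuming a Linux system that exposes /proc/meminfo (with its SwapTotal and SwapFree fields) and /proc/loadavg. The function names swap_used_percent, load_too_high and slurp are hypothetical and are not part of the script above; the limits 10 and 5 are the ones the script already uses.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percentage of swap in use, computed from the text of /proc/meminfo.
# Returns 0 when no swap is configured, which avoids a division by zero.
# (hypothetical helper, not part of the original script)
sub swap_used_percent {
    my ($meminfo) = @_;
    my ($total) = $meminfo =~ /^SwapTotal:\s+(\d+)/m;
    my ($free)  = $meminfo =~ /^SwapFree:\s+(\d+)/m;
    return 0 unless $total && defined $free;
    return int( ( $total - $free ) * 100 / $total );
}

# Returns 1 when both the 1-minute and the 5-minute load averages exceed
# their limits, 0 otherwise. The first argument is the single line of
# /proc/loadavg. (hypothetical helper)
sub load_too_high {
    my ( $loadavg, $max1, $max5 ) = @_;
    my ( $one, $five ) = split ' ', $loadavg;
    return ( $one > $max1 && $five > $max5 ) ? 1 : 0;
}

# Read a whole file into one string. (hypothetical helper)
sub slurp {
    my ($path) = @_;
    open( my $fh, '<', $path ) or die "cannot open $path: $!\n";
    local $/;    # undefined record separator: read everything at once
    my $text = <$fh>;
    close($fh);
    return $text;
}

# Report the current values, but only where the /proc files are readable.
if ( -r '/proc/meminfo' && -r '/proc/loadavg' ) {
    printf "swap used: %d%%\n", swap_used_percent( slurp('/proc/meminfo') );
    printf "load too high: %s\n",
        load_too_high( slurp('/proc/loadavg'), 10, 5 ) ? 'yes' : 'no';
}
```

Keeping the parsing in small functions that take plain text makes the limits easy to try out on sample input, without having to load the machine first.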