
CentOS 6.4 x64 Nagios Monitoring Platform: Monitoring the CPU Temperature of Linux Hosts

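The server room has no temperature alarm, so I use this method to keep track of the room temperature. If only one host raises an alert, it can be treated as a fault on that single machine; if several hosts alert at the same time, the server room's air conditioning has probably developed a problem.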
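As a minimal sketch of this rule of thumb, the helper below turns the number of hosts alerting at the same time into a diagnosis. The function name `classify_alerts` and the choice of "two or more" as the meaning of "several" are assumptions made for illustration; they are not part of the setup described here.

```shell
#!/bin/sh
# classify_alerts COUNT
# COUNT is the number of hosts whose CPU temperature check is alerting right now.
# Hypothetical helper: the cut-off of 2 hosts for "several" is an assumption.
classify_alerts() {
    count=$1
    if [ "$count" -eq 0 ]; then
        echo "no alert"
    elif [ "$count" -eq 1 ]; then
        echo "single-host fault"
    else
        echo "possible air-conditioning problem"
    fi
}
```

For example, `classify_alerts 3` prints `possible air-conditioning problem`.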

The procedure is as follows.

Environment: the monitored host runs CentOS 6.4.

1. Install the hardware sensor monitoring software, lm_sensors:
# yum install lm_sensors*

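2. Run sensors-detect to probe for sensors:

# sensors-detect ## Press Enter at every prompt. This step reported an error for me in a virtual machine, but worked without problems on physical hardware.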

3. Run sensors to check whether it can read data. Output like the following means it is working:

[root@rd02 ~]# sensors
coretemp-isa-0000
Adapter: ISA adapter
Core 0: +32.0°C (high = +76.0°C, crit = +100.0°C)
Core 1: +32.0°C (high = +76.0°C, crit = +100.0°C)
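The plugin in the next step picks the core readings by line position in this output. As an alternative sketch, the function below averages every `Core N:` line it finds, whatever its position. It is only an illustration: the name `avg_core_temp` is hypothetical, and it assumes the temperature is always the third whitespace-separated field, as in the sample above.

```shell
#!/bin/sh
# avg_core_temp: read `sensors` output on stdin and print the average core
# temperature as an integer. Returns a non-zero status if no "Core N:" line exists.
# Assumption: the reading is the third field, e.g. "+32.0°C".
avg_core_temp() {
    awk '/^Core [0-9]+:/ {
            t = $3
            gsub(/[^0-9.]/, "", t)   # strip the sign and the degree unit
            sum += t
            n++
         }
         END {
            if (n == 0) exit 1
            printf "%d\n", sum / n
         }'
}
```

On a real host it would be used as `sensors | avg_core_temp`.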

4. # vi /usr/local/nagios/libexec/check_cputemp ## Paste in the content between the two rows of # characters below

##########################################################

#!/bin/sh
######### check_cputemp ###########
# date : May 2011
# Licence GPLv2
# INSTALLATION
# This script requires lm_sensors to be installed.
# The output of sensors needs to look like the format below:
#########################################
#coretemp-isa-0000#
#Adapter: ISA adapter#
#Core 0: +27°C (high = +85°C)#
#
#coretemp-isa-0001#
#Adapter: ISA adapter#
#Core 1: +25°C (high = +85°C) #
#########################################
# You can use NRPE to define the service in Nagios:
# check_nrpe!check_cputemp

# Plugin return statements
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2
STATE_UNKNOWN=3

print_help_msg(){
    $Echo "Usage: $0 -h to get help."
}

print_full_help_msg(){
    $Echo "Usage:"
    $Echo "$0 [-v] -m sensors -w cpuT -c cpuT"
    $Echo "Specify the method used to read the temperature data: sensors."
    $Echo "The Critical value must be greater than the Warning value."
    $Echo "Example:"
    $Echo "${0} -m sensors -w 40 -c 50"
}

print_err_msg(){
    $Echo "Error."
    print_full_help_msg
}

# Append debug output to a per-process log file when -v is given
to_debug(){
    if [ "$Debug" = "true" ]; then
        $Echo "$*" >> /var/log/check_sys_temperature.log.$$ 2>&1
    fi
}

unset LANG
Echo="echo -e"

if [ $# -lt 1 ]; then
    print_help_msg
    exit 3
else
    while getopts :vhm:w:c: OPTION
    do
        case $OPTION
        in
            v)
                #$Echo "Verbose mode."
                Debug=true
                ;;
            m)
                method=$OPTARG
                ;;
            w)
                WARNING=$OPTARG
                ;;
            c)
                CRITICAL=$OPTARG
                ;;
            h)
                print_full_help_msg
                exit 3
                ;;
            ?)
                $Echo "Error: Illegal Option."
                print_help_msg
                exit 3
                ;;
        esac
    done

    if [ "$method" = "sensors" ]; then
        use_sensors="true"
        to_debug use_sensors
    else
        $Echo "Error. You must specify the method: sensors."
        print_full_help_msg
        exit 3
    fi

    to_debug All Values are \"Warning: "$WARNING" and Critical: "$CRITICAL"\".
fi

######### lm_sensors ##################
if [ "$use_sensors" = "true" ]; then
    sensorsCheckOut=`which sensors 2>&1`
    if [ $? -ne 0 ]; then
        echo $sensorsCheckOut
        echo Maybe you need to check your sensors.
        exit 3
    fi
    to_debug Use $sensorsCheckOut to check system temperature

    # Lines 3 and 4 of the sensors output hold Core 0 and Core 1;
    # keep only the integer part of each reading
    TEMP1=`sensors | head -3 | tail -1 | gawk '{print $3}' | grep -o '[0-9][0-9]*' | head -1`
    TEMP2=`sensors | head -4 | tail -1 | gawk '{print $3}' | grep -o '[0-9][0-9]*' | head -1`

    # Check for missing readings before doing arithmetic on them
    if [ -z "$TEMP1" ] || [ -z "$TEMP2" ]; then
        $Echo "No data was obtained. Please confirm your arguments, re-run in verbose mode, and then check the log."
        exit 3
    fi

    SUM=$(($TEMP1 + $TEMP2))
    TEMP=$(($SUM/2))
    to_debug temperature data is $TEMP
else
    $Echo "Error. You must specify the method: sensors."
    print_full_help_msg
    exit 3
fi

######### Comparison with the warning and critical thresholds given by the user ############
CPU_TEMP=$TEMP

#if [ "$WARNING" != "0" ] || [ "$CRITICAL" != "0" ]; then
if [ "$CPU_TEMP" -gt "$CRITICAL" ] && [ "$CRITICAL" != "0" ]; then
    STATE="$STATE_CRITICAL"
    STATE_MESSAGE="CRITICAL"
    to_debug $STATE , Message is $STATE_MESSAGE
elif [ "$CPU_TEMP" -gt "$WARNING" ] && [ "$WARNING" != "0" ]; then
    STATE="$STATE_WARNING"
    STATE_MESSAGE="WARNING"
    to_debug $STATE , Message is $STATE_MESSAGE
else
    STATE="$STATE_OK"
    STATE_MESSAGE="OK"
    to_debug $STATE , Message is $STATE_MESSAGE
fi

echo "The TEMPERATURE $STATE_MESSAGE - The CPU's Temperature is $CPU_TEMP ℃ !"
exit $STATE

##########################################################

5. Make the script executable:

# chmod +x /usr/local/nagios/libexec/check_cputemp
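Before wiring the plugin into NRPE, it can help to run it by hand and look at its exit status, since Nagios derives the service state from that status. The sketch below maps the four return codes declared at the top of the script to their names; the helper name `state_name` is hypothetical and not part of the plugin.

```shell
#!/bin/sh
# state_name CODE: translate a plugin exit status into the state name
# used by the script above (STATE_OK=0 ... STATE_UNKNOWN=3).
state_name() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# Typical use on the monitored host (thresholds here are only an example):
#   /usr/local/nagios/libexec/check_cputemp -m sensors -w 40 -c 50
#   state_name $?
```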

6. Edit nrpe.cfg and add the following line:

command[check_cputemp]=/usr/local/nagios/libexec/check_cputemp -m sensors -w 38 -c 45

Note: all six steps above are carried out on the monitored host.
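To make the effect of `-w 38 -c 45` concrete, here is a small stand-alone sketch of the comparison the plugin performs: a reading strictly above the critical threshold is CRITICAL, strictly above the warning threshold is WARNING, and a threshold of 0 disables that level. The function name `classify_temp` is hypothetical; it mirrors the script's logic rather than replacing it.

```shell
#!/bin/sh
# classify_temp TEMP WARN CRIT
# Prints the state name and returns the matching plugin exit status.
classify_temp() {
    temp=$1
    warn=$2
    crit=$3
    if [ "$crit" != "0" ] && [ "$temp" -gt "$crit" ]; then
        echo "CRITICAL"
        return 2
    elif [ "$warn" != "0" ] && [ "$temp" -gt "$warn" ]; then
        echo "WARNING"
        return 1
    fi
    echo "OK"
    return 0
}
```

With the thresholds from the nrpe.cfg line, `classify_temp 32 38 45` prints `OK`, `classify_temp 40 38 45` prints `WARNING`, and `classify_temp 46 38 45` prints `CRITICAL`. A reading of exactly 45 is still only a WARNING because the comparison is strict.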

7. Configure the service on the Nagios server:

define service{
    use                     generic-service
    host_name               <hostname>
    service_description     CPU Temperature
    check_command           check_nrpe!check_cputemp
}

Save the file and restart the nagios service.
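A cautious way to restart is to validate the configuration first and restart only if validation passes. The sketch below wraps that pattern in a hypothetical helper, `safe_restart`, which takes the verification command and the restart command as strings. The Nagios binary path, the configuration path, and the init script name in the usage comment are assumptions based on a source install under /usr/local/nagios, so adjust them to your system.

```shell
#!/bin/sh
# safe_restart VERIFY_CMD RESTART_CMD
# Runs VERIFY_CMD; only if it succeeds, runs RESTART_CMD.
# On failure, prints the verification output to stderr and returns 1.
safe_restart() {
    verify_cmd=$1
    restart_cmd=$2
    if output=$(sh -c "$verify_cmd" 2>&1); then
        sh -c "$restart_cmd"
    else
        echo "$output" >&2
        echo "Configuration check failed; not restarting." >&2
        return 1
    fi
}

# Example on the Nagios server (paths and service name are assumptions):
#   safe_restart "/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg" \
#                "service nagios restart"
```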



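Copyright notice: this is an original article from this site, published by 星锅 on 2022-01-20, 4105 characters in total.
Reposting: unless otherwise stated, articles on this site are released under the CC-4.0 license; please credit the source when reposting.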