Unable to find a simple solution to the NSCA backlog problem, I wrote a micro daemon that replaces NSCA with something much higher-performance, so that NSCA is no longer the bottleneck and that role falls back on Nagios itself. Another fantastic feature of this NSCA replacement is that it emits service checks about itself, so you can monitor how many service check results per second are being proxied and how many bytes per second that amounts to, which is handy if you have little bandwidth.
(Image: three Nagios proxies in one setup showing their own status)
Slave 1
./nagios_proxy 10.16.250.30 ---\
                                \      Master (10.16.250.30)
                                 |---> ./nagios_proxy
                                /
Slave 2                        /
./nagios_proxy 10.16.250.30 ---/
The daemons need to be started in the same directory as the nagios.cmd pipe, which in this setup is /opt/nagios/var/rw.
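Because the proxy expects to find the pipe in its current working directory, a small preflight check before launching can save some head-scratching. This is a minimal sketch that is not part of the original setup: the function name `check_pipe_dir` is hypothetical, and all it does is confirm that `nagios.cmd` exists as a named pipe in the given directory.

```bash
#!/bin/bash
# Hypothetical preflight helper (not part of the original nagios_proxy setup).
# Returns 0 if DIR contains nagios.cmd as a named pipe (FIFO), 1 otherwise.
check_pipe_dir() {
    local dir=${1:-/opt/nagios/var/rw}
    if [ -p "$dir/nagios.cmd" ]; then
        return 0
    fi
    echo "check_pipe_dir: $dir/nagios.cmd is missing or not a named pipe" >&2
    return 1
}

# Example: only start the proxy once we are in the right directory.
# cd /opt/nagios/var/rw && check_pipe_dir . && ./nagios_proxy 10.16.250.30
```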
Define the commands in your nagios config to send check results via the proxy:
define command {
    command_name    send_service_check_proxy
    command_line    /opt/nagios/libexec/send_service_check_proxy $HOSTNAME$ '$SERVICEDESC$' $SERVICESTATEID$ '$SERVICEOUTPUT$'
}
define command {
    command_name    send_host_check_proxy
    command_line    /opt/nagios/libexec/send_host_check_proxy $HOSTNAME$ $HOSTSTATEID$ '$HOSTOUTPUT$'
}
The send_host_check_proxy command itself:
#!/bin/bash
/bin/echo "PROCESS_HOST_CHECK_RESULT;$1;$2;$3" > /opt/nagios/var/rw/nagios-remote.cmd
The send_service_check_proxy command:
#!/bin/bash
/bin/echo "PROCESS_SERVICE_CHECK_RESULT;$1;$2;$3;$4" > /opt/nagios/var/rw/nagios-remote.cmd
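Both helper scripts do nothing more than write one semicolon-separated external command line into the proxy's pipe. The sketch below is not the author's code: it factors that formatting into functions (the names `format_host_result`, `format_service_result` and `send_line` are hypothetical) so the output can be checked without a running proxy. The pipe path defaults to the one used above and can be overridden through `PROXY_PIPE`.

```bash
#!/bin/bash
# Hypothetical helpers mirroring send_host_check_proxy / send_service_check_proxy.
# They only build the external command line; send_line writes it to the pipe.

# Pipe read by nagios_proxy (assumed path, same as in the scripts above).
PROXY_PIPE=${PROXY_PIPE:-/opt/nagios/var/rw/nagios-remote.cmd}

format_host_result() {      # $1=host  $2=host state id  $3=output
    printf 'PROCESS_HOST_CHECK_RESULT;%s;%s;%s\n' "$1" "$2" "$3"
}

format_service_result() {   # $1=host  $2=service  $3=service state id  $4=output
    printf 'PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' "$1" "$2" "$3" "$4"
}

send_line() {               # copy stdin to the proxy pipe
    cat > "$PROXY_PIPE"
}

# Example (blocks until nagios_proxy has the pipe open for reading):
# format_service_result web01 'HTTP' 0 'HTTP OK' | send_line
```

Using `printf` with `%s` passes plugin output through verbatim, so `%` signs or backslashes in the output are never interpreted as formatting.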
Now, to make things a bit more fault tolerant, I usually run the proxy from a script that simply keeps restarting it whenever it quits. This matters because the proxy deliberately terminates when it determines that something has gone wrong, on the assumption that it will be restarted. In large outages this means the system will almost always come back up by itself. It's also nice to have a log file to refer to after the fact. It's probably worth noting that this service script also takes care of the pipe used by the proxy, so if you don't use it you will have to manage the pipe yourself.
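#!/bin/bash
# Raise the open-file limit for the proxy (assumed intent: a bare
# "ulimit 1000000" would set the file-size limit instead)
ulimit -n 1000000
# Slave example; on the master, run ./nagios_proxy without an argument
while true; do
    ./nagios_proxy 10.16.250.30 >>/var/log/nagios_proxy.log 2>&1
    sleep 2
done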
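If you manage the pipe yourself, something along these lines could create it before the restart loop starts. This is a minimal sketch under stated assumptions: the pipe is a FIFO named nagios-remote.cmd in /opt/nagios/var/rw (the file the send scripts above write to), and the function name `ensure_fifo` and the default mode 0660 are hypothetical choices to adapt to your nagios user and group.

```bash
#!/bin/bash
# Hypothetical helper: make sure the proxy's input pipe exists as a FIFO.
# Assumes the pipe is nagios-remote.cmd in /opt/nagios/var/rw (as used above).
ensure_fifo() {
    local pipe=${1:-/opt/nagios/var/rw/nagios-remote.cmd}
    local mode=${2:-0660}          # assumed permissions; adjust for your setup

    if [ -p "$pipe" ]; then
        return 0                   # already a FIFO, nothing to do
    fi
    if [ -e "$pipe" ]; then
        # A regular file here would just collect (and overwrite) check
        # results instead of passing them to the proxy.
        echo "ensure_fifo: $pipe exists but is not a FIFO" >&2
        return 1
    fi
    mkfifo -m "$mode" "$pipe"
}

# Example, e.g. at the top of the wrapper before the while loop:
# ensure_fifo /opt/nagios/var/rw/nagios-remote.cmd || exit 1
```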
If you are Red Hat inclined, the service script below will help keep the whole thing running smoothly across system restarts, log rotations, etc. Converting it to a distribution other than Red Hat is up to you.
#!/bin/sh
#
# chkconfig: 345 99 01
# description: Nagios distributed high-performance proxy/aggregator
#
# File : nagiosproxy
#
# Author : Ryan Krumins (ryan.krumins@gmail.com)
#
# Changelog :
#
# 2009-05-12 Ryan Krumins
# - initial implementation
#
# Description : Starts and stops the Nagios proxy/aggregator
# used to collect and distribute passive nagios checks
#

INST_DIR=/opt/nagios/var/rw
PROXY=$INST_DIR/nagios_proxy
PROXY_WRAP=$INST_DIR/run_proxy.sh
PROXY_PIPE=$INST_DIR/nagios-remote.cmd

# Sanity checks.
[ -x $PROXY ] || exit 0
[ -x $PROXY_WRAP ] || exit 0

# Source functions library
. /etc/init.d/functions

RETVAL=0

start () {
    echo -n $"Starting nagiosproxy: "
    if [ -n "`/sbin/pidof -o %PPID nagios_proxy`" ]; then
        echo -n $"nagios_proxy: already running"
        failure
        echo
        return 1
    fi
    cd $INST_DIR
    # Run the wrapper in the background with its output discarded
    $PROXY_WRAP >/dev/null 2>&1 &
    sleep 1
    if [ -n "`/sbin/pidof -o %PPID nagios_proxy`" ]; then
        success
    else
        failure
        RETVAL=1
    fi
    echo
    return $RETVAL
}

stop() {
    # Kill the wrapper first so it cannot restart the proxy
    killall run_proxy.sh
    killall nagios_proxy
}

restart() {
    # Only the proxy is killed; run_proxy.sh starts a fresh one after its sleep
    killall nagios_proxy
}

case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart|force-reload|reload)
        restart
        ;;
    status)
        echo "nagios_proxy: " `/sbin/pidof nagios_proxy`
        ;;
esac

exit $?
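The status branch of this script just prints the PIDs and exits 0 even when nothing is running. Tools that call init scripts often rely on the LSB convention that `status` returns 0 when the program is running and 3 when it is not. A minimal sketch of such a function follows; `proxy_status` is a hypothetical name, and this is an optional tweak rather than part of the original script.

```bash
#!/bin/bash
# Hypothetical replacement for the "status" branch of the init script.
# Follows the LSB convention: return 0 if running, 3 if not running.
proxy_status() {
    local name=${1:-nagios_proxy}
    local pids
    pids=$(pidof "$name" 2>/dev/null)
    if [ -n "$pids" ]; then
        echo "$name (pid $pids) is running..."
        return 0
    fi
    echo "$name is stopped"
    return 3
}

# In the case statement, for example:
#   status) proxy_status ;;
# The trailing "exit $?" then passes the code on to the caller.
```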
Lastly, the important part: the source code of the NSCA replacement itself. I posted it on pastebin since it displays much better there.
So there you have it: an NSCA replacement that handles thousands of service check results per second, monitors itself in a useful and meaningful way, tolerates faults, and rarely needs any interaction regardless of what happens in your environment. If you use it, I would appreciate your comments. This is only a quickly thrown-together hack, but if it's useful enough I would consider working on feature requests.
Could you please post it on the Nagios Exchange?
http://exchange.nagios.org
This will benefit the Nagios community. Thank you!
Hi Ludmil,
Thanks for the suggestion!
I had actually already added this to the Nagios Exchange. If it becomes more popular, I would consider packaging it up properly as well.
Downloading and testing. Sounds like exactly what I was looking for :)