Monitoring with Monit

I have evaluated different monitoring systems over the years and finally settled on Monit (http://mmonit.com/monit/). Monit is very easy to configure and can react to error conditions. For example, if an HTTP server crashes, Monit can restart the daemon locally. All monitoring runs on the monitored system itself, which is what allows it to restart daemons. The full documentation is available here: http://mmonit.com/monit/documentation/monit.html

In addition, M/Monit (http://mmonit.com/) provides a central reporting host that collects the monitoring information from the individual Monit instances. M/Monit is a commercial product, which we could buy.
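
Pointing a Monit instance at such a central host takes a single statement in monitrc. The following is only a sketch: the host name mmonit.example.org, port 8080 and the monit:monit credentials are placeholders, not values from my setup.

# Assumed collector URL; replace host, port and credentials with real ones
set mmonit http://monit:monit@mmonit.example.org:8080/collector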

The monitoring of my nginx server is configured like this:

# Track nginx by its pidfile
check process nginx with pidfile /var/run/nginx.pid
  start program = "/etc/init.d/nginx start"
  stop program  = "/etc/init.d/nginx stop"
  group server
  # End-to-end test: request /monit/token over HTTP, restart on failure
  if failed host www.gonium.net port 80 protocol http
      and request "/monit/token" then restart
  # Resource limits: alert first, restart or stop if things get worse
  if cpu is greater than 60% for 2 cycles then alert
  if cpu > 80% for 5 cycles then restart
  if totalmem > 256 MB for 5 cycles then restart
  if children > 16 then restart
  if loadavg(5min) greater than 10 for 8 cycles then stop
  # Stop monitoring the service if restarts keep failing
  if 3 restarts within 5 cycles then timeout

This configuration snippet does the following:

  • Checks whether the process exists
  • Watches the use of various resources such as CPU, memory, load and number of children
  • Attempts to download http://www.gonium.net/monit/token (end-to-end test)
  • Stops or restarts the daemon if one of these checks fails
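
The rules above count in cycles and raise alerts, and both depend on global settings in monitrc. A minimal sketch of those settings, assuming a 120-second poll interval, a local mail server and a placeholder recipient (none of these values are taken from my actual configuration):

# One poll of all services is one cycle; here a cycle is assumed to be 120 seconds
set daemon 120
# Assumed: a local MTA relays the alert mails
set mailserver localhost
# Placeholder recipient for alert messages
set alert admin@example.org

With this interval, "for 5 cycles" means a condition has to hold for roughly ten minutes before Monit acts on it.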

Monitoring the mySmartGrid infrastructure

TODO: List of daemons to look at, error conditions etc.

monitoring.1310458268.txt.gz · Last modified: 2012/10/30 10:39 (external edit)