Thursday, August 2, 2018

amazon web services - Automatically monitor new cloud servers using Open Monitoring Distro (OMD)?



I've been spending some time getting my head around using Nagios, Check_mk and some other very useful tools installed as part of the OMD package.




WATO is especially useful for administering all of our static Windows and Linux based servers through a GUI once the check_mk agent has been installed manually.



What is the best way to automate this entire monitoring process, if it can be done at all?



We will be using Chef recipes to provision new servers on a regular basis and will kill off others frequently. If we are to continue using Nagios / Check_mk, it's essential that the admin effort needed to track and monitor our infrastructure is minimal.



Many thanks for your help.
Steve


Answer



At a high level, there are two ways:





  • Make Chef write valid Check_MK config files (this has already been done by now), and have it trigger inventory + reloads via the WATO automation. This is probably more transparent.

  • Make Check_MK read the hosts from your CMDB (should you run a professional setup, there would be one...) or from the Chef config. This is feasible because the Check_MK config allows you basically anything that Python allows you. So you could read data from LDAP, some API, the Chef config, or a flat file. To me, it's the cleaner approach since it has a more direct "data" interface.
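As a rough illustration of the second option, here is a minimal sketch that reads hosts from a flat file. It assumes a made-up file format of one host per line ("hostname tag1 tag2", with # comments) and relies on the classic Check_MK convention of "hostname|tag1|tag2" strings in all_hosts. The function names and the file path are hypothetical, not part of Check_MK.

```python
def parse_host_lines(lines):
    """Turn 'hostname tag1 tag2' lines into 'hostname|tag1|tag2' entries."""
    entries = []
    for raw in lines:
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # First field is the host name, the rest are tags.
        entries.append("|".join(line.split()))
    return entries


def load_hosts(path):
    """Read the (hypothetical) flat file and return all_hosts-style entries."""
    with open(path) as handle:
        return parse_host_lines(handle)


# Inside a Check_MK config file this could then be used roughly like:
#   all_hosts += load_hosts("/path/to/hosts.txt")   # path is an assumption
```

The same shape would work for LDAP, an API, or the EC2 instance list: only the function that produces the lines changes.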



I think in the long run the first way is going to work out better for you anyway, since it's more oriented towards WATO.
Personally, I would still pick the second one and hook into the EC2 VM list and such.



A hybrid is also possible, e.g. a daemon that listens for events such as VM creation and writes out config to the WATO read-only folder.
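The "write out config" half of such a hybrid could look roughly like the sketch below. It assumes the target file uses the classic all_hosts += [...] syntax and that the caller knows which folder to write to. Both helper names are hypothetical, and the event-listening part is left out entirely.

```python
import os
import tempfile


def render_hosts_mk(entries):
    """Render 'hostname|tag' entries as an 'all_hosts += [...]' snippet."""
    lines = ["all_hosts += ["]
    lines.extend("    %r," % entry for entry in entries)
    lines.append("]")
    return "\n".join(lines) + "\n"


def write_atomically(path, text):
    """Write text to a temp file in the same directory, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as handle:
        handle.write(text)
    # os.replace is atomic on the same filesystem, so readers never see a half-written file.
    os.replace(tmp_path, path)
```

Writing to a temporary file first means a reload that happens mid-write still sees the old, complete config.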




Note:
It would be highly stupid not to sanity-check any such data source. Just because some Infrastructure as Code nutcase introduces an (infrastructure) bug and deletes 100% of your VMs from Chef, they should not be removed from monitoring immediately.



Make sure it stays a little out of band.
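One way to sketch such a sanity check: compare the proposed host list with the one currently monitored and hold back removals when too many hosts vanish at once. The 20% threshold and the function name are arbitrary assumptions for illustration.

```python
def sanity_checked_hosts(current, proposed, max_removed_fraction=0.2):
    """Return (hosts_to_use, accepted).

    If the proposed list drops more than max_removed_fraction of the
    currently monitored hosts, removals are held back (additions are
    still accepted) and accepted is False so a human can take a look.
    """
    current_set = set(current)
    proposed_set = set(proposed)
    removed = current_set - proposed_set
    if current_set and len(removed) / len(current_set) > max_removed_fraction:
        # Suspicious mass deletion: keep everything we had, plus new hosts.
        return sorted(current_set | proposed_set), False
    return sorted(proposed_set), True
```

With this in place, an upstream source that suddenly returns an empty list leaves monitoring unchanged and only flags the update as rejected.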



A 2010-ish document about dynamic Check_MK interfacing can be found here:
https://geni-orca.renci.org/trac/wiki/OMDeventhandlers



It's really old but lays out the basic ideas well.




I've made a first proof of concept for a config-management-to-Check_MK interface. It's not as nice as I would like, but that's only limited by my speed/skill at writing Python. :)



I'm using it with approx. 70 non-cloud servers now:
https://bitbucket.org/darkfader/nagios/src/461992c2c5452807a37838ca99fd92977fcf96e1/check_mk/ino2cmk/ino2cmk.py?at=default

