LDAP-Configured Nagios
Situation
In complex, paranoid-secure environments, we end up with failover/redundant Nagios servers submitting data to clustered databases for history. Firewalls inside the datacenter often block us from testing all services from a single Nagios server (or pair). Because we have a number of sysadmins, who are (wink) flawless at communicating precisely, we find that the Nagios config doesn't always match the existing hosts, nor the DNS and DHCP data. We already use LDAP, and in six months the only LDAP failures we've seen were one of our own processes locking the Primary (yeah, never run anything on the Primary but the Primary Replication Manager itself) and an openldap-2.2.13 being throttled by a GlassFish. So our situation is:
- Many Sysadmins (and hiring so many more)
- Configs spread across multiple services
- New servers get a DNS config but no Nagios config
Referring to the diagram above, we have a (blue) LDAP Primary in one datacenter, a number of (blue) LDAP secondaries (typically a pair in each datacenter), and an (orange) pair of Nagios servers at each location looking internally at the services (green) provided by that datacenter. Going forward, we also want some of the Nagios servers in one datacenter to monitor the externally accessible services of another datacenter.
Mission
We want it to be as easy as possible to configure a new server in DNS, DHCP (if used), and Nagios:
- at the same time
- with the same data values
- in a single LDAP object, atomically updated (for ACID properties, since the configs are inter-related/dependent)
We want a pair of Nagios servers to be easily configured identically.
We want to federate our config so that a lost Nagios server is quick to replace.
Other Options
We looked at keeping our config in SVN/CVS, but that still leaves us with SAs forgetting to configure Nagios. Also, just today I have to fix a config that is in error because the DNS config doesn't match the Nagios config (the failing server ww1 is actually ww2).
We looked at configuring from MySQL, either generating the config with a cheesy script or writing a converter, but MySQL is more difficult to cluster than LDAP and has fewer replication options (we use both syncrepl and slurpd pushes, depending on security requirements).
Current Solution
The current schema allows me to express a Nagios host using the DNS config from nis.schema (ipHost, ipAddress, name), and services similarly. There are some issues I'm still trying to work out, which I can detail for others' input if that's the best way forward. This work is based on a 4-day obsession that started 2008-04-12, so it will show some gaps. I'd also like to reduce the repetition between the xdata/xoddefault.c and base/config.c files, if that's acceptable.
Let me paste a Graphviz diagram to show the data model:
That's a bit complex; Graphviz does an OK job of rendering it, and I've added some colouring to ease the tracing of references. In short, this converts an xdata file, normally read in by a "cfg_file=filename" directive, into an LDAP zone instead. The regular files are still read in, but the "ldap_server" directives trigger the LDAP client:
...
cfg_file=hostgroups.cfg
resource_file=resource.cfg
ldap_server=ldap://ldap1.west.example.com/dc=example,dc=com
ldap_server=ldap://ldap2.west.example.com:3890/dc=example,dc=com
status_file=status.dat
...
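To make the directive format concrete, here is a minimal Python sketch of how the "ldap_server" lines above could be split into host, port, and search base. The actual patch does this in C inside Nagios; the function name parse_ldap_servers and the fallback to port 389 (the standard LDAP port) are assumptions made for illustration only.

```python
from urllib.parse import urlsplit

DEFAULT_LDAP_PORT = 389  # assumption: standard LDAP port when the URL gives none


def parse_ldap_servers(config_text):
    """Return (host, port, base_dn) for each ldap_server directive, in file order."""
    servers = []
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only: the base DN itself contains "=" signs.
        key, value = line.split("=", 1)
        if key.strip() != "ldap_server":
            continue
        url = urlsplit(value.strip())
        servers.append((url.hostname,
                        url.port or DEFAULT_LDAP_PORT,
                        url.path.lstrip("/")))
    return servers
```

Keeping the list in file order matters, because the servers are tried in the order they are listed.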
When the main config files are finished, the LDAP servers are tried in order until the first one responds (this assumes that all the servers are replicas holding identical data, so the first hit is enough and there is no need to query them all and aggregate the results). This allows base/config.c to read in the LDAP-based common data. The LDAP object searched for is a "monitorGroup" whose "monitorFQDN" matches the result of "gethostname()":
dn: cn=Private,ou=West,dc=example,dc=com
cn: Private
objectClass: monitorGroup
monitorFQDN: nagios1.west.example.com.
monitorFQDN: nagios2.west.example.com.
nagios-user: nagios
nagios-group: nagios
check-external-commands: FALSE
Don't worry about the "cn=Private": we check both public and private services. A remote datacenter checks its own private services, and another datacenter's public services. For that reason, there would be a matching entry:
dn: cn=Public,ou=East,dc=example,dc=com
cn: Public
objectClass: monitorGroup
monitorFQDN: nagios1.west.example.com.
monitorFQDN: nagios2.west.example.com.
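The lookup described above (try each server in order, stop at the first that answers, search for monitorGroup entries naming this Nagios host) might be sketched in Python as follows. This is not the patch's C code: the function names are hypothetical, the search callable stands in for a real LDAP client, and appending the trailing dot is an assumption based on how monitorFQDN is written in the sample entries.

```python
import socket


def monitor_group_filter(fqdn=None):
    """Build an LDAP filter for the monitorGroup entries that list this host."""
    if fqdn is None:
        fqdn = socket.gethostname()
    # Assumption: stored names carry a trailing dot, as in the sample entries.
    if not fqdn.endswith("."):
        fqdn += "."
    return "(&(objectClass=monitorGroup)(monitorFQDN=%s))" % fqdn


def first_responding(servers, search):
    """Call search(server) on each server in order; return the first answer.

    `search` is a placeholder for a real LDAP query and is expected to raise
    OSError (or a subclass) when a server cannot be reached.
    """
    errors = []
    for server in servers:
        try:
            return server, search(server)
        except OSError as exc:
            errors.append((server, exc))  # remember the failure, try the next replica
    raise ConnectionError("no LDAP server answered: %r" % (errors,))
```

Because the replicas are assumed to hold identical data, the loop returns as soon as one server answers instead of merging results from all of them.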
Host objects look like the following. Note the shared artifacts with the other schemas, to allow bind-sdb and ldap-dhcp to read their configs from the same objects:
dn: cn=dns1,ou=West,dc=example,dc=com
cn: dns1
nagios-use: basic-server
objectClass: monitoredHost
objectClass: dNSZone
objectClass: ipHost
objectClass: top
monitorGroup: cn=Private,ou=West,dc=example,dc=com
ipHostNumber: 192.168.12.100
macAddress: 00:30:48:23:c1:fe
zoneName: west.example.com
relativeDomainName: dns1
Note that the macAddress and ipHostNumber will need a bit of (to be done) magic to convert to a "dhcpAddress: 192.168.12.100" and "dhcpHWAddress: ethernet 00:30:48:23:c1:fe", but placing everything in a single LDAP record leaves the sysadmin editing it very few opportunities for human error. Our SAs are usually working under the gun, pressured for results by project managers, and juggling a bazillion things at once towards a solution, so (based on combat stress) I can see the smartest dude making the silliest errors. I hope to reduce that chance.
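Since that conversion is still to be done, the following is only a guess at its shape: a small Python sketch that derives the two DHCP values quoted above from a host entry held as a dict. The function name dhcp_attributes and the dict representation are hypothetical; the attribute names are taken from the text as written.

```python
def dhcp_attributes(entry):
    """Derive the DHCP-side values from a host entry's shared attributes."""
    attrs = {}
    mac = entry.get("macAddress")
    if mac:
        # "ethernet" is the hardware type used in the example value above.
        attrs["dhcpHWAddress"] = "ethernet " + mac.lower()
    ip = entry.get("ipHostNumber")
    if ip:
        attrs["dhcpAddress"] = ip
    return attrs


host = {
    "cn": "dns1",
    "ipHostNumber": "192.168.12.100",
    "macAddress": "00:30:48:23:c1:fe",
}
derived = dhcp_attributes(host)
```

Deriving the values rather than storing them twice keeps the single-record property: there is still only one place where the address can be typed wrong.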
Services are registered similarly:
dn: cn=ping-dns1,ou=West,dc=example,dc=com
cn: ping-dns1
host-name: dns1
nagios-use: local-service
objectClass: monitoredService
objectClass: top
description: PING
check-command: check-host-alive
monitorMemberOf: cn=Public,ou=West,dc=example,dc=com
This is also copied directly from my AutoTest self-test; the "monitorMemberOf" attribute is a synonym of the "monitorGroup" attribute. The "host-name" matches the host above and registers the service as we'd expect.
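For readers who think in flat files, this Python sketch renders the host and service entries above as the Nagios object definitions they would roughly correspond to. The attribute-to-directive mapping is an assumption inferred from the attribute names; the patch loads the objects directly in C and its real mapping may differ.

```python
# Assumed mapping from LDAP attribute to Nagios directive (illustration only).
HOST_MAP = [("nagios-use", "use"),
            ("cn", "host_name"),
            ("ipHostNumber", "address")]
SERVICE_MAP = [("nagios-use", "use"),
               ("host-name", "host_name"),
               ("description", "service_description"),
               ("check-command", "check_command")]


def to_nagios(kind, entry, mapping):
    """Render one LDAP entry (a dict) as a 'define <kind> { ... }' block."""
    lines = ["define %s {" % kind]
    for ldap_attr, directive in mapping:
        if ldap_attr in entry:
            lines.append("    %-22s %s" % (directive, entry[ldap_attr]))
    lines.append("}")
    return "\n".join(lines)


service = {"cn": "ping-dns1", "host-name": "dns1",
           "nagios-use": "local-service", "description": "PING",
           "check-command": "check-host-alive"}
service_block = to_nagios("service", service, SERVICE_MAP)
```

Attributes that only matter to DNS or DHCP (zoneName, macAddress, and so on) are simply absent from the mapping, which is how one record can serve several consumers.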
Another benefit is that when we bring another datacenter online, we can "slapcat" the current database, copy out the Nagios entries, edit them as a text file, and "ldapadd" the whole chunk at once. LDAP's self-defending schema checking will reject objects that violate the schema, and we can apply referential-integrity and uniqueness-checking overlays to avoid making errors during this process.
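The "copy out the Nagios entries" step could be scripted along these lines. This is a rough Python sketch, assuming slapcat output with entries separated by blank lines and objectClass values in plain text (not base64); the function name nagios_entries is hypothetical, and the class names are the ones used in the examples above.

```python
NAGIOS_CLASSES = {"monitorgroup", "monitoredhost", "monitoredservice"}


def nagios_entries(ldif_text):
    """Keep only the LDIF entries that carry one of the monitoring objectClasses."""
    kept = []
    for block in ldif_text.strip().split("\n\n"):
        classes = set()
        for line in block.splitlines():
            name, sep, value = line.partition(":")
            # Attribute names are case-insensitive in LDAP, so compare lowercased.
            if sep and name.strip().lower() == "objectclass":
                classes.add(value.strip().lower())
        if classes & NAGIOS_CLASSES:
            kept.append(block)
    return "\n\n".join(kept) + "\n"
```

The filtered text can then be edited for the new site (for example, changing the ou) before loading it with ldapadd.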
Next Steps
I plan to roll a test-image in my company if the DataCenter Boss allows (Hi Stephen).
Benoit, who looked at this before, has offered some test data based on his work in GoSA.
I'm looking for a contact to consider this work for merge. It has been produced in off-work time, so it bears no IP encumbrance, and submission back to Nagios is the best way to share and improve it. If instead I need to keep a running patch, well, I'll produce the RPMs :)
