Presentation Pt3: Hosts And Services

Well, getting this presentation up hasn’t exactly been flying along at sonic speed, but there have been so many interesting deviations over the last few months that warranted posts, with more yet to come. For now I’m containing myself and getting back on track.

Part 3 should hopefully be a little more concise than previous installments. How you architect this part of your solution will vary much more with your business needs, but there are still some easy considerations that can make your life much easier.

  • Minimize the content in the host configuration file and maximize the configuration in the host template file to make automation and ongoing management easier. This is sort of a no-brainer really, but the more stuff in templates the better. Host-group membership should probably also be set in the host configuration, again for automation purposes.
  • Remember those “cg” contact groups we created in the last article? We can now attach those to our new host templates to control what a user can see. You can also do multi-tenancy with templates, as long as you assign the templates in the right order and use the additive flag. This is a more advanced feature though and isn’t covered in detail here.
  • When you get around to creating host-groups, create them in a way that allows meaningful service groupings. For example, all Windows servers are likely to have a C: and maybe a D: drive that you will want to monitor, along with Windows services that are common to all Windows servers. More on this soon.

As in previous installments, here’s my 30-second Visio of how the objects would be linked.

Hosts

This is a simple config example of how one might accomplish the above from a more technical perspective:

define host {
  name srv-template
  alias Server host template
  check_command check_icmp!250.0,60%!500.0,80%
  max_check_attempts 3
  check_interval 10
  retry_interval 2
  check_period 24x7
  contact_groups cg-main
  notification_interval 60
  notification_period 24x7
  notification_options d,f
  notifications_enabled 1
  register 0
}

define host {
  host_name exchange01
  use srv-template
  alias Exchange server
  address exchange01
  parents switch001,switch002
  hostgroups srv-exchange, srv-windows
  icon_image exchange.png
  register 1
}

define hostgroup {
  hostgroup_name srv-windows
  alias Windows group
}

define hostgroup {
  hostgroup_name srv-exchange
  alias Exchange group
}
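
For the multi-tenancy point mentioned earlier, here’s a rough sketch of what additive inheritance can look like. The template srv-tenant-a-template, the contact group cg-tenant-a and the host sql01 are made up for illustration and would have to exist in your own config. Only srv-template, cg-main and srv-windows come from the example above.

```cfg
# Hypothetical tenant template layered on top of srv-template.
# The leading + appends to the inherited value instead of replacing it,
# so hosts using this template end up with cg-main,cg-tenant-a.
define host {
  name srv-tenant-a-template
  use srv-template
  contact_groups +cg-tenant-a
  register 0
}

# Hypothetical host that picks up both contact groups via the template.
define host {
  host_name sql01
  use srv-tenant-a-template
  alias Tenant A SQL server
  address sql01
  hostgroups srv-windows
}
```

If a host lists more than one template in its use line, the first template listed wins for any directive they both set, which is why the order matters.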

So now let’s take a look at the service side of things. Services are a little bit different from the other sections: for services we don’t want to put as much information as humanly possible into the templates like we did previously. The easy recommendations for services are:

  • Don’t make your service definitions excessively specific to a host or subset of hosts. To lower your administrative overhead you want each definition to monitor as much as possible. In hyperbolic terms, I’m saying don’t create a C: drive service definition for every single server.
  • Once you’ve created a service, assign it to the relevant hostgroup and avoid connecting services directly to a host; this again makes it much easier to manage and track down problems. There are of course going to be situations where you have to assign a service directly to a host, but keep those to a minimum and label them clearly when you do.
  • Service templates should only contain the bare information that is common to large numbers of services, because in all likelihood your services are going to vary quite wildly, unlike hosts where you will have more predictable baselines. In my environment I only have about 6 service templates; having more actually increased the administrative overhead of making changes.
  • Service groups are an often-abused feature that gets used without a purpose in mind. There’s one very, very good reason to use service groups, and that is when you have an application or some other system where the monitoring is distributed across multiple hosts and you want to be able to view its health or schedule downtime for it as a whole. One other purpose I would recommend service groups for is when you add a new service check to a large number of hosts and you want a contingency plan to prevent an alert storm if it goes wrong, since notifications can be disabled for the whole group in one go. Outside of these cases, don’t use this feature if you don’t have to.
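
As a rough sketch of the distributed-application case, a service group might look something like this. The group name, the second host exchange02 and the service descriptions SMTP and Mail queue are all hypothetical; each host,service pair has to match a service that already exists in your config.

```cfg
# Hypothetical service group for an application spread over several hosts.
# members takes host,service pairs: host1,service1,host2,service2,...
define servicegroup {
  servicegroup_name sg-exchange-mailflow
  alias Exchange mail flow
  members exchange01,SMTP,exchange02,SMTP,exchange01,Mail queue
}
```

Grouping this way gives you one place to view the application’s health or schedule downtime for all of its checks.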

Here’s another flow diagram for the configuration of services:

Services

And of course a configuration example:

define service {
  name main-service-template
  service_description main service template
  max_check_attempts 3
  check_interval 10
  retry_interval 2
  check_period 24x7
  notification_interval 60
  notification_period 24x7
  notification_options c
  register 0
}

define service {
  service_description Windows C: usage
  use main-service-template
  hostgroup_name windowsgroup01,windowsgroup02
  check_command check_nt!USEDDISKSPACE!-l c -w 80 -c 90
  contact_groups cg-main,cg-main-SMS
  register 1
}
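
For the occasional service that really does have to be attached to a single host, a clearly labelled exception might look like the sketch below. The E: drive, the thresholds and the comment wording are assumptions for illustration, and the -l argument assumes your check_nt command definition passes the second argument straight through to the plugin.

```cfg
# EXCEPTION: attached directly to exchange01 (hypothetical E: data volume).
# Not hostgroup-driven; move it to a hostgroup if more hosts need this check.
define service {
  service_description Windows E: usage exchange01
  use main-service-template
  host_name exchange01
  check_command check_nt!USEDDISKSPACE!-l e -w 90 -c 95
  contact_groups cg-main
}
```

Keeping the label in both the comment and the description makes these one-offs easy to find later.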

Your goal when designing your host and service architecture is to minimize the administrative overhead you incur, so that you can spend more time doing “fun” work instead of babysitting Nagios. It’s also important to remember that device life-cycle happens, and keeping a simple, logical setup makes it much easier to migrate hosts and services when the time comes.

Next time we will be looking at the art of service dependencies and the completed picture!

Links

Presentation Pt1: User Permissions

Presentation Pt2: Users and Contacts

Presentation Pt3: Hosts and Services

Presentation Pt4: The art of service dependencies
