Sending email from Perl: Mail::Sendmail · 19 December 2008, 19:33

I always forget which mail-related modules on CPAN are the easiest and most flexible for sending mail from a Perl program. Several times now I have gone to CPAN, re-found Mail::Mailer, been enticed by its features, downloaded and installed it, and then had trouble getting it to work in ‘smtp’ mode.

I then download Mail::Sendmail and am up and running in minutes. The only feature Mail::Sendmail lacks that would make it (in my opinion) the best of the Perl mail modules is SMTP AUTH.

— Max Schubert



Nagios Performance Tuning: Early Lessons Learned, Lessons Shared: Part 2 · 31 October 2008, 17:29

One of the first questions to ask your customer when designing a Nagios implementation should be “how many devices and services will we be monitoring?” It is important to ask this question early on in the process as the answers will affect how you design your Nagios-based system.

Another important question to ask is whether or not the system will be used to gather long-term (months/years) trending information. If it will be used as an ingest system for long-term trending information, then timing becomes important: keeping your service check intervals consistent over time is critical if the metrics the system gathers are to have value for your organization / customer.

Why? Isn’t 5 minutes always 5 minutes to Nagios? Imagine you have a 5-minute metric. If, over time, that metric’s scheduling constantly slips forward or backward from the original 5-minute intervals you scheduled it for, because configuration decisions cause Nagios to pause or ‘fall behind,’ you will end up with gaps in metrics and intervals that are hard to compare against each other. For example, if your original schedule for a metric is

0 5 10 15 20

and over the course of time it slips to

8 13 18 24 28

now your hour-to-hour comparisons are skewed, and if the scheduling skew continues, you will eventually have gaps in metrics.
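To see the drift numerically, a small Ruby sketch can compare actual sample times (minutes past the hour) against the ideal fixed-interval schedule. The two series are the example schedules from this post; the method names `schedule_skew` and `sample_gaps` are illustrative only and are not part of Nagios.

```ruby
# Offset of each sample from where an ideal fixed-interval schedule
# would have placed it (all values in minutes past the hour).
def schedule_skew(actual, start: 0, interval: 5)
  actual.each_with_index.map { |t, i| t - (start + i * interval) }
end

# Spacing between consecutive samples; anything other than the configured
# interval makes period-to-period comparisons uneven.
def sample_gaps(actual)
  actual.each_cons(2).map { |a, b| b - a }
end

ideal   = [0, 5, 10, 15, 20]
drifted = [8, 13, 18, 24, 28]

p schedule_skew(ideal)    # => [0, 0, 0, 0, 0]
p schedule_skew(drifted)  # => [8, 8, 8, 9, 8]
p sample_gaps(drifted)    # => [5, 5, 6, 4]
```

The constant 8-minute offset shifts every hourly bucket, while the uneven gaps (6 minutes, then 4) are what make individual intervals hard to compare with each other.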

So, given the answers to the above two questions, you have some early architecture decisions to make, and your first priority should be to spend generous amounts of time reading and understanding the comprehensive online Nagios documentation. The Nagios documentation includes useful information on how to prepare for a larger installation, architecture patterns to follow when designing your systems, and very good information on Nagios configuration parameters that will help keep your systems executing checks quickly without becoming overwhelmed.

If your Nagios system will be trending hundreds or thousands of devices with thousands of service checks, you should think about putting your Nagios poller and your Nagios reporting / graphing functions on different servers. If you can, dedicate a second server to trending and reporting and, if you have the luxury, use a third server just for notifications. The less I/O strain you put on your master poller, the more likely it is to hit whatever performance expectations you have.

Using a second server to offload trending and reporting also helps ensure that all performance data designated for trending actually makes it to whatever graphing package you use, and from there into graphs.

On the other hand, if your system will only be used for fault management (not trending), you will be able to use less expensive hardware and will not necessarily need the expense or complexity of a multi-server setup. The same goes for cases in which you are monitoring a few hundred services on 50-100 servers.

Nagios does a lot of fork()ing when it runs service checks, so your Nagios master poller should have generous RAM and at least two CPUs. I have not come up with sizing formulae yet, nor have I found a sizing calculator for Nagios, but when I find one or figure out general rules of thumb I will post them.

Your reporting / trending server will experience high levels of disk I/O activity, so a generous amount of RAM and SCSI disks in RAID 1+0 (or 6, 0+1) is highly recommended.

Also, if you can avoid it, do NOT use VMware or other BIOS-emulating virtual machine technology for your Nagios instances. They generally will not be able to handle the fast processing a large Nagios installation requires, and some virtualization technologies have problems with time sync, which is a huge deal killer for Nagios’ scheduling.

The next entry in this series will focus on the Nagios master polling server.

Special thanks to Mike Fischer, my manager at Comcast, for allowing me to share my experiences at work online; special thanks to Ryan Richins, my talented teammate, for his hard work with me on our Nagios system. We are looking for another developer to join our team in the NoVA area; write me at if you are interested.

— Max Schubert



Nagios Performance Tuning: Early Lessons Learned, Lessons Shared: Part I · 30 October 2008, 16:48

This is the start of a short series of articles on Nagios 3.x performance tuning. If you have not read the Nagios performance tuning guide, please do so before reading this series. Everything I discuss in these articles was done after applying the well thought-out, useful tips contained in the online Nagios documentation.

My teammate and I just went through a round of tuning our pre-production instance of Nagios, gathering baseline performance information and data so that our team can give our management reasonable capacity estimates for how many services and hosts we can monitor in our environment with Nagios.

Pre-production hardware is HP DL185 (2x of these servers in use):

Nagios configuration:

Nagios poller:

Nagios report server:

For initial testing and tuning we are polling ~250 hosts with a total of ~1800 checks. All checks are SNMP and all are scheduled at 5-minute intervals; some are gets, some are summarizations of walks.

We are using PNP for graphing (NEB module mode, run via inetd), RRD updates happen on a second server dedicated to reporting and visualization.

At the beginning of our tuning adventure we were seeing:

This would barely be OK if we were just doing fault management, but we want to send all perfdata not only to PNP but also to a large time series warehouse DB that another team maintains. This meant we needed 5-minute samples to stay close to the same intervals over time, because the time series database stores raw samples for years and many other teams pull data from it for graphing, reports, and other analysis.

After two weeks of tuning we have reduced our check execution time (all 1800 checks!) to < 60 seconds, with an average scheduling skew of just 7 seconds at the end of 24 hours with our tuned configuration in place. All performance data is successfully being graphed by PNP as well. Our current configuration does this without knocking over our Nagios polling server, our PNP server, or the hosts we are polling, and we have room to poll many more services and hosts using the same two servers.
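The arithmetic behind "room to poll many more" is worth making explicit: 1800 checks every 5 minutes is an average demand of 6 checks per second, and running all of them in under 60 seconds means at least 30 checks per second. The Ruby sketch below works this out; the method name is hypothetical, and the "headroom" figure assumes check cost grows linearly with the number of checks, which a real poller does not guarantee.

```ruby
# Rough capacity arithmetic from the figures quoted in the text.
# capacity_estimate is a hypothetical helper, not a Nagios feature.
def capacity_estimate(checks:, interval_s:, run_s:)
  required = checks.to_f / interval_s # checks/sec needed to keep pace
  observed = checks.to_f / run_s      # checks/sec achieved (lower bound when run_s is an upper bound)
  {
    required: required,
    observed: observed,
    busy:     run_s.to_f / interval_s, # fraction of each interval spent running checks
    headroom: observed / required      # assumes linear scaling
  }
end

est = capacity_estimate(checks: 1800, interval_s: 5 * 60, run_s: 60)
puts format("required %.1f/s, observed >= %.1f/s, busy <= %.0f%%, headroom ~%.0fx",
            est[:required], est[:observed], est[:busy] * 100, est[:headroom])
# prints: required 6.0/s, observed >= 30.0/s, busy <= 20%, headroom ~5x
```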

How did we get from start to finish? More science than art :p. I am usually a very intuitive developer but this time my teammate and I found we had to take a more scientific approach .. and it worked.

Stay tuned as I will be posting a series of short articles on my blog about Nagios performance tuning and scaling.

Special thanks to Mike Fischer, my manager at Comcast, for allowing me to share my experiences at work online; special thanks to Ryan Richins, my talented teammate, for his hard work with me on our Nagios system. We are looking for another developer to join our team in the NoVA area; write me at if you are interested.

— Max Schubert



First Nagios 3 Enterprise Monitoring Book Review! · 28 September 2008, 13:50

Finally, someone reviewed our book. Wee! (And a huge thanks to the reviewer!)

— Max Schubert



CSS is still for Tweakers · 27 September 2008, 13:49

I have been doing a lot of work with CSS lately, and while it is much more enjoyable to work with now than it was 5 years ago, it still sometimes sends me down the tweaker paths that Javascript used to send me down in the late 90s and 2000-2001.

Inheritance with CSS is interesting and at times confusing to me. It is making more sense to me now, but sometimes the relationships between global defaults, element-specific overrides, custom classes and IDs, built-in element overrides, and the combinations of all of the above drive me nuts. That is more my problem than CSS’s, though :p.

Cross-browser compatibility with CSS is what currently tweaks me to no end. While more browsers now implement CSS2 well, IE and Mozilla still render things differently, differently enough to send me down the path of 5-6 hours of tweaking to get a layout looking the same in both, even when I am doing the bad thing of using absolute layouts to make my layout easier to implement. Spacing between elements, margins, whitespace differences: all are aspects of CSS design and web page layout that have to be treated very carefully, or one ends up tweaking all day with very little positive impact.

Javascript libraries like script.aculo.us, jQuery, and Dojo are making Javascript almost a no-brainer these days :). Hopefully the same will happen with CSS; I have been referred to some projects that seem to be going in the right direction, and 960 Grid is a good example of one.

— Max Schubert



Nagios and JSON with statusjson.cgi: a winning combination! · 27 September 2008, 13:35

Yann JOUANIN’s new JSON output CGI for Nagios called Nagios2JSON has some seriously important positive impacts to the utility of Nagios as glue for an organization’s monitoring infrastructure.

I have been using the JSON CGI to build a custom web interface to Nagios, to help convince some important people I work with that Nagios is more flexible, and easier to mold without hacking, than any other open source fault management framework available today. Working with JSON lets me focus on my front end and simply treat Nagios as a data source. Really, really cool!
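Treating Nagios as a data source from a script can look something like the Ruby sketch below. It assumes a hypothetical response shape (a top-level `"services"` array whose entries carry a `"status"` field) and a hypothetical CGI URL; the real field names emitted by the JSON CGI may differ, and a real Nagios install will usually also require HTTP authentication.

```ruby
require "json"
require "net/http"
require "uri"

# Fetch raw JSON from the status CGI. The default URL is hypothetical.
def fetch_status(url = "http://nagios.example.com/nagios/cgi-bin/statusjson.cgi")
  Net::HTTP.get(URI(url))
end

# Count services per state, assuming a top-level "services" array whose
# entries each have a "status" field (an assumed shape, not a documented one).
def count_by_status(json_text)
  JSON.parse(json_text).fetch("services").each_with_object(Hash.new(0)) do |svc, counts|
    counts[svc.fetch("status")] += 1
  end
end

sample = <<~JSON
  {"services": [
    {"host": "web01", "service": "HTTP",  "status": "OK"},
    {"host": "db01",  "service": "MySQL", "status": "CRITICAL"},
    {"host": "web02", "service": "HTTP",  "status": "OK"}
  ]}
JSON

count_by_status(sample).each { |state, n| puts "#{state}: #{n}" }
```

Keeping the fetch and the parsing in separate methods lets the front end be tested against canned JSON without a live Nagios server.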

Some of the limitations of the current implementation (the project is very young):

I really cannot say enough good things about this add-on, and I really hope Yann continues to work on this project.

— Max Schubert



Freezing Ruby Data Structures Recursively · 15 September 2008, 10:18

Flavorrific published a monkey patch that extends Object with a deep freeze method that freezes child arrays and hashes of a data structure so that they aren’t accidentally changed (for example, configuration data from YAML).
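The core idea of a deep freeze can be sketched without adding a method to Object at all. The module name `DeepFreeze` below is hypothetical; this sketch only walks Hashes and Arrays, freezes everything it reaches, and does not guard against cyclic structures.

```ruby
# Recursively freeze Hashes, Arrays and their contents without patching
# Object. DeepFreeze is a hypothetical name used for illustration.
module DeepFreeze
  # Freezes obj and everything reachable through Hash keys/values and
  # Array elements, then returns obj. Cyclic structures are not handled.
  def self.call(obj)
    case obj
    when Hash
      obj.each do |key, value|
        call(key)
        call(value)
      end
    when Array
      obj.each { |item| call(item) }
    end
    obj.freeze
  end
end

config = DeepFreeze.call({ "db" => { "hosts" => ["a", "b"] } })
p config["db"]["hosts"].frozen?  # => true
```

A module function like this avoids changing every object in the program, at the cost of not being callable as `obj.deep_freeze`.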

I have extended his work in the following ways:

```ruby
class Object
  # Define a deep_freeze method in Object (based on code posted by
  # Flavorrific) that calls freeze on the top-level object instance the
  # method is called on, as well as on any child object instances contained
  # in the parent. This patch also raises an IndexError if keys from
  # 'deeply frozen' Hashes or Arrays are accessed that do not exist.
  def deep_freeze
    # Already-frozen objects (e.g. a sub-structure referenced twice) cannot
    # be given new singleton methods, so leave them alone.
    return self if frozen?

    # String doesn't support each.
    if !kind_of?(String) && respond_to?(:each)
      each { |v| v.deep_freeze if v.respond_to?(:deep_freeze) }
    end

    # Deep freeze instance variable values.
    instance_variables.each do |v|
      iv = instance_variable_get(v)
      iv.deep_freeze if iv.respond_to?(:deep_freeze)
    end

    # Hash#[] calls #default for missing keys; raise instead of returning nil.
    if kind_of?(Hash)
      define_singleton_method(:default) do |key|
        raise IndexError, "Frozen hash: key '#{key}' does not exist!"
      end
    end

    # Prevent users from accessing array elements that do not exist.
    if kind_of?(Array)
      define_singleton_method(:at) { |index| fetch(index) }
      define_singleton_method(:[]) do |arg1, arg2 = nil|
        if !arg2.nil?
          # Start index and length given, as with Array#[](start, length).
          (arg1...(arg1 + arg2)).map { |index| fetch(index) }
        elsif arg1.kind_of?(Range)
          # Range passed in.
          arg1.map { |index| fetch(index) }
        else
          # Single index: return the element itself, not a one-element array.
          fetch(arg1)
        end
      end
    end

    # Freeze the current structure.
    freeze
  end
end
```


— Max Schubert



You Can Code It - (partial parody of "You Can Do It" by Ice Cube) · 13 September 2008, 08:14

Code Geek, baby …
’09, baby …
I’m online, baby …
all the time, baby …

You can do it, put your code into it!
I can do it, put requirements to it!

You can do it, put your code into it!
I can do it, put requirements to it!

Put your code into it!
Put requirements to it!

Click Click Zoom
See me typing out these code sheets

Codin out a framework
like a freakin nerdy athlete

— Max Schubert



Spore: 15 Easy Steps To Getting the Game Running · 8 September 2008, 02:33

I like games but rarely have time for them. I also usually do not get a game when it is first released because of the bugs that generally accompany any first release.

I have, however, been anticipating Spore (EA) for almost two years. So last night I decided to not wait 3 weeks for the local Best Buy to get more boxed copies in and instead I purchased and downloaded the game from EA.

After the 2 hour download I ran the installer and then started the game. Screen goes black, resizes to 1024×768, EA logo appears and then poof, crashes. Try again, same story.

So, 2 hours and a lot of EA forum trawling later, I get the game to run. Here are the steps I had to take on my Windows XP, 3.0 GHz Shuttle (1 GB RAM) with a Radeon 5500 (256 MB) video card to get the game to work:

It worked. Easy :p.

— Max Schubert



Javascript - now with more fun! · 16 August 2008, 11:16

I have been doing some javascript development for web UIs for the first time in years and wow have things changed for the better.

Integrating Javascript into an application is now easier than ever, thanks to the large number of open source Javascript libraries available on the Internet. Each framework I have tried provides a nice, object-oriented API that includes visual effects, event libraries that allow a developer to add events to pages without mixing Javascript and HTML together, and helper routines to make DOM manipulation and parsing very very easy.

Jason Seifer gives a very good talk on separating web UI display and formatting from Javascript functionality. Integrating Javascript into web UIs in this manner has been given the name “Unobtrusive Javascript.” If you have used any MVC-style frameworks you will easily pick up this technique, and it really does separate HTML and Javascript in a very nice way.

The frameworks I have tried out so far are:

I like them all; so far I find the event API (which has been my focus lately) of jQuery and Moo Tools to be the nicest of the five.

All hide browser differences from the user, something that used to make simple tasks take hours and hours to implement properly.

Dojo and Moo Tools allow you to roll your own distribution of their tool from their web sites, which is a neat feature. Unfortunately, a number of the Dojo examples would not work in my Firefox 3.0.1 browser and threw Javascript errors; this is something I am sure they will fix in the near future.

All of the libraries offer “minified” and compressed versions. Of the list above, I believe Moo Tools comes in smallest (uncompressed) when downloading the full package.

— Max Schubert


