Padrino - rspec - adding dynamic code to all controllers during testing · 5 February 2013, 13:34

We add code that lets our Capybara tests send mock parameters to the actions of every controller during testing, so we can simulate session state (for example, a user being logged in). In Rails you do this by re-opening ApplicationController in spec/spec_helper.rb and adding a before_filter. In Padrino you can do this by adding custom code to app.rb in a before block; the before block is called for every controller action.
        configure :test do
          before do
            params.keys.each do |param|
              if param =~ /^mock_/
                mock_param = param.sub(/^mock_/, '')
                session[ mock_param ] = params[ param ]
                logger.debug %{ #{mock_param} set to #{params[ param ]}}
              end
            end
          end
        end
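If you want to unit-test the mock_ to session mapping without booting the app, one option is to pull the loop into a plain method. This is a minimal sketch, assuming string parameter keys as Padrino provides them; the helper name apply_mock_params and the MOCK_PREFIX constant are hypothetical, not part of Padrino.

```ruby
# Hypothetical helper extracted from the Padrino `before` block so the
# mock_ -> session mapping can be tested without a running app.
MOCK_PREFIX = /\Amock_/

# Copies every "mock_<name>" request parameter into session["<name>"].
# Returns the list of session keys that were set.
def apply_mock_params(params, session)
  params.each_with_object([]) do |(key, value), set_keys|
    key = key.to_s
    next unless key =~ MOCK_PREFIX
    session_key = key.sub(MOCK_PREFIX, '')
    session[session_key] = value
    set_keys << session_key
  end
end

session = {}
apply_mock_params({ 'mock_user_id' => '42', 'page' => '2' }, session)
# session now holds "user_id" => "42"; the non-mock "page" param is ignored
```

In a Capybara spec, the same effect is triggered by a request such as visit '/dashboard?mock_user_id=42' (the route here is only an example).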

— Max Schubert



Class variables in Rspec tests - reloads will happen during testing! · 5 February 2013, 13:20

As happens when a Ruby application runs in a multi-process web container (Phusion Passenger, for example), classes can be reloaded during your RSpec test runs. This means any class-level variables you set cannot be counted on across tests, even within the same describe block.

We have a few classes that act as API wrappers for external services; we want them off by default in our test environment and on by default in production. We initially used class variables for these settings, but found that the state was getting reset across runs.

Fix: move the initial states into config/application.rb and config/environments/* and use those to initialize the cattr_accessor definitions at the top of the classes.
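The idea is that the default lives in per-environment configuration, so whenever a class is reloaded its class-level setting is re-derived from that configuration instead of being lost. A minimal plain-Ruby sketch of the pattern, assuming the environment name comes from RACK_ENV; SETTINGS and PaymentApi are hypothetical stand-ins for config/environments/* and one of the API wrapper classes, and a singleton attr_accessor stands in for cattr_accessor.

```ruby
# Hypothetical stand-in for per-environment settings that would normally
# live in config/application.rb and config/environments/*.
SETTINGS = {
  'test'       => { payment_api_enabled: false },
  'production' => { payment_api_enabled: true }
}.freeze

def app_env
  ENV.fetch('RACK_ENV', 'test')
end

# Hypothetical API wrapper class.
class PaymentApi
  class << self
    attr_accessor :enabled
  end

  # The default is read from configuration every time the class body is
  # evaluated, so a reload restores the configured value rather than
  # depending on whatever an earlier test assigned at runtime.
  self.enabled = SETTINGS.fetch(app_env).fetch(:payment_api_enabled)
end
```

A test that needs the service can still flip PaymentApi.enabled explicitly; it just should not rely on that assignment surviving a reload.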

— Max Schubert



Nagios deep dive: retention.dat and modified_attributes · 23 November 2010, 07:26

When Nagios core (the daemon, typically started by a script in /etc/init.d/) starts up, it follows a rather involved process to turn the configuration files, and the domain-specific language (DSL) they contain, into in-memory objects. The 10,000-foot view of this process is:

modified_attributes tells Nagios which attributes of an object should be loaded into memory as Nagios reads object state from retention.dat. The code that uses this field (all DSL-related code is in the xdata/ directory of the source tree) treats it as a bitmask of flags, one bit per attribute, to record and determine which attributes should be read into memory for an object and which should be ignored.

From include/common.h:

#define MODATTR_NONE                            0
#define MODATTR_NOTIFICATIONS_ENABLED           1
#define MODATTR_ACTIVE_CHECKS_ENABLED           2
#define MODATTR_CHECK_COMMAND                   512
#define MODATTR_NORMAL_CHECK_INTERVAL           1024
#define MODATTR_RETRY_CHECK_INTERVAL            2048
#define MODATTR_MAX_CHECK_ATTEMPTS              4096
#define MODATTR_CHECK_TIMEPERIOD                16384
#define MODATTR_CUSTOM_VARIABLE                 32768

The default value of modified_attributes is 0, meaning: ignore every attribute in retention.dat that has a counterpart constant in common.h.

When one of the listed fields changes on an object while Nagios runs, Nagios adds the constant for that field to the object's modified_attributes value. This tells the retention.dat parsing code which attributes to read into memory when the object is parsed from retention.dat at startup.

A common use case showing this process:

When these two actions are processed, Nagios core records that the notifications_enabled and active_checks_enabled fields were changed from their default values by setting modified_attributes to 3, which is the result of code similar to this:

modified_attributes |= MODATTR_NOTIFICATIONS_ENABLED;
modified_attributes |= MODATTR_ACTIVE_CHECKS_ENABLED;
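To see the bitmask arithmetic end to end, here is a small Ruby sketch that builds a modified_attributes value by OR-ing flags and decodes a value back into the field names Nagios would restore. The numeric values are the subset of common.h constants discussed in this article; the method names are hypothetical.

```ruby
# Subset of the MODATTR_* constants from Nagios include/common.h.
MODATTR = {
  notifications_enabled: 1,     # MODATTR_NOTIFICATIONS_ENABLED
  active_checks_enabled: 2,     # MODATTR_ACTIVE_CHECKS_ENABLED
  check_command:         512,   # MODATTR_CHECK_COMMAND
  normal_check_interval: 1024,  # MODATTR_NORMAL_CHECK_INTERVAL
  retry_check_interval:  2048,  # MODATTR_RETRY_CHECK_INTERVAL
  max_check_attempts:    4096,  # MODATTR_MAX_CHECK_ATTEMPTS
  check_timeperiod:      16384, # MODATTR_CHECK_TIMEPERIOD
  custom_variable:       32768  # MODATTR_CUSTOM_VARIABLE
}.freeze

# Build a modified_attributes value by OR-ing in one flag per changed field.
def encode_modattr(changed_fields)
  changed_fields.reduce(0) { |mask, field| mask | MODATTR.fetch(field) }
end

# List the fields whose flag is set, i.e. the ones read back from retention.dat.
def decode_modattr(mask)
  MODATTR.select { |_field, bit| (mask & bit) != 0 }.keys
end

mask = encode_modattr([:notifications_enabled, :active_checks_enabled])
# mask == 3
decode_modattr(mask) # => [:notifications_enabled, :active_checks_enabled]
```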

When Nagios stops, it serializes all objects from memory to disk (retention.dat), and modified_attributes is one of the attributes written out.

As part of our current distributed Nagios implementation, our team writes its own retention.dat files based on Nagios object state stored in a database. Understanding how modified_attributes works fixed a long-standing bug in our code that caused host and service attributes modified in flight to be ignored when Nagios started. We hope this short article helps you avoid the same bug.
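If you generate retention.dat yourself, the key point is to set the flag for every retained field whose value differs from its default. A simplified sketch, assuming both fields default to enabled (1) in the object configuration; the method name is hypothetical, and a real retention.dat service entry carries many more attributes than the two modelled here.

```ruby
# Flags for the two retained fields modelled in this sketch (from common.h).
FLAGS = {
  'notifications_enabled' => 1, # MODATTR_NOTIFICATIONS_ENABLED
  'active_checks_enabled' => 2  # MODATTR_ACTIVE_CHECKS_ENABLED
}.freeze

# Assumption: both fields are enabled by default in the object config.
DEFAULTS = { 'notifications_enabled' => 1, 'active_checks_enabled' => 1 }.freeze

# Render a simplified retention.dat service block from stored state.
def retention_service_block(host, service, state)
  # Without the matching bit set, Nagios ignores the field on startup.
  mask = FLAGS.reduce(0) do |acc, (field, bit)|
    state.fetch(field) == DEFAULTS.fetch(field) ? acc : acc | bit
  end
  lines = ['service {', "host_name=#{host}", "service_description=#{service}",
           "modified_attributes=#{mask}"]
  FLAGS.each_key { |field| lines << "#{field}=#{state.fetch(field)}" }
  lines << '}'
  lines.join("\n") + "\n"
end
```

For example, retention_service_block('web01', 'HTTP', { 'notifications_enabled' => 0, 'active_checks_enabled' => 0 }) emits modified_attributes=3, matching the two-action case described above.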

Special thanks to my managers Mike Fischer and Eric Scholz at Comcast (a great place to work as a developer!) for allowing me to share with the community what I learned at work through our use of open source software, and special thanks to Ryan Richins for working with me to uncover the cause of this bug in our custom Nagios configuration distribution code.

— Max Schubert



Nagios Performance Tuning - use the RAM (but be careful!), Luke · 5 January 2010, 22:04

We found that moving as many queues and files in our Nagios architecture to RAM disks as we reasonably can makes a huge difference to the performance of a large Nagios installation. We currently poll over 15,000 services on over 2,000 hosts in less than 5 minutes, 24×7×365.

We use RHEL5; by default RHEL mounts /dev/shm as a RAM disk with 50% of physical RAM available to the partition.

Our opinion on using RAM disks for temporary storage is controversial. A number of users on the Nagios users and developers mailing lists have told me that disks with big caches should be as fast as RAM, since files are cached in RAM anyway, but our experience has shown that nothing beats a RAM disk for a fast queue directory or file. Our experience also taught us that when moving queues to RAM, it is very important to implement supporting code that ensures important data is persisted across reboots or can easily be re-created after a reboot.

Our experience is based on machines with SCSI disks in RAID 0, 5, and 1+0 configurations.

Queues and files we moved to RAM that sped up our Nagios architecture noticeably (by over 40% in total):

Nagios (nagios.cfg)

Moving log_file, object_cache_file, and status_file to RAM speeds up the CGIs in a larger environment. Moving temp_file, temp_path, check_result_path, and state_retention_file to RAM lowers latency for Nagios in a larger environment.
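After changing these paths, it is worth confirming that each one really lives on the RAM disk. A minimal sketch, assuming a Linux host where /proc/mounts lists one mount per line (device, mount point, filesystem type, ...); the helper name on_tmpfs? is hypothetical. It picks the longest mount point that contains the path and checks whether that mount is tmpfs.

```ruby
# Hypothetical helper: given the text of /proc/mounts, report whether a path
# lives on a tmpfs mount, using the longest matching mount point.
def on_tmpfs?(path, mounts_text)
  mounts = mounts_text.each_line.map(&:split).select { |fields| fields.size >= 3 }
  best = mounts
         .select do |_dev, mnt, _type|
           prefix = mnt.end_with?('/') ? mnt : "#{mnt}/"
           path == mnt || path.start_with?(prefix)
         end
         .max_by { |_dev, mnt, _type| mnt.length }
  !best.nil? && best[2] == 'tmpfs'
end
```

On a live host you would call it as on_tmpfs?('/dev/shm/nagios/status.dat', File.read('/proc/mounts')), where the status.dat path is only an example.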

We have also taken the radical step of moving all configuration files, as well as plugins, into RAM. We use ePN extensively, and every time Nagios runs an ePN plugin it checks whether the plugin has changed; after moving plugins to RAM, we noticed a speed-up.

IMPORTANT NOTE: Do not move everything to RAM without putting in place custom periodic scripts or other processes that back up important files from RAM to real disk, so that if the host crashes they can be quickly recovered or re-created!
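A periodic backup job can be very small. The sketch below, meant to be run from cron or a similar scheduler, copies every regular file in a RAM disk directory to a directory on real disk. The method name and any paths you pass it are assumptions. Each file is copied to a temporary name and then renamed, so a crash mid-copy leaves the previous backup intact rather than a truncated one.

```ruby
require 'fileutils'

# Hypothetical periodic backup: copy regular files from a RAM disk directory
# to a directory on real disk. Returns the names of the files copied.
def backup_ram_files(src_dir, dest_dir)
  FileUtils.mkdir_p(dest_dir)
  copied = []
  Dir.children(src_dir).sort.each do |name|
    src = File.join(src_dir, name)
    next unless File.file?(src)
    dest = File.join(dest_dir, name)
    tmp = "#{dest}.tmp"
    FileUtils.cp(src, tmp)
    File.rename(tmp, dest) # atomic, since tmp and dest share a filesystem
    copied << name
  end
  copied
end
```

For example, backup_ram_files('/dev/shm/nagios', '/var/backups/nagios-ram') with paths adjusted to your layout; the restore step at boot is simply the same copy in the other direction.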

SNMPTT (snmptt.ini)

The spool file for checks is a good one to move to RAM and speeds up processing.

PNP (npcd.conf and process_perfdata.conf)

The NPCD queue is another directory we moved to RAM, and we saw a nice improvement in NPCD processing time.


Moving any of the above queues to RAM disks will increase the overall speed of your Nagios architecture. The Nagios-specific configuration changes make a very noticeable difference, but at the price of some additional supporting code to ensure the robustness of critical data. We developed this list over three to six months, so take your time if you decide to implement any of the changes mentioned in this article. Also make sure you have Nagios trending metrics in place beforehand, so you can see what difference, if any, these changes make to your installation.

Special thanks to my managers Eric Scholz, Mike Fischer, and Jason Livingood for allowing us to share our experiences and knowledge with the general public, and extra special thanks to my teammates Ryan Richins and Shaofeng Yang for their work with me in creating an ever-changing and improving Nagios architecture that is stable and gives us incredible performance.

We are still hiring :), contact me if you are interested in working on a terrific team doing interesting and innovative work.

— Max Schubert



Updated Nagios::Plugin::SNMP and Nenm::Utils on GitHub (on CPAN this week) · 26 August 2009, 19:19

I have released version 1.2 of Nagios::Plugin::SNMP to Github:—plugin—snmp/tree/master

This release includes:

Additionally, I have released an updated version of the Nenm::Utils module that I initially created for the Syngress Nagios book project I led. This version includes:

This module is also available on the book site.

My team at work uses both of these modules extensively to query several thousand SNMP-based agents every 5 minutes.

Special thanks to:

My teammates Ryan Richins and Shaofeng Yang for their extensive contributions to both of these modules.

My managers at Comcast, Mike Fischer and Jason Livingood, for allowing us to contribute code we have written at work back to the open source community.

Comcast is hiring! Our team is looking for a talented developer with systems administration experience to join our team. Let me know if you are in the northern Virginia area of the US and are looking for a fun and challenging place to work :).

— Max Schubert



Nagios Performance Tuning: Early Lessons Learned, Lessons Shared. Part 5 - Circular Dependency Checking · 6 August 2009, 12:36

NOTE: We are using Nagios 3.0.3, which does not have the very cool patch to the circular dependency checking algorithm recently introduced in the Nagios 3.1.x release tree.

Startup times for our Nagios instances jumped dramatically today (more than 6x) because some of our users added large numbers of new services to their hosts through the

service -> hostgroup -> host

relationship that I have discussed often and that we use heavily. We always want our Nagios instances to start within a 5-minute interval, as we push most of the performance data we get back from checks into a long-term trending data warehouse.

We also test every configuration release in an integration and test environment before doing a deployment.

With this in mind, we decided to try turning off circular dependency checking on startup for our production Nagios instances.

On one instance, this reduced startup time from 763 (!) seconds to 16 seconds; on the other, startup time dropped from 158 seconds to 6 seconds.

There you have it: a simple way to dramatically reduce startup times. But again, only do this if you test your configuration beforehand in an environment with circular dependency checking turned on.

— Max Schubert



Easy-to-use Ruby library for interacting with Confluence - confluence4r · 31 July 2009, 12:43

I added a gemspec for the package to the bottom of the page if you want to build it as a gem in-house.
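For readers who only need the general shape, a minimal gemspec might look like the sketch below. The version, author, summary, and file list are placeholders (assumptions), not the values from the original page. Saved as confluence4r.gemspec, it can be built with gem build confluence4r.gemspec.

```ruby
require 'rubygems'

# Placeholder gemspec for building confluence4r in-house; adjust the
# version, authors, and file list to match your checkout.
spec = Gem::Specification.new do |s|
  s.name          = 'confluence4r'
  s.version       = '0.0.1'
  s.summary       = 'Ruby library for interacting with Confluence'
  s.authors       = ['Your Team']
  s.files         = Dir['lib/**/*.rb']
  s.require_paths = ['lib']
end
```

The assignment is the last expression in the file, so evaluating the gemspec still yields the Gem::Specification object that gem build expects.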

— Max Schubert



Why do I get an 'uninitialized value' error message from Getopt/ when Nagios runs my Perl-based plugin under ePN? · 25 July 2009, 10:48

Had this message while debugging an ePN-based script today:

**ePN /data/nagios/etc/customers/tean/project/plugins/ "Use of uninitialized value in pattern match (m//) at /usr/lib/perl5/5.8.8/Getopt/ line 848,".

I was very puzzled by this: I had never seen that error before, even though we run 20-30 or more ePN-based scripts, and I obviously don't maintain Getopt::Long, so how could I have introduced a bug into it?

Answer: I didn't. What I did do was define a custom attribute for a service but not put any spaces (or a value) after the attribute name in my service definition. For example, given this command definition:

define command {
    command_name check_plugin_name
    command_line $USER10$/team/project/plugins/ \
    --check-interval $_SERVICE_PROJECT_CHECK_INTERVAL$ \
    --hostname $HOSTADDRESS$ \
    -p '$_HOST_SNMP_PORT$' \
    --snmp-version 2c \
    --rocommunity $_HOST_SNMP_COMMUNITY$ \
    --timeout $_HOST_PLUGIN_TIMEOUT$ \
    $_SERVICE_PROJECT_WARN$
}

Notice that at the end of the command line I reference $_SERVICE_PROJECT_WARN$. Calling a custom attribute this way lets users set a warning threshold in the service definition if they want to, like so:

define service {
    __project_warn -w my_threshold_specification
}

If they don't, the command definition still works unchanged, because the plugin does not require a warning threshold.

However, I then defined the attribute like this in my service definition:

define service {
    __project_warn<-- end of line, no spaces!
}

This caused Nagios to substitute a null or some other non-printable character as the value of the attribute in the command line before executing it, which in turn got passed through to Getopt/ as an undefined option name.

The fix: add whitespace and an explicit empty string ('') as the attribute's value in the service definition :)

define service {
    __snmp_port           161
    __project_warn        ''
}

Voila, no undefined option.

This could be a candidate for either a Nagios custom attribute fix or a Getopt/ fix. I think Getopt::Long should treat an undefined option name as the empty string so that developers do not have to guard against this condition.
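Until then, a plugin can guard against it by dropping missing or blank arguments before option parsing. The sketch below shows the idea in Ruby with the standard OptionParser rather than in Perl (in a Perl plugin the equivalent would be filtering @ARGV before calling GetOptions). The method names are hypothetical, and only two of the options from the command definition above are modelled.

```ruby
require 'optparse'

# Hypothetical guard: drop nil or blank arguments (such as an empty custom
# attribute substituted into the command line) before option parsing.
def sanitize_args(argv)
  argv.reject { |arg| arg.nil? || arg.strip.empty? }
end

def parse_plugin_args(argv)
  options = {}
  parser = OptionParser.new do |o|
    o.on('--hostname HOST') { |v| options[:hostname] = v }
    o.on('-w', '--warning THRESHOLD') { |v| options[:warning] = v }
  end
  parser.parse!(sanitize_args(argv))
  options
end
```

With this in place, parse_plugin_args(['--hostname', '10.0.0.1', '']) parses just the hostname, and the empty trailing argument never reaches the parser.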

— Max Schubert



Nagios patch withdrawal: only send recovery escalation notifications for services if a problem escalation notification was sent · 24 July 2009, 13:16

Well, I hate to say it, but mea culpa: I had to withdraw the first attempt at the patch from an earlier article (which I have hidden for now so that others do not download it). That patch was supposed to fix escalation recovery notification behavior.

My first attempt at the patch was overly naive; if you downloaded it, please remove it from your installation as it will most likely not work for you. It does work for us, but our configuration is very unique and very different from how most people use Nagios.

I have a new version in place at my job and I will be releasing that version next week or the week after next. Why might you trust this new one after my poor first attempts?

My apologies if you downloaded and used the earlier patches. Thankfully they will not corrupt data; they just do not do what I promised they would.

The current version is working for us and with typical configurations as well; I am simply not going to repeat the mistakes I made last time, as I know how frustrating it is to back out code.

— Max Schubert



Are you an expert US citizen? · 19 June 2009, 14:59

Email from a recruiter this year included a request for the following skill:

— Max Schubert
