Detect server side problems using Nagios plugins and the Load Impact Server Metrics Agent

We recently launched our cloud-based Server Metrics Agent, a feature that lets you collect information about what's happening inside your servers while your website or application is being load tested. Installing the Server Metrics agent on one of your machines immediately lets you see how much CPU, memory and network bandwidth the server uses throughout the load test.


This can, of course, be very useful when looking for bottlenecks, but sometimes you want to know more. For example, you might be running a database such as PostgreSQL and suspect that it is running out of an internal resource, such as client connections, which limits how quickly your web server can handle client requests. In this case, you will not notice any problem just by looking at, for example, the CPU or memory usage of the physical server where PostgreSQL is running. Instead, you must communicate directly with PostgreSQL and ask it how it's doing: how many connections its database clients are using, and what the maximum limit is.

When we created our Server Metrics agent, we realized people would want to collect more specialized metrics like this, not just standard physical server metrics such as CPU, memory and disk usage. That confronted us with a big problem: there are thousands of different systems, platforms and applications from which you might want to collect performance metrics in order to detect bottlenecks, and each of them communicates in its own way. We couldn't possibly write monitoring code to support every one of them.

Luckily, we have some experience with uptime monitoring, and we knew that the popular open-source monitoring solution Nagios has a simple, flexible plugin system that is easy to interface with. So we designed our Server Metrics agent to be compatible with the Nagios plugin system, allowing users to collect performance data with any Nagios plugin during their load tests.
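To illustrate what that plugin interface looks like, here is a minimal sketch of a Nagios-style plugin written in Python. It follows the general Nagios plugin conventions: the exit code signals the status (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN), and the output line carries human-readable text, then a "|", then performance data in the form label=value;warn;crit;min;max. The check name, the percentage thresholds and the check_connections function are hypothetical, and the sketch does not query a real database; the connection counts are simply passed in as arguments.

```python
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def check_connections(used, limit, warn_pct=80, crit_pct=95):
    """Return (exit_code, output_line) for a hypothetical connection-usage check.

    In a real plugin, `used` and `limit` would be read from the monitored
    service; here they are plain arguments so the sketch is self-contained.
    """
    if limit <= 0:
        return UNKNOWN, "CONNECTIONS UNKNOWN: invalid limit %d" % limit
    pct = 100.0 * used / limit
    if pct >= crit_pct:
        code = CRITICAL
    elif pct >= warn_pct:
        code = WARNING
    else:
        code = OK
    # Thresholds are reported as absolute connection counts in the perfdata.
    warn_abs = int(limit * warn_pct / 100)
    crit_abs = int(limit * crit_pct / 100)
    # Text before "|" is for humans; text after it is machine-readable
    # performance data: label=value;warn;crit;min;max
    line = "CONNECTIONS %s: %d of %d used (%.0f%%) | connections=%d;%d;%d;0;%d" % (
        STATUS_NAMES[code], used, limit, pct, used, warn_abs, crit_abs, limit)
    return code, line


if __name__ == "__main__" and len(sys.argv) == 3:
    exit_code, output = check_connections(int(sys.argv[1]), int(sys.argv[2]))
    print(output)
    sys.exit(exit_code)
```

Saved under a hypothetical name such as check_connections.py and run as "python check_connections.py 80 100", it would print "CONNECTIONS WARNING: 80 of 100 used (80%) | connections=80;80;95;0;100" and exit with status 1. A Nagios-compatible collector only needs to run the command and read that output, which is why one interface can cover thousands of different plugins.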

As a result, Server Metrics allows you to collect performance metrics from almost anything! Measurements from the Server Metrics Agent can be correlated with other measurements collected during load tests, and results are made available as a time series that can also be viewed in graph format on the test results page, or exported to CSV (comma-separated values) format for use in a spreadsheet.

The Nagios community has created over 3,000 plugins that measure the health of all kinds of software applications, hardware products, networks and services, and the plugins are available for many platforms, including Linux and Windows. Here is how to set one up, using our PostgreSQL case as the example:

  1. Follow the instructions at https://loadimpact.com/server-metrics-agent-download to download, install and enable your Server Metrics agent.

  1. Go to http://exchange.nagios.org/directory/Plugins and find the plugin(s) you want to use. We wanted to monitor PostgreSQL, so we went to http://exchange.nagios.org/directory/Plugins/Databases/PostgresQL, which lists 18 (!) different plugins that can extract information about the health of a PostgreSQL server. We chose the “check_postgres” plugin: http://exchange.nagios.org/directory/Plugins/Databases/PostgresQL/check_postgres/details

  1. Download and install the check_postgres plugin (in our case we did it locally on our PostgreSQL server)

  1. Edit the Server Metrics agent's configuration file, “li_metrics_agent.conf”, and read the section marked “# An external script” for information on how to make the agent use your new Nagios PostgreSQL plugin. In our case we added two lines like these:

[db_connections]
command = /usr/bin/perl /path/to/check_postgres-2.11.1/check_postgres.pl --host=localhost --port=5432 --dbname=loadimpact --dbuser=postgres --dbpass=verysecret --action backends -w 5 -c 10

Tip: if you have installed a Nagios plugin but don't know what parameters it needs, try running it with the --help parameter.

  1. Restart your Server Metrics agent

  1. As usual, you then enable Server Metrics data collection from this particular agent when you configure a load test

Tip: the agent's name should appear as a selectable Server Metrics agent in the test configuration interface. If you do not see it listed, either the agent hasn't started or it can't reach loadimpact.com; the latter is often caused by a firewall.
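
Before restarting the agent, it can help to run the configured command by hand and check which numbers it actually reports. The Python sketch below does that: it runs a plugin command line and parses the performance data (the part of the first output line after "|") into a dictionary of numeric values. It assumes the general Nagios perfdata convention (label=value[UOM];warn;crit;min;max); it is not the Server Metrics agent's own parser, and for simplicity it ignores thresholds and any performance data on later output lines.

```python
import shlex
import string
import subprocess


def parse_perfdata(output):
    """Parse Nagios performance data from a plugin's first output line.

    Returns {label: value} for every item whose value is numeric; the unit of
    measure and the warn/crit/min/max fields are dropped.
    """
    first_line = output.splitlines()[0] if output else ""
    if "|" not in first_line:
        return {}
    perf = first_line.split("|", 1)[1]
    metrics = {}
    # shlex.split handles labels quoted with single quotes, e.g. 'db one'=3
    for item in shlex.split(perf):
        if "=" not in item:
            continue
        label, rest = item.split("=", 1)
        value = rest.split(";", 1)[0]
        # Strip a trailing unit of measure such as "s", "%" or "MB".
        number = value.rstrip(string.ascii_letters + "%")
        try:
            metrics[label] = float(number)
        except ValueError:
            pass  # skip values we cannot interpret, e.g. "U" for unknown
    return metrics


def run_plugin(command):
    """Run a plugin command line and return (exit_code, metrics)."""
    proc = subprocess.run(shlex.split(command), capture_output=True, text=True)
    return proc.returncode, parse_perfdata(proc.stdout)
```

Passing the same command line you put in li_metrics_agent.conf to run_plugin returns the plugin's exit code together with the metrics it reports. The exit code reflects the plugin's -w/-c thresholds, while the numeric values are what you would want to chart during a load test.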

When the test starts, you will see the Server Metrics agent come online in the console.

Then, while the load test is running, you can plot the usual CPU, memory, disk and other statistics that the Server Metrics agent collects by default. You will also have a new metric named after the database whose client connections you are measuring (in this case, the database is called “loadimpact”).

In this example, we choose to plot this metric, which shows the current number of clients connected to the database “loadimpact” on the PostgreSQL server running on the physical machine “dbserver1”. The resulting chart plots that number over the course of the test.

The orange line shows the current number of connections to the database “loadimpact”, which in this example is around 80 and fairly stable.

This is, of course, just a simple example. The check_postgres plugin can measure a vast number of things related to your PostgreSQL database server. And anything it can measure you can have the Load Impact Server Metrics agent collect and relay to loadimpact.com to be stored as test result data associated with your load test. Many of the 3,000+ Nagios plugins are very powerful data collection programs, and by utilizing the Nagios plugin compatibility of Load Impact’s Server Metrics agent you suddenly have access to an incredibly wide range of measurement and monitoring options for load testing.

Cloud-Based Server-Side Load Testing

We recently announced the release of our Server Metrics agent, a feature that makes it possible to gather internal data from your servers.

To get started with Server Metrics, please check out this tutorial that will guide you through the installation and setup process.

When Load Impact runs a test, the test server collects a wide array of externally measured data. By measuring the load target from our end, we can easily pick up and store data such as active clients, response time and transactions per second, to name just a few. We present this data to you in our web UI, as a CSV export, or via our API for further analysis.

But most of our users need many other measurements to analyze performance properly, and that is exactly the gap Load Impact Server Metrics aims to fill.

Fig 1. Memory and CPU usage of the target system

By installing the Server Metrics agent on one or more target systems, you let our load testing server pick up internal measurements during the test and add them to the same data set. Load Impact supports collecting data from several target machines during a test, so you can get internal measurements from a fairly complex setup as well. The advantage is that internal and external measurements end up on one shared timeline.

You could log this data separately on the target machines, but you would then have to synchronize the timestamps of those internal data series with the data from Load Impact. That is doable, but it's a hassle you can easily avoid.
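To make that synchronization work concrete, here is a minimal Python sketch of the manual alignment you would otherwise have to do: matching each load-test timestamp with the nearest sample logged on the server. It assumes both series are sorted and, crucially, that you already know the offset between the two clocks, which is exactly the part that is hard to get right. The function name and its parameters are hypothetical.

```python
from bisect import bisect_left


def align_to_test(test_times, samples, clock_offset=0.0, max_gap=5.0):
    """Pair each load-test timestamp with the nearest server-side sample.

    test_times:   sorted timestamps (seconds) from the load test results
    samples:      sorted (timestamp, value) pairs logged on the target server
    clock_offset: seconds to add to the server's clock to match the test clock
    max_gap:      samples further away than this are treated as missing (None)
    """
    times = [t + clock_offset for t, _ in samples]
    aligned = []
    for t in test_times:
        i = bisect_left(times, t)
        best = None
        # The nearest sample is either just before or just after t.
        for j in (i - 1, i):
            if 0 <= j < len(times) and abs(times[j] - t) <= max_gap:
                if best is None or abs(times[j] - t) < abs(times[best] - t):
                    best = j
        aligned.append((t, samples[best][1] if best is not None else None))
    return aligned


test_times = [0, 10, 20, 30]
samples = [(101, 5.0), (109, 7.0), (122, 9.0)]
print(align_to_test(test_times, samples, clock_offset=-100))
```

With the correct offset of -100 seconds, this prints [(0, 5.0), (10, 7.0), (20, 9.0), (30, None)]; with the default offset of 0, every value comes back as None. A wrong offset silently pairs the wrong samples, which is the kind of error that collecting everything into one data set during the test avoids.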

Technically, the Server Metrics agent is a Python-based script that runs as a service/daemon on your target systems. It requires Python 2.6 and a fairly common library called psutil. Both Python 2.6 and psutil are open source and run on practically every operating system we know of. We offer installers for 32-bit and 64-bit Debian-based Linux distributions, including Ubuntu, as well as for 64-bit Windows Server 2008 R2 and 2012. For other systems, we offer the Python source code for download. Note that to connect a Server Metrics agent installation to your, and only your, Load Impact account, you need to generate a Server Metrics token on your account settings page.

About Load Impact

Load Impact is the leading cloud-based load testing software trusted by over 123,000 website, mobile app and API developers worldwide.

Companies like JWT, NASDAQ, The European Space Agency and ServiceNow have used Load Impact to detect, predict, and analyze performance problems.
 
Load Impact requires no download or installation, is completely free to try, and users can start a test with just one click.
 
Test your website, app or API at loadimpact.com
