#UnsexyTech and trying to make it a little sexier

We think what we do is pretty cool. I mean come on! Performance and load testing, who doesn’t get excited at the idea?!

Well, apparently not everyone. Some have even said performance testing is a bit like selling health insurance: most people know it’s important to have, but you don’t reap the benefits of having it until something unexpected happens.

In any event, we wanted to find a way to explain what we do in a more relatable and humorous way, framing our somewhat “unsexy tech” so it connects back to everyone’s everyday lives.

Well, here it is. With the help of our video producers Adme (props for a job well done), we made this nifty short video to explain what we do and, if possible, make you chuckle a little. Enjoy!


Are you working with unsexy tech? Let us know why you think your tech is super sexy in the comments below.

Load Testing Prior to Holiday Season Rush Can Help Reduce Cart Abandonment Rate by up to 18%

The holiday shopping season is rapidly closing in, and e-commerce sites and services all over the world are preparing for one of the busiest times of the year, with traffic spikes expected on November 29th (Black Friday) and December 2nd (Cyber Monday).

The pressure to capture every last sale is even greater this year as it is the shortest holiday shopping season in over a decade. 

To understand the magnitude of what is at stake if you fail to meet customer performance demands, let’s recap some stats.

When it comes to shopping cart abandonment, the stakes get even higher…

At this point, you might be asking yourself: what impact does website performance really have on all this anyway? The answer: quite a lot, actually.

According to Tammy Everts’ blog, roughly one out of five carts – 18% of shoppers – is abandoned because pages load too slowly. If 18% of abandoned-cart losses can be attributed to slow pages, that correlates to more than $3 billion in lost sales across US e-commerce sites due to poor performance alone.
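The arithmetic behind that claim is simple to sanity-check yourself. Here is a back-of-envelope sketch; the 18% share comes from the stat above, while the store revenue figure is a purely hypothetical placeholder:

```python
def lost_to_slow_pages(abandoned_cart_value, slow_page_share=0.18):
    """Estimate revenue lost specifically to slow-page abandonment.

    abandoned_cart_value: total value left in abandoned carts
    slow_page_share: fraction of abandonment attributed to slow pages
    """
    return abandoned_cart_value * slow_page_share

# Hypothetical store: $2,000,000 left in abandoned carts over the season.
print(lost_to_slow_pages(2_000_000))  # 360000.0
```

Even for a mid-sized store, the slow-page slice alone is real money.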

Now, while some e-commerce sites are making appropriate preparations for the expected visitor load, others are just holding their breath and suffering from ‘the ostrich effect’ – avoiding an obviously risky business situation by pretending it does not exist.

Instead of burying their heads in the sand, they should just accept that the risk is very real and extremely probable and start performance testing before it’s too late.

It’s almost embarrassing if they don’t, since cloud-based load testing tools are so accessible and affordable. It was somewhat excusable when you had hardware to install and licenses to buy, but nowadays… seriously?!

In fact, our recent State of Web Readiness report found that while shoppers demand page load speeds in the milliseconds, most e-commerce sites have response times closer to 8 seconds. This could be due to the fact that those same e-commerce site owners surveyed overestimated their website capacity by roughly 3.4 times.

[Graph: response times, from the State of Web Readiness report]

A lot of companies are preparing to meet the upcoming traffic spike and increased activity by taking appropriate measures. Some of those measures are quite easy; we wrote about a few of them a while back in another blog post, “Different types of website performance testing – Part 3: Spike Testing“.

On the upside, you already have some general data about what to expect in terms of traffic spikes. Simply knowing how traffic will trickle in on those key dates will help you configure more realistic test execution plans.

[Graph: Cyber Monday spending by date]

But make no mistake: if you don’t test the durability of your site, you can’t really be sure that the interplay of all the active components of your services – third-party resources and content, feeds, content management platforms, databases and internal systems – will provide an acceptable customer experience.

Basically, what we’re saying is: don’t pull an ObamaCare – load test before it’s too late.

Listen to Load Impact’s CTO and CEO discuss performance testing prior to the holiday ramp-up on the Rackspace Google Hangout.


Detect server side problems using Nagios plugins and the Load Impact Server Metrics Agent

Just recently we launched our cloud-based Server Metrics Agent – a feature that lets you collect information about what’s happening internally on your servers while your website or web application is being load tested. Installing the Server Metrics agent on one of your machines will immediately let you see how much CPU, memory and network bandwidth the server is using throughout the load test.

[Screenshot: Server Metrics agent graphs]

This can, of course, be very useful when looking for bottlenecks, but sometimes you want to know more. For example, you might be running database software such as PostgreSQL and suspect that it is running out of some PostgreSQL-internal resource, such as client connections, creating a bottleneck for your web server when it handles client requests. In this case, you will not notice any problems just by looking at, for example, the CPU or memory usage on the physical server where PostgreSQL is running. Instead, you must communicate directly with PostgreSQL and ask it how it’s doing – for instance, how many connections its database clients are using and what the maximum limit is.

When we created our Server Metrics agent, we realized people would want to collect more specialized metrics like this, not just the standard physical server metrics (e.g. CPU usage, memory usage, disk usage). But we were confronted with a big problem: there are thousands of different systems, platforms and applications from which you might want to collect performance metrics in order to detect bottlenecks, and each of them communicates in a different way. We couldn’t possibly write monitoring code to support every one of them.

Luckily, we have a bit of experience with uptime monitoring, and we knew that the very popular open-source monitoring solution Nagios has a simple and flexible plugin system that is easy to interface with. We came up with the idea of designing our Server Metrics agent so that it was compatible with the Nagios plugin system, allowing users to use any Nagios plugins to collect performance data during their load tests.

As a result, Server Metrics allows you to collect performance metrics from almost anything! Measurements from the Server Metrics Agent can be correlated with other measurements collected during load tests, and results are made available as a time series that can also be viewed in graph format on the test results page, or exported to CSV (comma-separated values) format for use in a spreadsheet.
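The Nagios plugin interface that makes this possible is deliberately simple: a plugin prints one line of human-readable status, optionally followed by a `|` and machine-readable performance data in the form `label=value;warn;crit;min;max`. Here is a minimal sketch of how an agent might extract metrics from such a line; the sample output line is a made-up example, not literal check_postgres output:

```python
import re

def parse_perfdata(plugin_output):
    """Extract numeric metrics from a Nagios plugin output line.

    Everything after the '|' separator is performance data:
    one or more label=value[;warn;crit;min;max] entries.
    """
    metrics = {}
    if "|" not in plugin_output:
        return metrics  # plugin emitted no performance data
    perfdata = plugin_output.split("|", 1)[1]
    for match in re.finditer(r"'?([^'=\s]+)'?=([\d.]+)", perfdata):
        metrics[match.group(1)] = float(match.group(2))
    return metrics

# Hypothetical plugin output line:
line = "POSTGRES_BACKENDS OK: 80 of 100 connections | 'loadimpact'=80;90;95;0;100"
print(parse_perfdata(line))  # {'loadimpact': 80.0}
```

Because the contract is just “print a line, exit with a status code”, any of the thousands of existing plugins can feed metrics into a load test this way.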

The Nagios community has created over 3,000 different plugins that measure the health of all kinds of software applications, hardware products, networks and services. And the plugins are available for all kinds of platforms (e.g. Linux, Windows, etc).

  1. Follow the instructions at https://loadimpact.com/server-metrics-agent-download to download, install and enable your Server Metrics agent.

  2. Go to http://exchange.nagios.org/directory/Plugins and find the plugin(s) you want to use. In our case we wanted to monitor PostgreSQL, so we went to http://exchange.nagios.org/directory/Plugins/Databases/PostgresQL, which lists 18 (!) different plugins that can extract information about the health of a PostgreSQL server. We chose the “check_postgres” plugin – http://exchange.nagios.org/directory/Plugins/Databases/PostgresQL/check_postgres/details

  3. Download and install the check_postgres plugin (in our case we did it locally on our PostgreSQL server).

  4. Edit the configuration file for the Server Metrics agent – it is called “li_metrics_agent.conf” – and look at the section in it that says “# An external script” for information about how to make the agent start using your new Nagios PostgreSQL plugin. In our case we added two lines that looked like this:

[db_connections]

command = /usr/bin/perl /path/to/check_postgres-2.11.1/check_postgres.pl --host=localhost --port=5432 --dbname=loadimpact --dbuser=postgres --dbpass=verysecret --action backends -w 5 -c 10

Tip: if you have installed a Nagios plugin but don’t know what parameters it needs, try executing it with the --help parameter.

  5. Restart your Server Metrics agent.

  6. As usual, you then enable Server Metrics data collection from this particular agent when you configure a load test.
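The `-w 5 -c 10` arguments in the config above are the standard Nagios warning and critical thresholds, which plugins turn into an exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN. Here is a simplified sketch of that convention (real plugins also support Nagios range syntax, which this deliberately omits):

```python
# Nagios plugin convention: the exit code signals the status.
OK, WARNING, CRITICAL = 0, 1, 2

def status_for(value, warn, crit):
    """Map a measured value to a Nagios status using simple
    'alert when the value reaches the threshold' semantics."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK

# With thresholds like -w 5 -c 10:
print(status_for(3, 5, 10))   # 0 (OK)
print(status_for(7, 5, 10))   # 1 (WARNING)
print(status_for(12, 5, 10))  # 2 (CRITICAL)
```

The Server Metrics agent cares mainly about the performance data, but the exit code is what Nagios itself would use for alerting.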

Tip: the agent name should be shown as a selectable Server Metrics agent in the test configuration interface. If you do not see it listed, it means your agent hasn’t started or that it can’t reach loadimpact.com. The latter is often a firewall issue.
When the test starts, you will see the Server Metrics agent coming online in the console:

Then, while the load test is running, you will be able to plot the usual CPU, memory, disk, etc. statistics that the Server Metrics agent collects by default, but you will also have a new metric named after the database you are measuring client connections for (in this case, the database is called “loadimpact”):

So in this example, we chose to plot this metric, which shows us the current number of clients connected to the database “loadimpact” on the PostgreSQL instance running on the physical server “dbserver1”. The chart then looks like this:

The orange line shows the current number of connections to the database “loadimpact”, which in this example is around 80 and fairly stable.

This is, of course, just a simple example. The check_postgres plugin can measure a vast number of things related to your PostgreSQL database server. And anything it can measure you can have the Load Impact Server Metrics agent collect and relay to loadimpact.com to be stored as test result data associated with your load test. Many of the 3,000+ Nagios plugins are very powerful data collection programs, and by utilizing the Nagios plugin compatibility of Load Impact’s Server Metrics agent you suddenly have access to an incredibly wide range of measurement and monitoring options for load testing.

Website Owners’ Overestimation of User Capacity by 3.4 Times Kills Profits and Customer Retention

An e-commerce website that grinds to a halt simply because too many customers attempt to gain access at one time is akin to a supermarket with no parking spaces and aisles so narrow that only one shopper can enter at a time, while the rest wait outside to enter and make a purchase.

[Graph: lost revenue, from the State of Web Readiness report]

Few website owners would accept such poor performance and potential loss of revenue, and even fewer consumers would tolerate the waiting time. Most would just move on to another site where service is speedy and meets their expectations.

Thankfully a small, but growing, number of website owners have realized that performance management and capacity monitoring are imperative for delivering even a satisfactory customer experience, let alone an exceptional one.

It’s no coincidence that roughly 30% of the 500 website owners surveyed for Load Impact’s 2013 State of Web Readiness report – which includes data from performance tests of over 5,000 websites – claim to have no stability or performance problems, while about 30% of respondents also said they regularly do preventive load testing before technical changes. Those who foresee the problem take the necessary preventive steps. On the flip side, while nearly 90% of respondents said short response time is either important or very important, 23% said they don’t monitor the response time on their site at all.

[Graph: stability problems, from the State of Web Readiness report]

How can such a gap exist when it’s so obvious that optimum performance leads to higher levels of customer satisfaction, increased conversion and greater revenue?

For the less mature e-commerce sites, current performance problems can be seen as both an opportunity and a threat. While some e-retailers are already far ahead, having scheduled load tests to maintain a sub-two-second response time (77% of all respondents believe response times should be less than 2 seconds), most haven’t even come close to realizing they have a problem. An analysis of over 5,000 load tests revealed that the average website owner overestimates capacity by 3.4 times. In fact, the average page load speed for the e-commerce sites analyzed was closer to 8 seconds – nearly twice the average latency of the non-e-commerce sites studied.

[Graph: response times, from the State of Web Readiness report]

Clearly, big rewards can be reaped by making even small changes to website performance. A 2009 experiment by Shopzilla revealed that decreasing latency by 5 seconds (from 7 seconds to 2 seconds) increased page views by 25% and revenue by 7% to 12%. And, according to SEO expert Jason DeMers, load speed is one of the growing factors in Google’s ranking algorithm.

The Internet giants caught on to the issue of load testing and performance management long ago. Authors and consultants have written books about it, held conferences, and written blogs for years. Google even officially favors fast web sites in its search results, indirectly punishing low performers.

So why are so many e-retailers so slow to catch on about the importance of performance stability?

According to the 2013 State of Web Readiness report, lack of resources is the No. 1 reason for failing to monitor and optimize performance levels. However, this is only part of the problem.

The real explanation has more to do with striking the right balance between functionality, performance and resources, and the fact that, more often than not, optimizing two of the three means sacrificing the third. Therefore, it is often the misallocation of resources that explains a site’s poor performance. Money and time that should have been spent monitoring capacity and load speed instead went to adding additional, often frivolous, functionality.

The lack of sufficient investment in performance management is extremely common. In some ways, buying performance management services is a bit like buying insurance: you understand why you need it, but if all goes as planned you never actually get a chance to see the value.

Being forward thinking enough to buy something so intangible is tough.

Other fundamental website issues, such as security, have slowly climbed the ladder to become class A requirements. Even the most technically illiterate now steer clear of any feature that comes with a security concern. From our vantage point, the time has come to give performance and stability management the same time and attention – if for no other reason than that it’s simply smart business.

—————————–

Read or share this infographic based on our study’s findings.

Load Impact Infographic

Different types of website performance testing – Part 3: Spike Testing

This is the third of a series of posts describing the different types of web performance testing. In the first post, we gave an overview of what load testing is about and the different types of load tests available. Our second post gave an introduction to load testing in general, and described what a basic ramp-up schedule would look like.

Now we move on to spike testing. Spike testing is a form of stress testing that helps determine how a system performs under a rapid increase in workload. This form of load testing shows whether a system responds well and maintains stability during bursts of concurrent user activity over varying periods of time. It should also help verify that an application is able to recover between periods of sudden peak activity.
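A spike-test load schedule can be sketched as a list of (duration, target users) steps: sudden ramps to a peak, short holds, and recovery periods in between. The function and all of the numbers below are illustrative, not Load Impact’s actual API:

```python
def spike_schedule(baseline=50, peak=500, spikes=3,
                   ramp=60, hold=120, recover=180):
    """Build a spike-test schedule as (duration_seconds, users) steps.

    Alternates sudden ramps to `peak` with recovery periods at
    `baseline`, testing both burst handling and recovery."""
    steps = [(ramp, baseline)]             # warm up to the baseline
    for _ in range(spikes):
        steps.append((ramp, peak))         # sudden burst of users
        steps.append((hold, peak))         # hold the peak briefly
        steps.append((recover, baseline))  # let the system recover
    return steps

print(spike_schedule(spikes=2))
```

The recovery steps matter as much as the peaks: they are what verifies the “recover between periods of sudden peak activity” part of the definition.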

So when should you run a spike test?

The following are some typical scenarios where we see users running spike tests, along with how your load schedule should be configured in Load Impact to emulate the load.

Advertising campaigns

Advertising campaigns are one of the most common reasons why people run load tests. Why? Well, take a lesson from Coca-Cola: with an ad spend of US$3.5 million for a 30-second Super Bowl commercial slot (not including customer cost), it probably wasn’t the best impression to leave on the customers who flooded to their Facebook app… and possibly right into Pepsi’s arms. If you’re expecting customers to flood in during the ad campaign, ramping up in 1–3 minutes is probably a good idea. Be sure to hold the load for at least twice the time it takes users to run through the entire scenario, so you get accurate and stable data in the process.

Contests

Some contests require quick responses from users as part of the challenge. The problem with this is that you might end up with something almost akin to a DDoS attack every few minutes. A load schedule comprising a number of sudden spikes helps simulate such a situation.

TV screenings/Website launches

If you’re doing a live stream of a very popular TV show (think X Factor), you might want to consider running a load test prior to the event. Slow streaming or a website crash is the fastest way to drive your customers to the next available streaming app or online retailer. Take Click Frenzy as an example – they’re still working to recover their reputation. Streaming servers also tend to be subject to prolonged stress when many users flock to watch an event or show, so we recommend a relatively quick ramp-up ending with a long hold time.

Ticket sales

Remember the 2012 London Olympics? Thousands of frustrated sports fans failed to get tickets to the events they wanted. Not only was it a waste of time for customers, it also proved to be a logistics nightmare for event organizers. Bearing in mind that a number of users will be ‘camping out’ on the website awaiting the ticket launch, try a two-stage quick ramp-up followed by a long hold time to simulate this traffic.

TechCrunched… literally!

If you are trying to get featured on TechCrunch (or any similar website that can generate a lot of readership), it’s probably a good idea to load test your site to make sure it can handle the traffic. It wouldn’t do to get so much publicity and then have half of your visitors go away with a bad taste in their mouths! In these cases, traffic tends to come in slightly slower and in more even bouts over longer periods of time, so a gentler ramp-up held over a longer period works better.


Secondary Testing

If your test fails at any point during the initial spike test, one thing you might want to consider doing is a step test. This helps you isolate where the breaking points are, which in turn allows you to identify bottlenecks. It is especially useful after a spike test, which may ramp up too quickly and give you an inaccurate picture of when your system starts to fail.
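A step test climbs through increasing plateaus instead of spiking, so you can see exactly which load level breaks. Here is a sketch in the same (duration, users) form; the function name and numbers are made up for illustration:

```python
def step_schedule(start=100, stop=500, step=100, ramp=60, hold=300):
    """Build a step-test schedule as (duration_seconds, users) steps.

    Ramp to each plateau, then hold long enough to collect stable
    measurements before stepping up to the next level."""
    steps = []
    for users in range(start, stop + 1, step):
        steps.append((ramp, users))  # ramp to the next plateau
        steps.append((hold, users))  # hold and observe
    return steps

print(step_schedule())
```

Because each plateau is held long enough to stabilize, the first plateau where errors or response times climb points you straight at the breaking load level.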


That being said, not all servers are built to handle huge spikes in activity. Failing a spike test does not mean that those same servers cannot handle that amount of load. Some applications only need to handle large, constant streams of load and are never expected to face sharp spikes of user activity. It is for this reason that Load Impact automatically increases ramp-up times with user load. These are suggested ramp-up timings, but you can of course adjust them to better suit your use case.

About Load Impact

Load Impact is the leading cloud-based load testing software trusted by over 123,000 website, mobile app and API developers worldwide.

Companies like JWT, NASDAQ, The European Space Agency and ServiceNow have used Load Impact to detect, predict, and analyze performance problems.
 
Load Impact requires no download or installation, is completely free to try, and users can start a test with just one click.
 
Test your website, app or API at loadimpact.com
