Load Impact 2.3 released!

We’re happy to introduce Load Impact 2.3!

Load Impact 2.3 contains a new and improved proxy recorder that automatically detects pages and creates a page load time result metric for each of your web pages. The recorder also allows you to insert code comments in the generated user scenario, which makes it easier to find where in your user scenario code a certain page is being loaded.
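To give a feel for what this looks like, here is a hypothetical fragment of a recorded user scenario. It is only a sketch: the exact output of the recorder will differ, and the http.page function names used here are assumptions based on the API updates listed further down.

    -- Front page  (a comment like this can be inserted via the recorder)
    http.page_start("Front page")                   -- assumed page grouping function
    http.request_batch({
        {"GET", "http://example.com/"},
        {"GET", "http://example.com/style.css"},
    })
    http.page_end("Front page")                     -- load time for this block becomes a page metric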

Behind the scenes, Load Impact 2.3 also includes a lot of optimizations that result in a much faster reporting interface. Especially for large tests that generate a lot of results data, these optimizations make a huge difference to how snappy the “view test” page feels, and for live tests the reporting page will also be a lot smoother. In fact, Load Impact 2.3 is a major rewrite of the underlying storage subsystem and of how data is accessed by the user interface code. More things are loaded on demand now (i.e. as/when needed), which results in a page that is much lighter on the client computer. You should now be able to view even the largest tests on the flimsiest of laptops.

Other improvements you will find in 2.3 include:

 

  • Graphical editor support for data stores, custom metrics and other new API functionality
  • Several API updates – http.page API functions, named parameters, etc.
  • You can now plot graphs of load generator CPU and memory usage during the test!
  • The URL list on the report page now displays bytes received and compression ratio
  • Content type classification now uses the Content-Type header
  • Click the pie charts to highlight different objects in the URL list on the test report page
  • Many bug fixes…

 

Big data

Big data – what is that?

You might have heard the term, but its actual meaning depends a bit on who you are. Big data is essentially a name for data sets that are much larger than they used to be – a term that describes how modern applications often generate enormous amounts of data compared to their fairly recent predecessors. Whether it is the Large Hadron Collider, a weather satellite or a popular social networking site, they all tend to generate huge amounts of data. Data that needs to be stored somewhere.

The rapid development of hard drive technology and network communications has enabled data sets to grow very rapidly as well. This means that transferring the bits and bytes, and storing them on some kind of physical media, is not a big problem. What is a bit of a problem, on the other hand, is the software used to store and retrieve the data – like SQL database applications. These were excellent when all you wanted to do was store the names and addresses of your company’s 1,000 customers in an organized manner, so that you could then search for all customers located in a certain city, etc. Today, however, there are companies like Facebook that have to store huge amounts of data – data that they want fast access to, as they need to quickly dig up those 14 pictures that are yours from a collection of over 40 billion uploaded pictures. This was not what the old SQL databases were designed for, and in general, many of the old software tools and applications for handling data simply are not up to working with these new and huge data sets.

Enter technologies such as NoSQL databases – Cassandra, Voldemort, CouchDB, MongoDB, Neo4J, Dynamo, Redis, Memcached, etc. There is now a wide range of different systems that can store large amounts of data in an efficient manner. You always get some kind of trade-off, of course, and the end result is always a more complex application design, but it allows you to scale the size of your data sets to previously unimaginable levels. The development of these systems is progressing very rapidly, and data sets are growing at the same furious pace as a result. Load Impact uses Apache Cassandra to store load test results, providing us with a flexible way to scale our system as the number of users and the amount of test data we store increases. Currently, our test result database is growing at a rate of several gigabytes per day.

Cloud computing is another enabler of big data. Previously it was difficult to scale your infrastructure to handle large data sets without making huge upfront investments; today you can rent the infrastructure as, and when, you need it. An application that occasionally has to perform a complex calculation on a large set of data would previously have been too costly to run because of the infrastructure cost. Today you can rent a thousand Amazon EC2 servers for an hour and pay only a couple of tens of dollars to do so.

For us, big data is a positive development, as it increases the demand for large-scale load testing. Online services are getting larger and more resource-intensive, and there is money to be saved by optimizing your solution to use the least resources possible. Even more important, in many cases, is optimizing for speed so that the people (or machines!) using your online application, site or service will not choose a competitor over you. Speed is becoming a critical competitive advantage, and load testing is an important test method for those who want to ensure that their site or application is fast under all circumstances. Traditional load testing solutions are often unable to scale the load up to what is required to properly stress a large site or application, so we see a clear trend that people are becoming more and more interested in cloud-based, online load testing.

If you want to know more, a good start is the Wikipedia article on big data:

http://en.wikipedia.org/wiki/Big_data

Parameterized data, and more

We are happy to introduce two new major features in Load Impact that many users have asked for: parameterized data (“data stores”) and custom metrics.

Parameterized data means that you can provide data in bulk using some common format – often a CSV (comma-separated values) file that you upload, and which you can then access from your load script. The typical example is when you have, say, 10,000 login names and passwords that you want to use in your load test. Entering these by hand into your load script code is prohibitively time-consuming, but with parameterized data you just upload a text file with all the usernames and passwords and are then able to access them from inside your load script.

Custom metrics are a feature that allows you to store arbitrary result metrics during a load test. A typical use case would be to store the load time for a certain page on your site (as opposed to just storing the load time for individual URLs/resources on the page). A more advanced use case would be to fetch server monitoring data (via HTTP) from the web servers being tested and log e.g. their CPU load along with the standard response time data collected by Load Impact. Any metrics stored with our custom metric feature will be visible in the test results interface and can be plotted as graphs for easy correlation with the standard metrics.

Parameterized data in Load Impact

Parameterized data in Load Impact is implemented using something we call data stores. A data store is basically a two-dimensional array (a “table”) with data that can be shared by multiple clients in a load test. The usage is simple: you create a new data store in the user scenario configuration interface, assign a name to it, then upload a text file with the data you want to insert into it. The data file should be in CSV format (comma-separated values) and represent a two-dimensional table, but it can contain any number of rows and columns. When you have a data store assigned to a user scenario, your load test clients can use the data store API functions to access the data store and retrieve data from it.
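As a rough illustration, reading from a data store inside a load script could look something like the sketch below. The function names are assumptions made for this example only – the actual data store API functions are described in the FAQ linked below.

    -- Hypothetical sketch: pull one login from a data store named "logins"
    -- whose uploaded CSV file has the columns username,password.
    local logins = datastore.open("logins")    -- assumed: open the data store by its name
    local row = logins:get_random()            -- assumed: fetch a random row from the table
    local username = row[1]                    -- first column of the CSV row
    local password = row[2]                    -- second column of the CSV row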

Further reading: FAQ: How do I use parameterized data?

Custom metrics

Custom metrics allow you to create your own, arbitrary result metrics and store sample points for them that you can then plot in graphs just like any other measurements. Custom metrics are really simple to use – in your load script you just call the special function result.custom_metric() and supply it with one parameter defining the name of the metric – e.g. “page 1 load time” – and one parameter defining the current measurement value for that metric (a numeric value). Custom metrics can be used to plot all sorts of interesting measurement data, such as page load times, bandwidth usage for a single URL/resource, time to first byte for new TCP connections, and a multitude of other things.
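A minimal sketch of what this can look like in a load script is shown below. Only the result.custom_metric(name, value) call follows the description above; the request and the timer around it are illustrative assumptions.

    -- Time a page request and report the elapsed time as a custom metric.
    -- os.time() only has one-second resolution and is used here purely for illustration.
    local start = os.time()
    http.request_batch({
        {"GET", "http://example.com/page1.php"},       -- assumed request API shape
    })
    local elapsed = os.time() - start                  -- elapsed time in seconds
    result.custom_metric("page 1 load time", elapsed)  -- metric name + numeric value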

After 2.0 – what is next?

Post-2.0 updates and plans for 2012

We released Load Impact 2.0 at the end of October, and the reception has been really, really good. We see an increased number of user registrations and more user activity, and what is especially fun is seeing that people are starting to use Load Impact for really complex load testing of a wide range of different platforms and applications. It seems all the new functionality of Load Impact 2.0 has been very well received and that people are starting to realize its potential, which is great to see for those of us who worked so hard in 2011 on getting it out the door.

Right after release we had a number of issues with the payment system, as well as numerous small bugs that only manifested themselves in production, but overall it was a successful release without any major malfunctions. There are still small annoyances left to iron out, but we are making the service better by the day, and also adding new functionality. Here is a list of some things we have done post-release:

 

  • Support for a new load zone in South America – São Paulo, Brazil – and a new US West load zone – Portland, Oregon.
  • New chart/graph component implemented, providing even more advanced graphing capabilities (e.g. instant zoom)
  • Improved help/documentation – customer case studies, load scripting tutorial and example scripts
  • Data export functionality added (export to CSV)
  • Several problems related to the payment system have been fixed. AMEX support was added.
  • Several problems related to test startup have been fixed.
  • Credit refund logic for aborted or failed tests has been improved. You now get a partial or full refund when a test fails for some reason.
  • Anonymous tests are run from random load zones.
  • Numerous small UI bugfixes/improvements.
  • HTTP Basic Auth is now supported for automatic load script generation.
This list is by no means exhaustive. We usually update the service every week with many minor fixes and improvements, and sometimes new features as well. For 2012 we have some much-asked-for features on the road map, such as:
  • Improved data parameterization support
    We will implement “data stores” that allow people to upload large sets of data, which will then be made available to them in load scripts. This functionality will make it a lot simpler for people who e.g. have a large list with usernames and passwords that they want the simulated clients in a load test to make use of.
  • User-defined metrics
    You will be able to create your own reporting metrics and have your load script store results values for those metrics during a test. Then you can plot graphs for these metrics along with the standard metrics in the reporting (test result) interface. An obvious use for this functionality can be to report load times for individual web pages, in case a user scenario accesses multiple pages (which is fairly common).
  • Server metrics
    This is also a power-user type of feature that allows you to import performance data from the web server(s) you are testing and plot graphs of e.g. the CPU usage on your web frontend machine, overlaid with a graph of the average response time for an HTTP transaction. Being able to import server metrics from the machines being stressed in the load test provides a much simpler way of correlating information in order to find out where performance bottlenecks are. Of course, we will also support importing data from database servers and other systems your site/application might depend on.
If you have any other features you think we should rather be focusing on, don’t hesitate to tell us about it!  We love feedback.
A merry Christmas and a happy New Year to you all!

 

2.0 Highlights

Load Impact 2.0 was released at the end of October (the 27th). The first few days after release were pretty chaotic, with lots of minor issues and some major ones, but having been involved in many big releases during my career I have to say that this one actually went pretty well. The system was up and functional most of the time during the first few days post-release, and that isn’t bad at all 🙂

Still, there were some difficulties, of course. We first had problems with AMEX payments due to contractual reasons (AMEX payments have been removed for now, until we manage to get through the AMEX bureaucracy) and then with VISA/MC payments. There were also occasional problems with internal queueing systems that caused some load tests to either fail, “freeze” (get stuck in some state), or never get started. All these issues should be resolved by now, but there are likely smaller things that will pop up from time to time, so we urge everyone to get in touch with us if you see anything strange happening on the site. Don’t hesitate to get in touch even if you’re unsure whether something is a problem on our side or not – we want to know about every situation where someone has any kind of problem using our service. No issue is too small.

In general, though, the system is starting to get very stable, and we see more activity than before the release, with more user registrations and more tests being executed. We also see more advanced usage of our service – more people are writing advanced load scripts and running both larger and more complex load tests than ever before. It is all very encouraging and tells us that we are moving in the right direction!

So what is so great about 2.0 then?

Some people may see Load Impact 2.0 as simply an upgrade, but it’s more like the launch of a whole new service – it is that different from 1.0. We have kept some key 1.0 elements that we (and hopefully everyone else) liked, such as the ability to run small, simple tests anonymously from our front page, the ability to watch other such anonymous tests being run, and the scripting language and scripting API. Behind the scenes, however, most of the code base is new, and 2.0 includes a lot of functionality that didn’t exist in 1.0. Here is a small list:

  • Large-scale load tests
    As we are now using public cloud infrastructure (Amazon) to generate load test traffic, we have the ability to scale up a load test to a very large size at any of the geographic locations where there are cloud servers available (currently California, Oregon and Virginia in the US, plus Ireland, Japan and Singapore outside the US).
  • Multiple user scenarios in a single test
    In 2.0 we introduce “user scenarios”. A user scenario defines a certain simulated user category and what that category should be doing on your site. An example could be an e-shopping site that has two types of visitors – one type that just browses the site without buying anything, and another type that registers a user account and then goes on to actually buy products. In Load Impact 1.0 you could not easily combine these two different user categories in a single load test, but with Load Impact 2.0 it is easy – you just create two different user scenarios that run different load scripts, then configure your load test to use these two scenarios.
  • Multiple geographical traffic sources
    With Load Impact 2.0 you can now choose to have your traffic originate from more than one physical place, if you want. You can specify any number of combinations of user scenarios (described above) and geographical locations where that particular user scenario should be executed, and create very complex load test configurations where you define that e.g. 10% of the total number of simulated users during the load test should run user scenario X from geographical location Y.
  • More performance metrics
    We now collect more performance metrics than in 1.0, such as “requests per second”, and we collect many more sampling points that are all time-based rather than client level-based. This results in more performance data available at higher resolutions than before.
  • Much more advanced chart/graph capabilities
    We provide a very dynamic test report page where you can create your own charts and graphs, plotting a wide range of parameters and correlating data with a certain user scenario or with test results from a certain geographical region.
  • Text-based script editor
    For expert users, a text-based scripting editor is usually the best choice, and in Load Impact 2.0 we provide the option to choose between our graphical script editor (LILE) and a text editor that allows easy copy-and-paste and faster code entry for the experienced programmer. Load script programmers now have much more choice in how they create their load scripts.
  • Continuous tests
    Load tests are now executed continuously, which means that a simulated client thread is never shut down as long as the load level is meant to increase. Old simulated clients will just continue execution, reiterating their load script again and again, while more clients are being added. The result is a smoother and more time-efficient ramp-up process than was offered in Load Impact 1.0.
  • Credit based pricing model
    Load Impact 2.0 introduces the credit based model, which means there is no longer any distinction between one user and the next with regard to being a “premium” user or not. All users are the same, they just have different amounts of credits, and the ones that have more credits can run larger and longer tests than those who have few credits. This provides several advantages – first of all, it allows us to skip all the old limits on how many tests you can run per 24 hours, etc. Now, every test you run consumes credits, and only the number of credits you have affects the number of tests you can run. Secondly, it means we don’t have to restrict access to some functionality to premium users – everyone can do everything on the system, so it is easy to “try before you buy”. Thirdly, it makes our product much simpler in general, as we only sell one single thing now – credits – whereas earlier we sold access to different premium levels for different amounts of time, making everything a lot more complex. The drawback, however, is that it can be difficult for people to understand exactly how many credits they need to do the testing they want to do. All in all, though, we think the upsides of the credit model are much bigger than the downsides.
You can watch a video introduction to Load Impact 2.0 on Youtube: http://www.youtube.com/watch?v=CkGuBONAXLE
There are many exciting new features on our road map for the end of the year, and for 2012, and we really appreciate your feedback on exactly what things you would like to see in future versions of Load Impact. If there is something you think is missing that would really make a difference to you, please tell us about it!
We will continue to work hard on making Load Impact the best load testing solution in the world. We are slowly becoming the de facto standard for online load testing, and it’s all thanks to you, our users, so we would like to extend a big thank you for your support ever since we launched back in 2009!  Keep load testing, and don’t forget to try out all our new features!
  /Ragnar & the Load Impact team

Load Impact 2.0!

We’re excited to announce Load Impact 2.0!

In early spring 2011, we were sitting on a ton of ideas about how to improve Load Impact. We had lots of things on our TODO list for the next few major releases of the service, and were discussing what to focus on first and what our general development road map should look like for the rest of 2011.

We came to the conclusion that the incremental updates we had been doing so far were not the best course of action. Some of the changes we wanted to make to the service were dependent on other changes we also wanted to make, and some were hard to achieve on top of the current legacy system. There were parts of the old Load Impact that we had long wanted to remake from the ground up, and we realized that this was the time to do it – to break with the old codebase and start a new one, transferring everything we liked from the old code base but not hesitating to throw out anything we did not like.

So we embarked on that long and hard, but also fun, journey. Initially, we aimed to continue updating the old platform regularly, rolling out new features and updates to the live site while developing Load Impact 2.0 in parallel. We soon realized that this was overly ambitious, however, and decided that advanced scripting and the menu-based scripting editor that we released in April would be the last major update to the old Load Impact code base.

Then we spent most of the summer and autumn frantically developing Load Impact 2.0. Since August we have been in crunch mode, working 10-hour days, 6 days a week (which is quite a lot to us lazy and decadent Europeans) and our efforts are starting to pay off now, with the 2.0 platform getting closer and closer to being release ready. At the time of writing we are running a closed beta test, and we expect that to continue for another week or two, then we will take 1-2 weeks to finish off everything, and finally release in the second half of October.

So, what’s in it for me?  How will Load Impact 2.0 affect me?

First of all, Load Impact 2.0 is a huge upgrade from the old system. We don’t want to spoil the surprise, but it will mean a big step up functionality-wise. We expect our competitors to tear their hair out when they see it, at the very least. Introducing a lot of new features often means that you also introduce complexity, but we think we have done a pretty good job of hiding complex functionality until the user asks for it. Load Impact 2.0 should be as easy to use as (or easier than) the current system.

 

Introducing Load Impact credits

One big change that we want to announce beforehand, however, is the new pricing model we will adopt in 2.0. So far, we have been selling subscriptions to premium users, letting them buy premium access for a certain amount of time (a day, a week or a month), but we have realized there are several drawbacks to this scheme. For example, people cannot try out all the Load Impact features until they buy a premium subscription. How do they know that they will be able to do what they want to do, if they can’t try before they buy? Also, we have to have limits in place on how many tests you can run, how much data you can transfer, etc. during your subscription period, otherwise we could be hit hard if someone bought e.g. premium access for a month and then ran one test after another, continuously throughout the whole month. So we set limits, and when a user runs one test too many they are told they can’t run any more tests. Many people miss these limits and are upset when they are suddenly denied when trying to start a test.

To avoid these problems, and to get a simpler premium product, we have decided to scrap the old time-based subscriptions and instead sell Load Impact Credits. The credits are used whenever you run a load test, with a small test costing less than a large test. Just by having a registered account you will automatically receive a small amount of credits for free every month. You can use these credits to run several smaller load tests, or perhaps one medium-sized test, per month. If your needs are more frequent or you need to run larger tests, you have to buy extra credits.

We think this system is fair and that it will allow all our amateur load testing users to continue running really small-scale load tests for free, with access to all our functionality, while the professional testers will have to pay for their testing, as they often need to run larger tests and sometimes run them more frequently as well.

 

What will happen with the old system?  Will I be able to access my old test results?

When Load Impact 2.0 is released, we will transfer all users from the old system to the new. We will then also migrate all old test results, configurations etc. The new system will be backwards compatible with the old so you will not lose any data. In fact, there are some test result metrics that we collect today, but which you are not able to see in the user interface (such as how many transactions returned error codes). These metrics will be available in 2.0, even for your old test results.

As Load Impact 2.0 will contain all the functionality (and more) of the current system, we have no plans to keep the old system running in parallel with the new one. When we release, you will not be able to log on to the old system anymore. The web address will still be the same as always – http://loadimpact.com – but the look-and-feel and the functionality will be different.

 

What if I have an active subscription at the time you upgrade the site – what happens to my subscription?

Existing subscribers will be given a generous supply of credits, so they will not feel they lost anything by buying a premium account just before the upgrade.

 

When is the exact date of the release?

We have to get back to you on that!  When the exact date is set, we will email all our users about it.

 

If you have any more thoughts or questions, don’t hesitate to contact us.

WordPress load testing part 3 – Multi language woes

Understanding the effects of memory starvation.

This is the third part in a series of posts about WordPress and performance. In part 1, we took a look at WordPress in general. In part 2 and part 2.5 we reviewed a couple of popular caching plugins that can boost performance. In this part, we’ll start looking at how various plugins can have a negative effect on performance and whether anything can be done about it.

In the comments for one of the previous posts in this series, Yaakov Albietz asked us if we used our own service www.loadimpact.com for the tests. I realize that I haven’t been that clear about that, but yes, absolutely, we’re exclusively using our own service. The cool thing is that so can you! If you’re curious about how your own web site handles load, take it for a spin using our service. It’s free.

We started out by looking for plugins that could have a negative effect on WordPress performance, thinking: what are the typical properties of a poorly performing plugin? It is not as obvious as one might think. We installed, tested and tinkered with plenty of suspects without finding anything really interesting to report on. But as it happens, a friend of a friend had just installed the WordPress Multi Language plugin and noted some performance issues. That seemed worth taking a look at.

The plugin in question is WordPress Multi Language (WPML). It has a high rating in the WordPress community, which makes it even more interesting to have a look at. Said and done, we installed WPML and took it for a spin.

The installation is really straightforward. As long as your file permissions are set up correctly and the WordPress database user has permission to create tables, it’s a 5-6 click process. Install, activate, select a default language and at least one additional language, and you’re done. We were eager to test, so as soon as we had the software in place, we did our first test run on our 10-post WordPress test blog. Here’s the graph:

Average load times 10 to 50 users

Oops! The baseline tests we did for this WordPress installation gave a 1220 ms response time with 50 concurrent users. We’re looking at something completely different here. At 40 concurrent users we’re getting 2120 ms, and at 50 users we’re all the way up to 5.6 seconds, or 5600 ms. That needs to be examined a bit more.

Our first suspicion was that WPML would put additional load on the MySQL server. Our analysis was actually quite simple. For each page that needs to be rendered, WordPress now has to check whether any of the posts or pages that appear on that page have a translated version for the selected language. WPML handles that magic by hooking into the main WordPress loop. The hook rewrites the MySQL query about to be sent to the database so that instead of a simple “select foo from bar” statement (oversimplified), it’s a more complex JOIN that would typically require more work from the database engine. A prime performance degradation suspect, unless it’s carefully written and matched with sensible indexes.

So we reran the test. While that test was running we sat down and had a look at the server to see if we could easily spot the problem. In this case, looking at the server means logging in via ssh and running the top command (if it had been a Microsoft Windows box, we’d probably have used the Sysinternals Process Explorer utility) to see what’s going on. Typically, we want to know if the server is out of CPU power, out of RAM, or some combination of the two. We were expecting to see the mysqld process consume lots of CPU and verify our thesis above. By just keeping an unscientific eye on top and writing down rough numbers while the test was running, we saw a very clear trend, but it was not related to heavy mysqld CPU usage:

20 users: 65-75% idle CPU 640 MB free RAM
30 users: 50-55% idle CPU 430 MB free RAM
40 users: 45-50% idle CPU 210 MB free RAM
50 users: 0%   idle CPU  32 MB free RAM

As more and more users were added we saw CPU usage go up and free memory go down, as one would expect. The interesting thing is that at 50 users we noted that memory was extremely scarce and that the CPU had no idle time at all. Memory consumption increases in a linear fashion, but CPU usage suddenly peaks. That sudden peak in CPU usage was due to swapping. When the server reaches the point where RAM is running low, it starts doing a lot more swapping to disk, and that takes time and eats CPU. With this background information in place, we just had to see what happened when going beyond 50 users:

That’s very consistent with what we could have expected. Around 50 concurrent users, the server is out of memory and there’s a lot of swapping going on. Increasing the load above 50 users makes the situation even worse. Looking at top during the later stages of this test confirms the picture. The kswapd process is using 66% of the server’s CPU resources and there’s a steady queue of apache2 processes waiting to get their share. And let’s also notice that mysqld is nowhere to be seen (yes, this image only shows the first 8 processes, so you just have to take my word for it).

 

The results from this series of tests are not WPML-specific but universal. As we put more and more stress on the web server, both memory and CPU consumption will rise. At some point we will reach the limit of what the server can handle, and something has to give. When it does, any linear behavior we may have observed will most likely change into something completely different.

There isn’t anything wrong with WPML, quite the opposite. It’s a great tool for anyone who wants a multi-language website managed by one of the easiest content management systems out there. But it adds functionality to WordPress, and in order to do so, it uses more server resources. It seems WPML is heavier on memory than on CPU, so we ran out of memory first. It’s also interesting to see that WPML is actually quite friendly to the database – at no point during our tests did we see MySQL consume noticeable amounts of CPU.

 

Conclusion 1: If you’re interested in using WPML on your site, make sure you have enough server RAM. Experience of memory requirements from “plain” WordPress will not apply. From the top screenshot above, we conclude that one apache2 instance running WordPress + WPML will consume roughly 17 MB of RAM. We haven’t examined how that varies with the number of posts, comments, etc., so let’s use 20 MB as an estimate. If your server is set up to handle 50 such processes at the same time, you’re looking at 1000 MB just for Apache. So bring out your calculator and work out how much memory your server will need by multiplying the peak number of concurrent users you expect by 20 MB.

Conclusion 2: This blog post turned out a little different than we first expected – instead of blaming poor database design, we ended up realizing that we were watching a classic case of memory starvation. As it turned out, we also showed how we could use our load testing service as a reliable source of traffic to create an environment where we could watch the problem as it happens. Good stuff, and something that will appear as a separate blog post shortly.

 

Feedback

We want to know what you think. Are there any other specific plugins that you want to see tested? Should we focus on tests with more users, more posts in the blog, more comments? Please comment on this post and tell us what you think.

 

WordPress load test part 2 – amendment

This is part two and a half in a series of posts about WordPress and performance. In part 1, we took a look at WordPress in general. In part 2 we reviewed a couple of popular caching plugins that can boost performance. In this follow-up post, we’ll tell you a bit about what we learned after part 2 was published.

 

Revisiting W3 Total Cache

After publishing our results in part 2, we received a concerned Twitter message from Frederick Townes, the W3 Total Cache (W3TC) author. He thought we had done something wrong, since the disk enhanced cache mechanism in W3TC should be at least as effective as the WordPress Super Cache plugin (WPSC). After a brief Twitter discussion, we understood that he was absolutely right. The mod_rewrite magic that WPSC uses to achieve its amazing results was indeed present in W3TC as well (and I might as well add that the mod_rewrite rules added by Frederick’s plugin are neater than the ones added by the test winner).

The mistake we made in our first test is that we didn’t realize the significant difference between the two disk-based page caching methods available. There’s “Basic” caching, which is the one we tested, and there’s “Enhanced” mode. In Basic mode, W3TC works pretty much the same way as the standard wp-cache plugin, which involves invoking a PHP script. In our server benchmark, we’ve already seen that our server needs in the region of 80 ms to do that, so we’re glad if we can avoid it in the elegant manner that WordPress Super Cache does.

In Enhanced mode, surprise surprise, avoiding invoking PHP is exactly what W3 Total Cache does. The basic concept is the same in W3TC and WPSC: both plugins add a bunch of lines to the .htaccess file that tell Apache2 to go look for a static (i.e. cached) copy of the requested file/resource. And as noted above, W3TC does this with a slightly more elegant addition to .htaccess. In our defense, even though the documentation provided with W3TC is good, we didn’t find anything in particular that explained this significant difference between Basic and Enhanced.

How Load Impact can affect the results

Naturally, we took W3TC back to the lab to see how fast it is in Enhanced mode. But before telling you about the results, we want to explain a few details about how Load Impact works. When we ask Load Impact to simulate the load of 50 concurrent web users, that is exactly what Load Impact will do. The second the test starts, exactly 50 virtual users will load the page at the same time, and Load Impact will note how long it takes before the web server responds and the content is completely transferred. Then each virtual user will make a random pause and try again. Depending on the accuracy settings for the test, this will be repeated over and over again. In a “Fast” test there will be very few repetitions, and in an “Accurate” test there will be lots and lots of repetitions. The more repetitions, the more data points to use when calculating the average load time. This configuration setting allows users to prioritize test completion time over accuracy if they want to.

This behavior actually has some impact when testing cache plugins. The first time 50 virtual users come storming to the test web server at once, Apache will fire up as many child processes as it’s configured for – 30 in our case. All of these processes will go look in the cache and quite likely discover that there is no cached version of the requested file. So PHP is invoked, WordPress generates the page, and the cache plugin kicks in and stores the rendered version of the page in the cache. Not only does creating the cached version take more time than a normal page request does – in our scenario, there’s a risk that this is done up to 30 times. And to make things even worse, 30 child processes writing to a file-based cache at exactly the same time will cause a lot of file locking problems that end up taking even more time.

The conclusion is that depending on the state of the cache when the test begins, the response time of the first 30 data points may vary. And this is exactly what we saw when we took W3 Total Cache back to the lab.

Testing W3 Total Cache again

We retested W3TC and arrived at these numbers:

WordPress baseline: 1220 ms

W3 Total Cache (basic disk): 256 ms (-79.0%)

W3 Total Cache (enhanced disk): 188 ms (-84.6%)

That’s a solid improvement, so we contacted Frederick again with the good news, only to be turned down again – “something is still wrong”, he told us. So we redid the Enhanced mode test over and over again with minor tweaks to the W3TC settings. After every tweak, we cleared the cache so that any cached page specifics wouldn’t interfere with the current settings. We saw slightly higher average load times as well as slightly lower, but we were never close to the 112 ms record set when using the WordPress Super Cache plugin. Until the “warm vs cold” cache issue hit us and we did a test with a warm cache. And boom! The average load time decreased all the way down to 109 ms, better than what WPSC achieved. So let’s add how W3TC performs using enhanced disk caching:

Using Enhanced disk cache:

Average load time 50 users: 109 ms

Baseline difference: -1111 ms

Baseline difference %: -91.1%

 

 

Summary

Results

Before updating the results table, we retested the other results as well, but the numbers we ended up with in the retests were all within 5 ms of the original test results, so we’re sticking with the results from our first round of tests. But we’re rounding to just 2 significant figures:

Plugin                            Avg. load time (ms)   Difference (ms)   Difference %
Standard WordPress                1220                      0                 0 %
wp-cache                           210                  -1010               -83 %
batcache                           537                   -683               -56 %
WP Super Cache                     112                  -1108               -91 %
W3 Total Cache (disk basic)        256                   -964               -79 %
W3 Total Cache (disk enhanced)     109                  -1111               -91 %
W3 Total Cache (memcache)          367                   -853               -70 %

 
That’s it.

WordPress load test part 2

NOTE: This post was updated after it was first published. Please click here for explanation.

This is the second part in a series of posts about WordPress and performance. In part 1, we took a look at WordPress in general. In this part, we’ll continue the investigation and look at a few specific plugins that can help improve performance.

First things first: in part 1, we used an 8 GB quad-core server for the tests. From now on, we’ve moved to a KVM virtual server. The main reason is that we can change the machine configuration when something interesting is discovered. For instance, if we discover a performance problem and suspect RAM to be the bottleneck, we can add memory to the machine and rerun the test. The obvious downside is that the baseline established in part 1 isn’t valid anymore, so the first task is to examine how this virtual machine handles load as described in part 1.

The base configuration for the virtual server is 2 CPU cores running at 2.1 GHz with 1024 MB of RAM. The OS is Ubuntu JEOS, upgraded to 9.04. Apache2 is at version ___, PHP5 is up to version  . The MySQL server is located on the same machine and is running 5.xxx. WordPress is upgraded to version 2.9.1.

First, the baselines. A simple PHP script that sends 10 bytes of data back to the user has an average load time of 85 ms when running 80 concurrent users. That’s actually pretty much the same number as we saw on the 8 GB quad-core machine, where we had 80.9 ms.

The next thing we looked at in the first part was the average load time for a basic, empty WordPress install. On the quad-core box, we saw an average load time of 357 ms for 80 users. On the virtual machine, things were not so good. A ramp-up test going from 50 to 80 concurrent users shows load times of 691 ms at 50 users and more or less infinite at 60 users. At that load level, the kswapd process was eating a good 66% of all available CPU, meaning that the server spent most of its time swapping pages back and forth between RAM and disk. Even though nothing actually crashed, we aborted the test and concluded that the current config can’t handle more than 50 concurrent users.

For the final baseline test we added 10 posts to the WordPress install and made a new measurement. On our virtual machine, 50 users gave us a load time of 1220 ms; the same load on the quad-core machine gave us 470 ms response times. Clearly, taking away 2 processor cores and slashing the RAM to an eighth affects average load times badly, which is not surprising at all. Anyway, we now know that our current test environment is unlikely to handle more than 50 concurrent users, and we also know what happens if we add RAM and/or CPU cores.

 

Tweaking WordPress performance

There are numerous ways to increase WordPress performance, and we’ll have a look at how the numbers are affected in this particular installation. Now, WordPress wouldn’t be WordPress if the most interesting performance tweaks weren’t already packaged as easy-to-use plugins, so instead of digging deep into the WordPress core, we ended up evaluating a set of interesting plugins. Here they are:

wp-cache plugin

The wp-cache plugin has become a very popular way to add a cache to WordPress. WordPress used to have a built-in object cache, but that was dropped in WordPress 2.5. So today, the wp-cache plugin is one of the most obvious plugins that come to mind when you want to tweak WordPress performance (and yes, we’ll look at wp-super-cache as well). The test result with wp-cache is very good. As we’ve seen above, this server needs 85 ms to serve the simplest possible PHP script, and the wp-cache plugin gets us fairly close to that ideal number.

Average load time 50 users: 210 ms

Baseline difference: -1010 ms

Baseline difference %: -82.9%

 

batcache plugin

Batcache was written to help WordPress.com cope with the massive and prolonged traffic spike on Gizmodo’s live blog during Apple events. Live blogs were famous for failing under the load of traffic; Gizmodo’s live blog stays up because of Batcache. The developers of Batcache actually refer to WP Super Cache as a better alternative themselves, but in some cases, with multiple servers and where memcached is available, Batcache may be a better solution. The performance gains with Batcache are actually not on par with what wp-cache or WP Super Cache deliver, but it’s still a lot better than a standard WordPress install.

Average load time 50 users: 537 ms

Baseline difference: -683 ms

Baseline difference %: -56.0%

 

WP Super Cache plugin

The WP Super Cache plugin takes things a few steps further compared to the standard wp-cache. Most notably, by using a set of Apache2 mod_rewrite rules, WP Super Cache is able to serve most of your WordPress content without ever invoking the PHP engine; instead the content is served at the same speed as static content such as graphics or javascript files. Installing this plugin is a little bit more complicated, and it requires both the mod_headers and mod_expires Apache2 modules to be enabled. But once installed, it really works – just look at the numbers! If the WP Super Cache plugin works on your server, it’s probably the easiest and most powerful way to boost your WordPress performance numbers. And if it doesn’t work as intended on your server, the good thing is that it reverts back to the functionality provided by the standard wp-cache plugin.

Average load time 50 users: 112 ms

Baseline difference: -1108 ms

Baseline difference %: -90.8%

 

 

W3 Total Cache plugin

The W3 Total Cache plugin is a powerful plugin that takes the best from wp-cache and batcache and adds a few additional features to improve performance. W3 Total Cache allows the user to choose between disk-based and memory-based caching (using memcached). It also supports minifying HTML, JS and CSS files, as well as the various types of HTTP compression (deflate, gzip etc.). Finally, W3 Total Cache supports placing content on a content delivery network (CDN), which can speed up loading of static content even further. W3 Total Cache has a lot of configuration options, and we did not take the time to fully investigate them all. We did test the performance difference between disk-based caching and memory-based caching, and the difference is actually notable. We enabled minifying and compression, but otherwise we pretty much used everything ‘out of the box’.

Using disk cache:

Average load time 50 users: 256 ms

Baseline difference: -964 ms

Baseline difference %: -79.0%

 

Using memory cache:

Average load time 50 users: 367 ms

Baseline difference: -853 ms

Baseline difference %: -70.0%

Summary

Results

NOTE: This table was updated after it was first published. Please click here for explanation.

Plugin                            Avg. load time (ms)   Difference (ms)   Difference %
Standard WordPress                1220                      0                 0 %
wp-cache                           210                  -1010               -83 %
batcache                           537                   -683               -56 %
WP Super Cache                     112                  -1108               -91 %
W3 Total Cache (disk basic)        256                   -964               -79 %
W3 Total Cache (disk enhanced)     109                  -1111               -91 %
W3 Total Cache (memcache)          367                   -853               -70 %

 

Conclusions

The various performance-related plugins for WordPress all revolve around caching. The most impressive results were achieved using WP Super Cache and W3 Total Cache. Among the other plugins, the choice is between disk-based caching and memcached-based caching. Our tests actually show that disk is faster, but that’s something that needs to be explored further. The tests have been done on a blog with very little data in it, and Linux uses a fair amount of disk caching that is probably more effective with these particular amounts of data. Whenever WP Super Cache is not possible to use (or simply feels too exotic for you), we suspect that a perfectly tuned W3 Total Cache is the best choice. W3 Total Cache shows the most potential for tuning and we like the overall ‘look-and-feel’ of it. UPDATE: Actually, after retesting W3 Total Cache, we think it may be an even better alternative than WP Super Cache. The one negative thing we’ve picked up so far is a potential compatibility issue with WordPress Multi User (WPMU), but we have not been able to confirm that.

 

Feedback

We want to know what you think. Are there any other specific plugins that you want to see tested? Should we focus on tests with more users, more posts in the blog, more comments? Please comment on this post and tell us what you think.

Load testing WordPress

This is the first part in a series of posts about WordPress and performance. In part 1, we’ll look at WordPress in general, in later instalments, we’ll look at how performance is affected by popular plugins and tweaks. (click here for part 2)

WordPress is probably the most popular blogging platform out there today, powering countless blogs and other web sites. WordPress was first released back in 200x and quickly became a popular tool for bloggers. Part of its success is due to the fact that it’s remarkably easy to install and configure. Another big hit with WordPress is the community of users who contribute to its development by creating plugins. There are plugins for just about anything – displaying Google ads, search engine optimization, statistics, integration with social media – just to name a few.

There are also downsides to WordPress, but the one that interests us the most is performance. WordPress was once known to have lacklustre performance, and it especially had big problems handling a lot of concurrent users. Imagine the disappointment of a young and aspiring blogger who writes endless amounts of excellent blog posts without being able to reach a bigger crowd. When he finally catches a break and gets that link from news.com, the WordPress-powered blog dies under the pressure, and before the blog is back up again, that link from news.com is yesterday’s news.

But WordPress has gotten better. The default installation is faster out of the box, and there are numerous tips, tricks and guides on how to optimize WordPress performance beyond what should be possible. And of course, there are also plugins that help WordPress perform better. Our mission is to find out what the state of WordPress performance is today. Let’s start.

The tools

The tools we brought to the lab for this series of tests are fairly simple. We have an out-of-the-box WordPress 2.8.6 blog installed on a single server. The server runs Ubuntu Linux 9.04 on an Intel quad-core 2.1 GHz machine with 8 GB of RAM. The web server is the standard Apache2 that comes with Ubuntu and the database server is MySQL 5.1, located on the same machine. PHP is at version 5.2.10. And the most important piece we brought was naturally a LoadImpact.com account to run the tests.

 

Establish a baseline for the server

There are a lot of moving parts in a setup like this. We first want to establish a baseline that tells us the maximum possible throughput for a PHP page in this specific setup. To do that, we created a simple PHP script that echoes exactly 10 bytes of data back to the browser. By load testing this simple script we get an understanding of how much of the page load time is spent just on sending requests back and forth over the Internet, how well Apache2 can fire up the PHP processes, and how much time PHP needs to initialize itself.

The baseline test, and all the other tests we will be running, is a ramp-up from 50 to 80 concurrent users. This is what the graph from the test looks like:

Baseline test: the performance of the server itself

As you can see, the response times actually get better with more concurrent users (that’s caching), and overall they stay at or under 100 ms. So before putting WordPress into the picture, we see response times just under 100 ms. That’s the best possible response time we could ever achieve with PHP at this load level on this particular server located at this particular IP.

 

Establish a baseline for WordPress

Ok, the next step is to see what we get when we install WordPress. A standard WordPress install will first and foremost run a whole lot more code than the previous script. It also connects to the database and looks for blog posts, comments and a lot of metadata, such as which categories are in use, etc. So naturally we expect to see longer response times. We placed the same amount of load on the front page of the WordPress installation as we did on empty.php; here’s what an empty WordPress blog looks like:

Performance when WordPress is installed

 

The response times now begin at just over 300 ms at 50 concurrent users, and at 80 users they are just over 350 ms. But that’s not very interesting – we need to add a few posts so that the scripts and database get some actual work to do. Here’s what the graph looks like when 10 posts are added to the blog:

WordPress performance with 10 posts added

That’s a bit more interesting. The response times now start at 425 ms and dip down to 364 ms at 60 concurrent users (MySQL caching is our best guess here). At 70 and 80 concurrent users, the response times start rising quite sharply, to 439 ms and 601 ms respectively. That actually starts to look like a “knee” – the point where performance starts to really degrade and the server risks grinding to a halt.  Let’s see what happens if we add even more load:

WordPress load test with more clients

Yes indeed. With more clients, the response times increase even more, as expected.

In absolute numbers, we’re still talking about very good response times here, even if this test is using a server with more CPU and RAM than the typical WordPress installation has exclusive access to. And we are also looking at fairly high load levels. Getting 150 concurrent users on your blog is not going to happen to a lot of people, and maintaining a response time of well under 2 s is not bad at all.

The second thing to notice is that what we first suspected was a “knee” in the response time chart between 60 and 70 users does not look like a knee at all anymore. The response times increase more or less proportionally to the load, which is quite good. A really, really high-performing web site would display a more or less flat line at these load levels, but our setup is nowhere near that kind of performance.

 

Conclusion

We’ve established a baseline for WordPress performance. We’re going to keep testing this setup with various types of tweaks and write about it. The next part of this series will look at different plugins and how they affect performance – we’ve already tested a few of the most popular ones, and some of them do affect performance quite a bit.

Feedback

We want to know what you think. Are there any other specific plugins that you want to see tested? Should we focus on tests with more users, more posts in the blog, more comments? Please comment on this post and tell us.

 

(click here for part 2)

 

About Load Impact

Load Impact is the leading cloud-based load testing software trusted by over 123,000 website, mobile app and API developers worldwide.

Companies like JWT, NASDAQ, The European Space Agency and ServiceNow have used Load Impact to detect, predict, and analyze performance problems.
 
Load Impact requires no download or installation, is completely free to try, and users can start a test with just one click.
 
Test your website, app or API at loadimpact.com
