22Apr / 2014

Test Driven Development and CI using JavaScript [Part I]

In this tutorial, we will learn how to apply TDD (Test-Driven Development) using JavaScript code. This is the first part of a set of tutorials that includes TDD and CI (Continuous Integration) using JavaScript as the main language.

Some types of testing

There are several approaches to testing code, and each comes with its own set of challenges. Emily Bache, author of The Coding Dojo Handbook, writes about them in more detail on her blog, in the post “Coding is like cooking”.

1. Test Last: in this approach, you code a solution and subsequently create the test cases.

Problem 1: It’s difficult to create test cases after the code is completed.
Problem 2: If test cases find an issue, it’s difficult to refactor the completed code.

2. Test First: you design test cases and then write the code.

Problem 1: You need a good design up front, and formulating test cases lengthens the design stage, which can take too much time.
Problem 2: Design issues are caught too late in the coding process, which makes refactoring the code more difficult due to specification changes in the design. This issue also leads to scope creep.

3. Test-Driven: You write test cases in parallel with new code modules. In other words, you add a task for unit tests as your developers are assigned different coding tasks during the project development stage.

TDD approach

TDD focuses on writing code at the same time as you write the tests. You write small modules of code, and then write your tests shortly after.

Patterns to apply to the code:

Avoid direct calls over the network or to the database. Use interfaces or abstract classes instead.
Implement a real class that implements the network or database call and a class which simulates the calls and returns quick values (Fakes and Mocks).
Create a constructor that uses Fakes or Mocks as a parameter in its interface or abstract class.
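The three patterns above can be sketched roughly as follows. This is only an illustration: the names HttpPriceService, FakePriceService and Cart are hypothetical, and since JavaScript has no interface keyword, the shared getPrice method stands in for the interface.

```javascript
// Hypothetical example: none of these names come from a real library.

// "Real" implementation: in production this would call the network.
// The actual request is left out so the sketch stays self-contained.
class HttpPriceService {
  getPrice(itemId) {
    throw new Error('network access not available in this sketch');
  }
}

// Fake implementation: same method, but returns canned values immediately.
class FakePriceService {
  constructor(prices) {
    this.prices = prices;
  }
  getPrice(itemId) {
    return this.prices[itemId];
  }
}

// The class under test receives its dependency through the constructor,
// so a test can pass in the fake instead of the real service.
class Cart {
  constructor(priceService) {
    this.priceService = priceService;
    this.items = [];
  }
  add(itemId) {
    this.items.push(itemId);
  }
  total() {
    return this.items.reduce(
      (sum, itemId) => sum + this.priceService.getPrice(itemId),
      0
    );
  }
}

const cart = new Cart(new FakePriceService({ sweater: 30, scarf: 12 }));
cart.add('sweater');
cart.add('scarf');
console.log(cart.total()); // 42
```

Because Cart never creates its own service, swapping the fake for the real class is a one-line change in the code that constructs it.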

Patterns to apply to unit tests:

Use the setup function to initialize the state and behavior that the unit test cases have in common.
Use the teardown function to release resources after a unit test case has finished.
Use “assert()” to verify the correct behavior and results of the code during the unit test cases.
Avoid dependency between unit test cases.
Test small pieces of code.

Behavior-Driven Development

Behavior-Driven Development (BDD) is a specialized version of TDD focused on behavioral specifications. Since TDD does not specify how the test cases should be done and what needs to be tested, BDD was created in response to these issues.

Test cases are written based on user stories or scenarios. Stories are established during the design phase. Business analysts, managers and project/product managers gather the design specifications, and then users explain the logical functionality for each control. Specifications also include a design flow so test cases can validate proper flow.

This is an example of the language used to create a BDD test story:

Story: Returns go to stock

In order to keep track of stock

As a store owner

I want to add items back to stock when they’re returned

Scenario 1: Refunded items should be returned to stock

Given a customer previously bought a black sweater from me

And I currently have three black sweaters left in stock

When he returns the sweater for a refund

Then I should have four black sweaters in stock

Scenario 2: Replaced items should be returned to stock

Given that a customer buys a blue garment

And I have two blue garments in stock

And three black garments in stock.

When he returns the garment for a replacement in black,

Then I should have three blue garments in stock

And two black garments in stock

Frameworks to Install

1. Jasmine

Jasmine is a set of standalone libraries that allow you to test JavaScript in a BDD style. These libraries do not require a DOM, which makes them suitable for testing on both the client side and the server side. You can download it from http://github.com/pivotal/jasmine

It is divided into suites, specs and expectations.

Suites define the unit’s story.

Specs define the scenarios.

Expectations define desired behaviors and results.
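To show how the three pieces fit together, here is a minimal sketch of Scenario 1 from the story above written as a Jasmine spec. The Stock object is a hypothetical stand-in for real application code; describe, beforeEach, it and expect are the globals that the Jasmine runner provides.

```javascript
// Hypothetical code under test (not part of Jasmine).
function Stock() {
  this.counts = {};
}
Stock.prototype.add = function (item, n) {
  this.counts[item] = (this.counts[item] || 0) + n;
};
Stock.prototype.refund = function (item) {
  // A refunded item goes back on the shelf.
  this.add(item, 1);
};
Stock.prototype.count = function (item) {
  return this.counts[item] || 0;
};

// describe/it/beforeEach/expect are globals supplied by the Jasmine runner.
// The guard only lets this file load without errors outside of Jasmine.
if (typeof describe === 'function') {
  // Suite: the story.
  describe('Returns go to stock', function () {
    var stock;

    beforeEach(function () {
      // Given I currently have three black sweaters left in stock
      stock = new Stock();
      stock.add('black sweater', 3);
    });

    // Spec: one scenario.
    it('returns refunded items to stock', function () {
      // When he returns the sweater for a refund
      stock.refund('black sweater');

      // Expectation: Then I should have four black sweaters in stock
      expect(stock.count('black sweater')).toBe(4);
    });
  });
}
```

The Given/When/Then lines of the story map directly onto the setup, the action and the expectation, which is the main reason BDD stories translate so easily into specs.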

Jasmine has a set of helper libraries that lets you organize tests.

2. RequireJS

RequireJS is a JavaScript library that allows you to organize code into modules, which load dynamically on demand.

By dividing code into modules, you can speed up the load-time for application components and have better organization of your code.

You can download RequireJS from http://www.requirejs.org

Part II of this two-part tutorial will discuss Behavior-Driven Testing and Software Testing: how to use BDD to test your JavaScript code. Don’t miss out, subscribe to our blog below.

————-

This post was written by Miguel Dominguez. Miguel is currently Senior Software Developer at digitallabs AB but also works as a freelance developer. His focus is on mobile application (android) development, web front-end development (javascript, css, html5) and back-end (mvc, .net, java). Follow Miguel’s blog.

19Mar / 2013

Know your node.js

As a follow-up to last month’s column about PHP vs Node.js, I hit some problems with Node under load. As with all technologies, Node.js has some limitations that may or may not be a problem for your specific use case. If the last column comparing PHP and Node.js had a deeper message, it would be that if you want to scale, you have to know your stack.

To be completely clear, when I say stack I mean the layers of technology used to serve HTTP requests. One of the most common stacks out there is simply called LAMP: (L)inux (A)pache2 (M)ySQL (P)HP (or Perl). You now see a lot of references to LNMP, where Apache2 is replaced with Nginx. When building Node.js applications, things can vary a lot since Node.js comes with its own HTTP server. In my previous text, I used Node.js together with MySQL on a Linux box, so I guess we can dub that the LNM stack if we absolutely need to have a name for it.

And when I say ‘know your stack’, I mean that if you want to produce better than average performance numbers, you have to be better than average at understanding how the different parts of your stack work together. There are hundreds of little things that most of us never knew mattered that suddenly become important when things come under load. As it happens, watching your application work under load is a great way to force yourself to know your stack a little better.

Background

When testing Apache/PHP against Node.js, I found that the raw performance of Node.js, as well as its ability to handle many concurrent clients, was excellent: faster and more scalable than Apache2/PHP. One reader pointed out that the test wasn’t very realistic since there was just one single resource being queried and there was no static content involved. Apache2/PHP could very well do relatively better if some of the content was static. So I set up a test to check this, and while running it, Node.js crashed. As in stopped working. As in would not serve any more HTTP requests without manual intervention. So to keep it short, Apache2/PHP won that round. But in the spirit of ‘know your stack’, we need to understand why Node.js crashed. The error message I got was this:

Unhandled 'error' event "events.js:71"

First of all, it took a fair amount of googling to figure out what the error message was really about. Or, rather, the error message was saying that something happened and there’s no error handler for it. So good luck.
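For readers who have not run into this before, the behaviour behind that message can be reproduced in a few lines. This is a minimal sketch using Node's built-in events module; the emitters and the error text are made up for the illustration and are not the objects involved in the crash described here.

```javascript
const EventEmitter = require('events');

// Case 1: no 'error' listener. emit('error', ...) throws the error itself.
const unguarded = new EventEmitter();
let thrown = null;
try {
  unguarded.emit('error', new Error('too many open files'));
} catch (err) {
  thrown = err;
}
console.log('without a listener, emit threw:', thrown.message);

// Case 2: with an 'error' listener the error is delivered to the handler
// and the process keeps running.
const guarded = new EventEmitter();
let handled = null;
guarded.on('error', function (err) {
  handled = err;
});
guarded.emit('error', new Error('too many open files'));
console.log('with a listener, handler received:', handled.message);
```

In a real server nobody wraps the emit call in try/catch, so an 'error' event without a listener ends up as an uncaught exception that stops the process.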

Fixing it.

The first indication I got via Google and Stack Overflow was that this may be an issue with Node.js before 0.8.22 and sure enough, I was running 0.8.19. So the first thing I did was upgrade to version 0.8.22. But that did not fix the problem at all (but a later and greater version is of course a nice side effect). With almost all other software involved being up to date, this actually required some structured problem solving.

Back to the drawing board

I eventually managed to trace the error message down to a ‘too many open files’ problem, which is interesting as it answers the crucial question: What went wrong? This happened at roughly 250 concurrent users with a test that was accessing 6 different static files. This is what it looks like in Load Impact:

Depending a little on timing and exactly when each request comes in, this would roughly indicate that some 1500 files (6 files times 250 users) can be open at the same time. Give or take. Most Linux systems are, by default, configured to allow a relatively small number of open files, e.g. 1024. The Linux command to check this is ulimit:

$ ulimit -n
1024

1024 is the default on a lot of distros, including Ubuntu 12.10 that I was running the tests on. So my machine had 1024 as the limit but it appears that I had 1500 files open at the same time. Does this make any sense? Well, sort of, there are at least 3 factors involved here that would affect the results:

Load Impact simulates real browsers (Virtual Users, or VUs). A VU only opens 4 concurrent connections to the same server even if the script tells it to download 6 resources. The other 2 resources are simply queued.
Each open TCP socket counts as an open file. So each concurrent TCP connection is an open file. Knowing that our limit is 1024, that would indicate that Node.js could handle up to 256 concurrent users if each user uses the maximum of 4 open connections.
In our sample, each request for a static resource also opens a file and thereby occupies another file handle. This file is open for less time than the actual connection, but still, for a certain time, a single request can consume 2 open file handles.
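The arithmetic behind these three factors can be written out as a small sketch. The numbers come from the text above; the worst-case line is a simplification on my part that assumes every open connection is also holding a file handle at the same moment, and it ignores the handful of descriptors the process uses for itself.

```javascript
// Figures taken from the text above.
const openFileLimit = 1024;    // reported by ulimit -n
const connectionsPerUser = 4;  // concurrent connections per simulated browser

// Upper bound if every user held all four sockets and nothing else were open.
const maxUsersSocketsOnly = Math.floor(openFileLimit / connectionsPerUser);

// Assumed worst case: each in-flight request also holds one handle
// for the static file it is serving, so two handles per connection.
const handlesPerUserWorstCase = connectionsPerUser * 2;
const maxUsersWorstCase = Math.floor(openFileLimit / handlesPerUserWorstCase);

console.log(maxUsersSocketsOnly); // 256
console.log(maxUsersWorstCase);   // 128
```

The real ceiling lands somewhere between the two figures, because the file handle is held for a much shorter time than the socket.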

So in theory, the limit for concurrent simulated browser users should be 256 or less. But in reality, I saw the number of concurrent users go all the way up to 270 before the Node.js process died on me. The most likely explanation is simply timing: not all VUs hit the server at exactly the same time. In the end, hitting problems at about 250 concurrent users is consistent with the open files limit being the problem. Luckily, the limit on the number of open files per process is easy to change:

$ ulimit -n 2048

The next test shows real progress. Here’s the graph:

Problem solved (at least within the limits of this test).

Summary

Understanding what you build upon is important. If you choose to rely on Node.js, you probably want to be aware of how that increases your dependency on various per-process limitations in the operating system in general, and the maximum number of open files in particular. You are more affected by these limitations since everything you do takes place inside a single process. And yes. I know. There are numerous, more or less fantastic, ways to work around this particular limitation. Just as there are plenty of ways to work around limitations in any other web development stack. The key thing to remember is that when you select your stack, framework, language or server, you also select all the limitations that come with it. There’s (still) no silver bullet, even if some bullets are better out of the box than others. Having spent countless hours with other web development languages, I think I’m in a good position to compare and yes indeed! Node.js delivers some amazing performance. But at present, it comes with a bigger responsibility to ‘know your stack’ than a lot of the others.


01Feb / 2013

Node.js vs PHP – using Load Impact to visualize node.js efficiency

It could be said that Node.js is the new darling of web server technology. LinkedIn have had very good results with it and there are places on the Internet that will tell you it can cure cancer.

In the meantime, the old workhorse language of the Internet, PHP, gets a steady stream of criticism, and among the 14k Google hits for “PHP sucks” (exact term), people will say the funniest, most terrible things about the language, while some of the critique is actually quite well balanced. Node.js introduces at least two new things (for a broader audience). First, the ability to write server side JavaScript code. In theory this could be an advantage since JavaScript is more important than ever on the client side and using the same language on server and browser would have many benefits. That’s at least quite cool.

The other thing that makes Node.js different is that it’s completely asynchronous and event driven. Node is based on the realization that a lot of computer code actually just sits idle and waits for I/O most of the time, like waiting for a file to be written to disk or for a MySQL query to return data. To accomplish that, more or less every single function in Node.js is non-blocking.

When you ask Node to open a file, you don’t wait for it to return. Instead, you tell Node what function to pass the results to and get on with executing other statements. This leads to a dramatically different way to structure your code, with deeply nested callbacks, anonymous functions and closures. You end up with something like this:

doSomething(val, function(err,result){
  doSomethingElse(result,function(err,res){
    doAbra();
    doKadabra(err, res, function() {
      ...
      ...
    });
  });
});

It’s quite easy to end up with very deep nesting that in my opinion sometimes affects code readability in a negative way. But compared to what gets said about PHP, that’s very mild critique. And, oh! The third thing that is quite different is that in Node.js, you don’t have to use a separate http(s) server. It’s quite common to put Node.js behind Nginx, but that’s not strictly needed. So the heart of a typical Node.js web application is the implementation of the actual web server.

A fair way to compare

So no, it’s not fair to say that we compare Node.js and PHP. What we really compare is Node.js and PHP+Apache2 (or any other http server). For this article, I’ve used Apache2 and mod_php since it’s by far the most common configuration. Some might say that I’d get much better results if I had used Nginx or lighttpd as the http server for PHP. That’s most likely very true, but at the end of the day, server side PHP depends on running in multiple separate processes, regardless of whether we create those processes with mod_php, FastCGI or any other mechanism. So, I’m sticking with the standard server setup for PHP and I think that makes good sense.

The testing environment

So we’re pitting PHP+Apache2 against a Node.js based application. To keep things reasonable, I’ve created a very (really, very) simple application in both PHP5 and Node.js. The application will get 50 rows of data from a WordPress installation and output it as a JSON string. That’s it, nothing more. The benefit of keeping it this simple was (a) that I didn’t have to bother about too many implementation details between the two languages and (b), more importantly, that we’re not testing my ability to code, we’re really testing the difference in architecture between the two. The server we’re using for this test is a virtual server with:

1 x Core Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz
2 GB RAM.
OS is 64 Bit Ubuntu 12.10 installed fresh before running these tests.
We installed the Load Impact Server metric agent.

For the tests, we’re using:

Apache/2.2.22 and
PHP 5.4.6.
Node.js version 0.8.18 (built using this script)
MySQL is version 5.5.29.
The data table in the tests is the options table from a random WordPress blog.

The scripts we’re using:

Node.js (javascript):

// Include the http and mysql modules.
var http = require('http'),
    mysql = require("mysql");

// Create the connection.
// The connection data below assumes a default MySQL installation; change it to match your configuration.
var connection = mysql.createConnection({
   user: "wp",
   password: "****",
   database: "random"
});

// Create the http server.
http.createServer(function (request, response) {
   // Attach listener on end event.
   request.on('end', function () {

      // Query the database.
      connection.query('SELECT * FROM wp_options limit 50;', function (error, rows, fields) {
         response.writeHead(200, {
            'Content-Type': 'text/html'
         });
         // Send data as JSON string.
         // Rows variable holds the result of the query.
         response.end(JSON.stringify(rows));
      });
   });
// Listen on the 8080 port.
}).listen(8080);

PHP code:

<?php
$db = new PDO('mysql:host=localhost;dbname=*****', 'wp', '*****');
$all = $db->query('SELECT * FROM wp_options limit 50;')->fetchAll();
echo json_encode($all);

The PHP script is obviously much shorter, but on the other hand it doesn’t have to implement a full http server either.

Running the tests

The Load Impact test configurations are also very simple. These two scripts are, after all, typical one-trick ponies, so there aren’t many bells and whistles to use here. To be honest, I was surprised how many concurrent users I had to use in order to bring the difference out into the light. The test scripts had the following parameters:

The ramp up went from 0-500 users in 5 minutes
100% of the traffic comes from one source (Ashburn US)
Server metrics agent enabled

The graphics:

In the images below, the lines have the following meanings:

Green line: Concurrent users
Blue line: Response time
Red line: Server CPU usage

Node.js up to 500 users.

The first graph here shows what happens when we load test the Node.js server. The response time (blue) is pretty much constant all through the test. My back of a napkin analysis of the initial outliers is that they have to do with a cold MySQL cache. Now, have a look at the results from the PHP test:

Quite different results. It’s not easy to see on this screenshot, but the blue line is initially stable at 320 ms response time up to about 340 active concurrent users. After that, we first see a small increase in response time, but as additional active concurrent users are added, the response time eventually goes through the roof completely.

So what’s wrong with PHP/Apache?

Ok, so what we’re looking at is not very surprising, it’s the difference in architecture between the two solutions. Let’s think about what goes on in each case.

When Apache2 serves up the PHP page, it leaves the PHP execution to a specific child process. That child process can only handle one PHP request at a time, so if there are more requests than that, the others have to wait. On this server, there’s a maximum of 256 clients (MaxClients) configured vs 150 that comes standard. Even if it’s possible to increase MaxClients to well beyond 256, that will in turn give you a problem with internal memory (RAM). In the end, you need to find the correct balance between the maximum number of concurrent requests and available server resources.

But for Node, it’s easier. First of all, in the calm territory, each request is about 30% faster than for PHP, so in pure performance in this extremely basic setup, Node is quicker. Also going for Node is the fact that everything is in one single process on the server. One process with one active request handling thread. So there’s no inter-process communication between different instances and the ‘mother’ process. Also, per request, Node is much more memory efficient. PHP/Apache needs to have a lot of PHP and process overhead per concurrent worker/client, while Node will share most of its memory between the requests.

Also note that in both these tests, CPU load was never a problem. Even if CPU load varies with concurrent users in both tests, it stays below 5% (and yes, I did not just rely on the graph, I checked it on the server as well). (I’ll write a follow-up on this article at some point when I can include server memory usage as well.) So we haven’t loaded this server into oblivion in any way, we’ve just loaded it hard enough for the PHP/Apache architecture to start showing some of its problems.

So if Node.js is so good…

Well of course. There are challenges with Node, both technical and cultural. On the technical side, the core design idea in Node, one process with one thread, makes it a bit of a challenge to scale up on a multi-core server. You may have already noted that the test machine uses only one core, which is an unfair advantage to Node. If it had 2 cores, PHP/Apache would have been able to use that, but for Node to do the same, you have to do some tricks.
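One such trick is Node's built-in cluster module, which lets a parent process fork several workers that share a listening port. The sketch below is an assumption about how that could look and was not part of the tests in this article; workerCount, isParentProcess and startCluster are hypothetical helper names.

```javascript
const cluster = require('cluster');
const http = require('http');
const os = require('os');

// Newer Node versions call the parent "primary"; older ones call it "master".
function isParentProcess() {
  return cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;
}

// Hypothetical helper: one worker per core, but always at least one.
function workerCount(cpuCount) {
  return Math.max(1, cpuCount);
}

// Hypothetical helper: the parent forks the workers, each worker runs a server.
function startCluster(port) {
  if (isParentProcess()) {
    for (let i = 0; i < workerCount(os.cpus().length); i++) {
      cluster.fork();
    }
  } else {
    // Workers share the listening port.
    http.createServer(function (request, response) {
      response.end('handled by process ' + process.pid);
    }).listen(port);
  }
}

// Uncomment to run; left disabled so loading this file starts nothing.
// startCluster(8080);
```

Each worker is still a single-threaded Node process, so the per-process limits discussed for Node apply to every worker individually.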

On the cultural side, PHP is still “everywhere” and Node is not. So if you decide to go with Node, you need to prepare to do a lot more work yourself. There are simply nowhere near as many coders, web hotels, computer book authors, world-leading CMSes and what have you. With PHP, you never walk alone.

Conclusion

Hopefully, this shows the inherent differences between two different server technologies: one old and trusted, one young and trending. Hopefully it’s apparent that your core technical choices will affect your server performance and, in the end, how much load you can take. Designing for high load and high scalability begins early in the process, before the first line of code is ever written.

And sure, in real life, there are numerous tricks available to reduce the effects seen here. In real life, lots of Facebook still runs on PHP.


Load Impact Blog

Tag Archives: node.js