Saturday, November 20, 2010

Taking Baby Steps with Node.js – Threads vs. Events

In a previous blog post, I provided a shallow introduction to Node.js. I also mentioned where you can find more information on how to get it installed on Windows, as well as how to install npm, a seemingly popular package manager in the JavaScript community.

In the meantime, I’ve started to get a clearer view of the general concepts on which Node.js is based, as well as the kind of applications that can be built using this server-side platform. The more I read and learn about Node.js, the more I come to the conclusion that it is very much targeted towards building real-time applications. Google Wave, FriendFeed and most recently Facebook are popular examples of real-time applications. You can also read this article to learn about other examples.

As I briefly mentioned in the previous blog post, Node.js makes heavy use of JavaScript’s event-based style of programming, which lies at the heart of its capabilities for building real-time applications. This event-based model is a completely different way of thinking compared to the thread-based model that we’ve become so accustomed to over the past couple of years. ASP.NET Web Services, or WCF services for that matter, are excellent examples of the thread-based model. Every time a message comes in, these frameworks spawn a new thread or take one from the thread pool in order to handle the request. There’s nothing wrong with this approach. In fact, the thread-based model makes perfect sense for many of the scenarios out there, but generally not for real-time applications, which usually require long-lived connections.

In the thread-based model, most of the threads spend a lot of their time being blocked, waiting for I/O operations like executing queries against a database, calling another service or writing to a file on disk. These are expensive operations that usually take much longer to complete than in-memory operations. When handling large amounts of traffic, you can’t afford to have threads blocking for long periods of time. Otherwise you’ll hit the maximum number of available threads sooner rather than later.

Node.js solves this by putting the event-based model at its core, using an event loop instead of threads. The expensive I/O operations that we just talked about are always executed asynchronously, with a callback that gets executed when the initiated operation completes. The net result is that while the I/O operation is busy performing its duties, Node.js is able to accept other incoming requests and start doing the work required to handle them. When the I/O operation completes, the specified callback is executed and the earlier request is processed further. The event loop switches between these requests very quickly, picking up where it previously left off. This event-based model provides the means for building highly scalable real-time applications.
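The ordering described above can be sketched with timers standing in for I/O. This is only an illustration under assumptions: fakeIo and handleRequest are hypothetical names, and setTimeout plays the role of a database query or a remote call.

```javascript
// Records the order in which things happen.
var log = [];

// Hypothetical stand-in for an expensive I/O operation: returns
// immediately and invokes the callback after delayMs milliseconds.
function fakeIo(name, delayMs, callback) {
    log.push('started ' + name);
    setTimeout(function() {
        callback('result of ' + name);
    }, delayMs);
}

// Hypothetical request handler: starts the I/O and registers a callback
// that continues processing once the I/O has completed.
function handleRequest(name, delayMs) {
    fakeIo(name, delayMs, function(result) {
        log.push('finished ' + name);
        console.log(name + ' got: ' + result);
    });
}

handleRequest('request A', 30);
handleRequest('request B', 10);

// Reached before either callback has run: nothing was blocked.
log.push('accepted both requests');
console.log(log.join(', '));
```

Both requests are accepted before either callback runs, and request B finishes first even though it was started second, because its simulated I/O takes less time.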

Let me show you a naive example of this concept, so you can get a feel for how it looks in an actual server.

var http = require('http');
var url = require('url');

http.createServer(function(request, response) {
    var feedUrl = 'http://feeds.feedburner.com/astronomycast.rss';
    var parsedUrl = url.parse(feedUrl);

    // Fire off the outgoing request for the feed; this call does not block.
    var feedRequest = http.request({
        host: parsedUrl.hostname,
        port: 80,
        path: parsedUrl.pathname
    });
    feedRequest.addListener('response', handle);
    feedRequest.end();

    // Answer the incoming request right away, without waiting for the feed.
    response.writeHead(200, { 'Content-Type': 'text/html' });
    response.end('Done processing the request.');
}).listen(8124);

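// Called once the headers of the feed response have arrived.
function handle(response) {
    if(response.statusCode !== 200)
        return;

    var responseBody = '';

    // Deliver chunks as strings instead of raw buffers.
    response.setEncoding('utf8');

    // The body arrives in chunks, each one delivered through an event.
    response.addListener('data', function(chunk) {
        responseBody += chunk;
    });

    // Fired after the last chunk has been delivered.
    response.addListener('end', function() {
        console.log('All data has been read.');
        console.log(responseBody);
    });
}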

Our server implementation just reads the content of a particular RSS feed every time a request comes in. This code doesn’t do anything useful except illustrate the fact that when we make an HTTP request for an external resource, this HTTP request is fired off asynchronously. We need to subscribe an event listener that gets called when the response arrives, so that we can read the requested data from the HTTP response. In the meantime, Node.js takes on other requests, firing new HTTP requests and going its merry way. Notice that even reading in the chunks of data from an HTTP response is done asynchronously!
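The chunk-by-chunk reading in the handle function could be factored into a small helper. This is only a sketch: collectBody is a hypothetical name, and a PassThrough stream from Node's stream module stands in for the HTTP response so that the snippet runs without network access.

```javascript
var stream = require('stream');

// Hypothetical helper: gathers all chunks of a readable stream and
// passes the complete text to the callback once the stream has ended.
function collectBody(readable, callback) {
    var body = '';
    readable.setEncoding('utf8');
    readable.on('data', function(chunk) {
        body += chunk;
    });
    readable.on('end', function() {
        callback(body);
    });
}

// A PassThrough stream plays the role of the HTTP response here.
var fakeResponse = new stream.PassThrough();

collectBody(fakeResponse, function(body) {
    console.log('All data has been read.');
    console.log(body);
});

// The data arrives in pieces, just like the body of an HTTP response.
fakeResponse.write('<rss>');
fakeResponse.write('</rss>');
fakeResponse.end();
```

A real HTTP response is a readable stream as well, so the same helper could be passed the response object from the server example, although that use is not shown here.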

This isn’t very different from performing Ajax requests in a browser, now is it? Take a look at the following jQuery snippet and notice how similar it looks to the code for performing an HTTP request in our server-side example.

$.getJSON('http://myfancywebsite.com/something', function(data, status) {
    // Handles the data from the response
});

Earlier this week, I was listening to this excellent episode of Herding Code on Manos de Mono. This is a high-performance web application framework that targets the .NET platform, Mono in particular. I know there are new web frameworks popping out of the ground like mushrooms every day. But what particularly excites me about this one is that it’s based on libev, the same high-performance event loop that Node.js uses. I have to admit that I hadn’t heard about it before the Herding Code episode, but I’m definitely looking forward to spending some time on it as well.

As I mentioned before, I’m just learning about this stuff, so I’m happy to get your feedback, thoughts, etc. Till next time.

1 comment:

florisla said...

Fascinating stuff.