Tuesday, November 30, 2010

AutoTest.NET

I just want to quickly point out a tool that I've been playing with for a couple of days now, named AutoTest.NET. It's an open-source tool that originates from a popular tool in the Ruby community called ZenTest, and it basically runs all your valuable unit tests whenever you save your source files or build your code. It enables you to get feedback about your changes as soon as possible.

The project started out a couple of years ago on Google Code and was first initiated by James Avery. Contributions stopped at some point, until recently, when Svein Arne Ackenhausen forked the source code and put it on GitHub. It now runs on both .NET and Mono, with NUnit, MSTest and xUnit as the currently supported unit test frameworks.

Here’s a screenshot from the feedback window when all tests pass:

Success

And here you can see the same window after I broke one of my unit tests:

Failure

Here you can see which particular unit test has been broken and by clicking the specified link you end up at the right source file in Visual Studio.

I encourage you to pick up this small tool and learn how it can facilitate your TDD flow. Take a look at this page in order to get up-and-running in no time. Also don't forget to provide the owner(s) of this project with any feedback you might have. Svein has been very helpful over the last week answering all my stupid questions and remarks (and the Nobel Prize for this year's most patient person goes to … ;-) ).

I would definitely like to see this tool become more popular, so go check it out.

Friday, November 26, 2010

Book Review - The Nomadic Developer

Just before I recently decided to turn a new page in my professional career, I came across this book called The Nomadic Developer: Surviving and Thriving in the World of Technology Consulting. Before I made up my mind about becoming a consultant, I purchased this book and read a few parts of it (specifically chapter five, "What You Need To Ask Before You Join a Technology Consulting Firm", and chapter ten, "Is Consulting Right for You?"). I recently read the book from cover to cover in order to find out more about what it means to work for a consulting company. So far, I've only worked as an in-house developer for companies that aren't in the software business themselves. This basically means that I've been working to support and find solutions for the core business of the company, where software is just a means to an end. Companies tend to value the people that bring in the most revenue. In a lot of cases, an in-house developer is seen as a cost instead of someone who brings in revenue.

As a consultant working for a technology company, software is the core business, which is a slightly different ball game compared to being an in-house developer. I use the word 'slightly' here because as a consultant, there's a high chance you'll end up developing software to support the businesses of other companies that aren't in the technology business themselves. But still, I think that working for a company that makes money in your own field of expertise is a good place to be.

Anyway, this book is targeted towards helping consultants (or future consultants like myself) to understand the economics of the technology industry. It’s filled with great advice for all folks that are new in the IT industry, coming from college or otherwise. But for more seasoned developers, there are some great nuggets in there as well. Although the book is mainly focused towards consultants, lots of the advice in the book is also applicable to in-house developers.

Take a look at the twelve chapters to get a quick glimpse of the content. The wisdom collected here is drawn from the author's own experience as well as from others who contributed content to the book, like Jason Bock, Michael Hugos, Derik Whittacker, Chris G. Williams, etc. The content of the book is interspersed with annotations and anecdotes from the contributors, which makes it highly interesting and fun to read. The last chapter is a collection of essays from the contributors. These essays all promote doing community work in one way or another, like working on open-source projects, writing books, organizing user group meetings, etc., in order to gain more visibility as a consultant. While I'm certainly highly in favor of doing these things, doing them just for your own advancement or sales purposes is definitely the wrong reason. In my opinion, the author could have done a better job of warning the reader that contributing to a community only for visibility reasons isn't the right motivation for doing these kinds of things: it tends to do more harm than good and probably won't last either. Putting out advice to do community work in order to gain visibility is one thing, but a warning about the other side of the coin couldn't have hurt.

So if you are working as a consultant or thinking about becoming one, just pick up this book and give it a read. It's also available in audio format if you fancy that. I learned a couple of things from this book and probably so will you.

Saturday, November 20, 2010

Taking Baby Steps with Node.js – Threads vs. Events

In a previous blog post, I provided a shallow introduction to Node.js. I also mentioned where you can find more information on how to get it installed on Windows as well as how to install a seemingly popular package manager in the JavaScript community called Npm.

In the meantime, I've started to get a clearer view of the general concepts on which Node.js is based, as well as the kind of applications that can be built using this server-side platform. The more I read and learn about Node.js, the more I come to the conclusion that it is very much targeted towards building real-time applications. Google Wave, Friendfeed and most recently Facebook are popular examples. You can also read this article to learn more about other examples of real-time web applications.

As I briefly mentioned in the previous blog post, Node.js makes heavy use of JavaScript's event-based style of programming, which lies at the heart of its capabilities for building real-time applications. This event-based model is a completely different way of thinking compared to the thread-based model that we've been so accustomed to over the past couple of years. ASP.NET Web Services or WCF services, for that matter, are excellent examples of the thread-based model. Every time a message comes in, these frameworks spawn a new thread or take one from the thread pool in order to handle the request. There's nothing wrong with this approach. In fact, this thread-based model makes perfect sense for many of the scenarios out there. But generally not for real-time applications, which usually require long-lived connections.

In the thread-based model, most of the threads spend a lot of their time blocked, waiting for I/O operations like executing queries against a database, calling another service or writing to a file on disk. These are expensive operations that usually take longer to complete than in-memory operations. When you have large amounts of traffic, you can't afford to have threads blocking for long periods of time. Otherwise you'll be hitting the maximum number of available threads sooner rather than later.

Node.js solves this by putting the event-based model at its core, using an event loop instead of threads. All the expensive I/O operations we just talked about are always executed asynchronously, with a callback that gets executed when the initiated operation completes. The net result is that while the I/O operation is busy performing its duties, Node.js is able to accept other incoming requests and start doing the work required to handle them. When the I/O operation completes, the specified callback is executed and the earlier request is processed further. The event loop manages to switch between these requests very fast, picking up where it previously left off. This event-based model provides the means for building highly scalable real-time applications.

Let me show you a naive example of this concept so you can get a feel on how this looks in code.

var http = require('http');
var url = require('url');

http.createServer(function(request, response) {
    var feedUrl = 'http://feeds.feedburner.com/astronomycast.rss';
    var parsedUrl = url.parse(feedUrl);

    // Fire off an asynchronous HTTP request for the RSS feed.
    var client = http.createClient(80, parsedUrl.hostname);
    var feedRequest = client.request(parsedUrl.pathname, { 'host': parsedUrl.hostname });
    feedRequest.addListener('response', handle);
    feedRequest.end();

    response.sendHeader(200, { 'Content-Type': 'text/html' });
    response.end('Done processing the request.');
}).listen(8124);

function handle(response) {    
    if(response.statusCode !== 200)
        return;
    
    var responseBody = '';
    
    response.addListener('data', function(chunk) {
        responseBody += chunk;
    });
    
    response.addListener('end', function() {       
        console.log('All data has been read.');
        console.log(responseBody);
    });
}

Our server implementation just reads the content of a particular RSS feed every time a request comes in. This code doesn't do anything useful except illustrate the fact that when we make an HTTP request for an external resource, this HTTP request is fired off asynchronously. We subscribe an event listener that gets called when the request completes, so we can read the requested data from the HTTP response. In the meantime, Node.js takes on other requests, firing new HTTP requests and going its merry way. Notice that even reading in the chunks of data from an HTTP response is done asynchronously!

This isn't very different from performing Ajax requests in a browser, now is it? Take a look at the following jQuery snippet and notice how similar it looks to the code for performing an HTTP request in our server-side example.

$.getJSON('http://myfancywebsite.com/something', function(data, status) {
    // Handles the data from the response
});

Earlier this week, I was listening to this excellent episode of Herding Code on Manos de Mono. This is a high-performance web application framework that targets the .NET platform, Mono in particular. I know there are new web frameworks popping out of the ground like mushrooms every day. But what particularly excites me about this one is that it's based on the same high-performance event loop as Node.js, called libev. I have to admit that I hadn't heard about it before the Herding Code episode, but I'm definitely looking forward to spending some time on it as well.

As I mentioned before, I'm just learning about this stuff, so I'm happy to get your feedback, thoughts, etc. Till next time.

Wednesday, November 17, 2010

The Black Art of P/Invoke and Marshaling in .NET

Last week I finally managed to hunt down and resolve a bug that I had been chasing for quite some time. A couple of years ago I built an ASP.NET web service that makes use of a native library to provide most of its functionality. This native DLL exposes a C API, much like the Win32 API, which we've been using to integrate a highly expensive product from a certain vendor into our system.

I didn't notice any issues during the initial development of this web service; in fact, I was very pleased with how easy it was to use and integrate this API. To this very day, I still consider it one of the nicest and most stable C APIs I've ever come across. After I finished most of the features for the web service, I ran a couple of load tests in order to determine whether the respective product could withstand a fair amount of requests and also to detect any glitches that might come up in the interop layer. Nothing major turned up, so we decided to bring this service into production and go on with our merry lives.

At first we didn't receive any complaints. Everything worked as expected until earlier this year, when we noticed some failures of the application pool for our web service in IIS. The event log showed some random crashes of the CLR (= very bad) with a highly obscure error message. These issues occurred completely at random. One day there were about five to six application pool crashes, after which it behaved very stably again, sometimes for months in a row.

After doing some investigation on my part, which also involved stretching my shallow knowledge of WinDbg, I found out that the .NET runtime was doing a complete shutdown after an AccessViolationException was thrown. This exception reported the following issue:

Attempted to read or write protected memory. This is often an indication that other memory is corrupt.

The first thing I considered was a memory leak turning up somewhere. It’s unmanaged code after all, right? After reviewing the code that uses the imported native functions, I discovered two potential memory leaks that might come up during some edge case scenarios. So I fixed those, but unfortunately it didn’t resolve the problem.

I tried a couple of other things and also approached the issue from a couple of different angles. I'm not going to bore you with the details here, but it clearly didn't solve anything. I started to become a bit frustrated with this problem. At some point, I was even convinced that this was all caused by the native library itself and not by our code, which is a classic mistake in most long debugging sessions. The good thing was that I was able to reproduce the problem on my machine by running the load tests with a particularly high load. I also noticed that there was a certain pattern in the number of requests between each crash of the application pool.

Determined to fix this, I decided to read up on P/Invoke and marshaling in .NET. I ended up reading this blog post on P/Invoke and memory-related issues. While it didn't provide a clear solution, reading that blog post certainly guided me in the right direction. I started to turn off feature by feature until I was able to isolate the cause of the crash to a single function call. The native function in question has the following signature:

char* FunctionThatReturnsAString();

As it turned out, I created the following P/Invoke prototype for this native function in C#:

[DllImport("SomeProduct.dll")]
public static extern String FunctionThatReturnsAString();

At first sight (and even after a number of subsequent reviews), there seems to be nothing wrong with the way the char* return value is marshaled to a String. I started reading the excellent documentation for this function again, and it explicitly mentions that the caller should not clean up the memory for the return value. The allocated memory for the return value is always cleaned up by the native function itself.

Doing some more research on marshaling strings, I found out that when the native function does its own cleanup, the C# prototype declaration must return an IntPtr instead of a String.

Most C-style strings returned by native functions must be cleaned up by the calling code. For this common scenario, when the return value is marshaled to a String, the interop marshaler assumes that it has to free the unmanaged memory returned by the function. In our case, this means that the interop marshaler tried to clean up memory that at some point had already been freed by the native function, or vice versa.

So I changed the return value for the P/Invoke prototype declaration to an IntPtr. This way, the interop marshaler does not automatically free the memory that is referenced by the IntPtr.

[DllImport("SomeProduct.dll")]
public static extern IntPtr FunctionThatReturnsAString();

Marshaling the data referenced by the IntPtr to a string is quite easy. This can be done using the Marshal.PtrToStringAuto method.
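To make that last step concrete, here's a minimal sketch of how the call site might look. The wrapper class and method name are my own invention; note that for an ANSI char* return value, Marshal.PtrToStringAnsi is the matching conversion, while Marshal.PtrToStringAuto picks the platform's default character set.

using System;
using System.Runtime.InteropServices;

internal static class SomeProductApi
{
    [DllImport("SomeProduct.dll")]
    private static extern IntPtr FunctionThatReturnsAString();

    public static string GetString()
    {
        // The native library allocates and later frees this buffer itself, so we
        // only copy the characters into a managed string and never free the pointer.
        IntPtr pointer = FunctionThatReturnsAString();
        return Marshal.PtrToStringAnsi(pointer);
    }
}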

The lesson I learned here is that we need to watch out for these kinds of issues when using interop with unmanaged code. On the surface everything seems to work just fine, but you can still run into some nasty issues afterwards. Carefully considering how to correctly marshal data from and to unmanaged code is an essential technique, and it sometimes feels like a black art that is not suited for the faint of heart, like myself :-).

Saturday, November 13, 2010

Basic JavaScript Part 2 : Objects

In a previous blog post, I showed some of the rich capabilities of functions in JavaScript. For this post I want to have a brief look at objects in JavaScript. Let’s start with a basic example:

var podcast = {
    title: 'Astronomy Cast',
    description: 'A fact-based journey through the galaxy.',
    link: 'http://www.astronomycast.com'
};

If you've already seen data in JSON format, this might look familiar. In JavaScript, this is called an object literal. The example above shows a variable named podcast that contains an object with three properties, named title, description and link. As you might expect, it's now possible to access the data in these properties using the regular dot notation.

console.log(podcast.title);
console.log(podcast.description);
console.log(podcast.link);

Hopefully there are no big surprises here. But what I find to be more interesting about objects in JavaScript are their similarities with arrays. Take a look at the following array definition:

var astronomyPodcast = ['Astronomy Cast', 'A fact based ...', 'http://www.astro ...'];

When I want to access the description in the array, I simply use the correct index:

console.log(astronomyPodcast[1]);    // outputs the description

No big deal so far. The nice thing about objects in JavaScript is that we can use the same notation, with only a small difference:

console.log(podcast['description']);

The sole difference here is that we are using the name of the property instead of a numeric index. Objects actually behave the same as arrays in JavaScript, except that for objects you get to choose nice and friendly names for their properties. Let's look at another example that loops over every item in the array:

for(var i in astronomyPodcast) {
    console.log(astronomyPodcast[i]);
}

This outputs every value contained by the array. The variable i contains the current index in the array. We can do the same for objects:

for(var i in podcast) {
    console.log(podcast[i]);
}

The variable i doesn’t contain a numeric value in this case, but the name of the current property. So in short, a JavaScript object is very similar to an array with the sole difference that you have to define the keys yourself. Just a quick side note here: try to avoid for-in loops with arrays and use a regular for loop instead.
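For the sake of completeness, here's what that regular for loop over the same array might look like:

var astronomyPodcast = ['Astronomy Cast', 'A fact based ...', 'http://www.astro ...'];

// A plain for loop only visits the numeric indexes of the array,
// whereas for-in would also pick up any extra properties that were
// added to the array or to Array.prototype by some library.
for (var i = 0; i < astronomyPodcast.length; i++) {
    console.log(astronomyPodcast[i]);
}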

Methods

A method on an object is simply a property that contains a function.

var podcast = {
    title: 'Astronomy Cast',
    description: 'A fact-based journey through the galaxy.',
    link: 'http://www.astronomycast.com',

    toString: function() {
       return 'Title: ' + this.title;
    }
};

Again, calling a method can be done using the dot notation or the square bracket notation as we saw earlier:

podcast.toString();        // dot notation
podcast['toString']();     // square bracket notation

I have to admit that this last notation looks a bit weird and I don’t think it’s a common practice.

Constructors

Instead of using object literals, the most common way I've seen so far for creating an object in JavaScript is by using a so-called constructor function. A constructor function is just a regular function with a slightly different naming convention. The name of a constructor function is generally written in Pascal casing as opposed to the usual camel casing (like the toString method shown earlier).

function Podcast() {
    this.title = 'Astronomy Cast';
    this.description = 'A fact-based journey through the galaxy.';
    this.link = 'http://www.astronomycast.com';

    this.toString = function() {
        return 'Title: ' + this.title;
    }
}

In order to create a Podcast object, we simply use the new operator:

var podcast = new Podcast();
podcast.toString();

When a constructor function is invoked with the new operator, an object is always returned. By default, this points to the newly created object. The properties and methods are added to the object (referenced by this), after which the new object is returned. Because a constructor function is really just a regular function (the capitalized name is only a convention), you can also pass arguments to it, just as with regular functions.

When an object is created using a constructor function, there’s also a property named constructor that is set with a reference to the constructor function that was used for creating the respective object.
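As a small sketch of both points, here's a variation of the Podcast constructor that takes arguments, followed by a look at the constructor property:

function Podcast(title, description, link) {
    this.title = title;
    this.description = description;
    this.link = link;
}

var podcast = new Podcast('Astronomy Cast',
                          'A fact-based journey through the galaxy.',
                          'http://www.astronomycast.com');

console.log(podcast.title);                     // 'Astronomy Cast'
console.log(podcast.constructor === Podcast);   // true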

Encapsulation

In the example we’ve been using so far, it’s fairly easy to replace property values and method implementations of a Podcast object.

var podcast = new Podcast();
podcast.title = 'The Simpsons';
podcast.toString = function() {
    return 'Doh';
}

Suppose we want our Podcast objects to be immutable. Unfortunately, JavaScript doesn't have notations for private, protected, public or internal members like C# does. So if we want to hide the values of the properties of this object, we have to refactor them into regular variables and make use of closures (these are explained in the previous post).

function Podcast() {
    var _title = 'Astronomy Cast';
    var _description = 'A fact-based journey through the galaxy.';
    var _link = 'http://www.astronomycast.com';

    this.getTitle = function() {
        return _title;
    }

    this.getDescription = function() {
        return _description;
    }

    this.getLink = function() {
        return _link;
    }

    this.toString = function() {
        return 'Title: ' + _title;
    }
}

The ‘public’ methods have access to the ‘private’ variables while the latter are not exposed to the externals of the object. These public methods that have access to the private members are also called privileged methods.

var podcast = new Podcast();
console.log(podcast.getTitle());
console.log(podcast._title);    // undefined

You probably want to watch out for privileged methods returning a private member that holds either an object or an array. Because these are returned by reference, the calling code can still change the private member. To prevent this from happening, you might want to consider returning a copy of the object or array.
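For example, a privileged method that guards a private array could hand out a copy instead of the array itself (the _tags member below is just made up for the sake of the example):

function Podcast() {
    var _tags = ['astronomy', 'space', 'science'];

    this.getTags = function() {
        // Return a copy so callers cannot modify the private
        // _tags array through the returned reference.
        return _tags.slice(0);
    }
}

var podcast = new Podcast();
podcast.getTags().push('gardening');   // only changes the copy
console.log(podcast.getTags());        // still ['astronomy', 'space', 'science']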

We can have private functions as well using the same approach as for private variables:

function Podcast() {
    // ...

    // Private function
    function download() {
        // ...
    }

    function reliesOnDownload() {
        // ...
        download();
        // ...
    }

    // ...
}

Suppose we want to make the download method publicly available. On the other hand, we also have some other methods in our Podcast object (like the reliesOnDownload method) that make use of the download method and rely on its robust functionality. Making this method publicly available could jeopardize the correct working of these other methods if some client decides to replace our download method with its own buggy implementation or, even worse, deletes it completely. We can't have that, of course.

We mentioned earlier that constructor functions implicitly return this. But we can return our own custom object as well, and we can use this to solve our problem of exposing private methods.

function Podcast() {
    var _title = 'Astronomy Cast';
    var _description = 'A fact-based journey through the galaxy.';
    var _link = 'http://www.astronomycast.com';

    function getTitle() {
        return _title;
    }

    function getDescription() {
        return _description;
    }

    function getLink() {
        return _link;
    }

    function toString() {
        // Now we can safely rely on the getTitle() accessor as well.
        return 'Title: ' + getTitle();    
    }

    function download() {
        // Some highly resilient implementation ...
    }

    function reliesOnDownload() {
        // Relies on our own implementation of the download() method
        download();
        // ...
    }

    return {
        getTitle: getTitle,
        getDescription: getDescription,
        getLink: getLink,
        toString: toString,
        download: download 
    }
}

The net result here is that we safely exposed our private methods to any outside code. Clients can still replace the download method on the custom object that we explicitly returned from the constructor function, but at the very least we can safely rely on our own implementation.

Closing

There you go, just some quick trivia on objects in JavaScript. I cannot emphasize enough how powerful JavaScript really is and that learning about this great programming language is a tremendous experience.

Monday, November 08, 2010

Taking Baby Steps with Node.js – Introduction

Like me, you might have read an article somewhere about Node.js or heard it mentioned a couple of times during some talk. I got curious and decided to start learning more about it.

So what is Node.js? I have to admit that I'm still trying to wrap my head around it. As far as I can tell, Node.js is an attempt (amongst others) to make JavaScript available on the server. Most people see JavaScript as the programming language for doing client-side scripting in a browser. Although this is the most commonly used scenario, things are slowly starting to change. Over the past couple of years, JavaScript (re)gained its popularity due to some innovative frameworks like jQuery, MooTools, ExtJS and others. Some NoSQL databases like CouchDB have also started to make good use of JavaScript in the database itself. A nice example of this is CouchDB's MapReduce functions, which are written entirely in JavaScript. Therefore JavaScript on the server side seems like a logical evolution for the language and its community. But there's more.

Node.js makes use of the strengths of JavaScript, like its excellent capabilities for event-based programming. If you're used to working with DOM events in the browser, then Node.js will make you feel right at home. All JavaScript code on the server is run by the V8 JavaScript engine (yes, the one of Google Chrome fame) and is also run in parallel, which makes Node.js very fast (or so they claim).
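To get a feel for that event-based style on the server, here's a tiny sketch using Node's built-in EventEmitter (the event name and payload are made up for the example):

var events = require('events');

var emitter = new events.EventEmitter();

// Attaching a listener feels a lot like hooking up a DOM event handler.
emitter.addListener('newEpisode', function(title) {
    console.log('A new episode was published: ' + title);
});

emitter.emit('newEpisode', 'Astronomy Cast');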

There seem to be frameworks similar to Node.js on other development platforms as well, like Twisted (Python), Jetty (Java) and EventMachine (Ruby). Node.js only runs on Mac OS X and other Unix-based systems like Linux. Windows is currently not supported, but you can work around this by installing Cygwin. If you'd rather not fire up Linux on a virtual machine, you can follow the steps laid out by Matthew Podwysocki in his blog post on how to get started with Node.js on Windows.

There's one thing that I feel is missing from Matthew's post that I urge you to install as well. It's a small tool called Npm. Npm is a package manager for installing non-standard modules for Node.js. Node.js already comes with a number of built-in modules for accessing the network or the file system, but there is a whole slew of other modules on GitHub or in the Npm repository. The vast amount of open-source modules clearly proves that the Node.js community is a vibrant one and that they've been very busy adding more capabilities to the platform.
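Just to illustrate one of those built-in modules, here's a minimal sketch that reads a file asynchronously (the file path is only a placeholder):

var fs = require('fs');

// The call returns immediately; the callback fires once the data is available.
fs.readFile('/tmp/example.txt', 'utf8', function(error, data) {
    if (error) {
        console.log('Could not read the file: ' + error);
        return;
    }
    console.log(data);
});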

In order to install Npm, you first need to enable curl in Cygwin. If you already installed Cygwin and forgot to install this package, don’t worry. Just run the Cygwin installer again and select the required package. The packages you selected in a previous install will not be removed unless you explicitly unselect them from the list.   

Now to install Npm simply execute the following command:

curl http://npmjs.org/install.sh | sh

There you go. Now you’re able to install as many open-source modules for Node.js as you like. The first one I’ve installed is a web development framework called Express. Getting this module is as simple as executing the following command:

$ npm install express

That's it. As I mentioned earlier, I'm still figuring out this stuff myself. I'm more than happy to get some feedback and hear your thoughts about Node.js. I'll probably be writing a couple more blog posts about Node.js in the future as I learn more along the way.

Code Retreat Ghent

Last Saturday I got up very early in the morning so I could spend the whole day at the Code Retreat in Ghent. This Code Retreat was organized by AGILEMinds and facilitated by Corey Haines. You might ask yourself what a Code Retreat is.

A Code Retreat is an event where a bunch of software developers get together to practice their craft, share ideas and basically try to learn and improve. There are six sessions that each last 45 minutes. For each session, you pair up with another participant and work on a problem called Conway's Game of Life. All pairs decide for themselves which programming language and/or platform they want to use, as long as they don't waste the majority of their time setting things up. There are a couple of rules though. First of all, regardless of the technology used, applying Test-Driven Development (TDD) is a must. You also can't pair with the same person more than once. At the end of each session all code must be deleted, after which a short group retrospective is held.

There was quite a nice turn-out for the Code Retreat in Ghent, despite the fact that folks were spending their Saturday with a bunch of other geeks :-). I was pleasantly surprised to see so many .NET developers amongst the attendees. There were a number of Java and Ruby developers as well. It turned out to be quite a good mix.

The first two sessions of the day were more about getting familiar with the problem while trying to implement a solution. Gradually I got a better understanding of the problem and also had a couple of “ah-ha” moments throughout the day (even during the last session).

I learned a lot from watching how other people write unit tests, implement code and approach the problem. Throwing it all away after each session and starting out fresh was extremely liberating. We used C# for most of the sessions, except for one session where we actually used Ruby. It probably sounds like a cliché by now, but using Ruby really opened my eyes to how concise this programming language is compared to statically typed languages like C# or Java. Going back to C# for the next session was a bit hard. I was amazed by the amount of cruft we have to type just to satisfy the compiler. A fair number of people came to the same conclusion during the final retrospective after the last session, so I guess there is some truth in that.

While I have been doing test-driven development for a good number of years now, I learned that I really need to focus on letting my unit tests drive out the design of the system. I usually have a design in mind before I start writing tests but I should practice more on how to let the design emerge. There is definitely some ground for me to cover on that particular issue.

I also noticed that the 45 minutes of a session just fly by. It's a very short time. That's why the last session lasted about an hour and 15 minutes. During this extended period we managed to get closer to solving the problem at hand, but it still wasn't enough time. How time flies when you're enjoying yourself ;-).

I had a total blast and got more out of it than I initially expected, which is always a good thing. If you ever get the chance to be part of a Code Retreat, then I recommend you take that opportunity and spend a day coding, practicing and communicating with other developers. It's totally worth a day of your weekend, regardless of how scary that may sound.

You can take a look at the pictures of last Saturday to get a sense of the atmosphere. AGILEMinds also announced that there will be another Code Retreat in Belgium on January 16th 2011 with a live hand-over from the Code Retreat with Corey Haines in Cleveland.

Kudos to AGILEMinds and Corey Haines for making this happen.