Saturday, May 30, 2009

Installing the Couch(DB)

In my previous post, I talked about some introductory topics regarding CouchDB. In this post I want to walk you through some of the hurdles you need to clear when you want to install CouchDB on a freshly installed Ubuntu Linux 9.04. I don't want to claim that there isn't a simpler way of getting CouchDB up and running; there probably is. I just want to put out everything I had to deal with when I tried installing the trunk version (V9.0) of CouchDB. I didn't install the pre-packaged version (V8.02) as that would be too easy :-). Here goes ...

1. Prepare your environment:  

sudo aptitude install automake autoconf libtool subversion-tools help2man spidermonkey-bin build-essential \
    erlang erlang-manpages libicu38 libicu-dev \
    libreadline5-dev checkinstall libmozjs-dev wget

 

sudo apt-get install libcurl4-gnutls-dev

2. Create a new user for running CouchDB:

sudo adduser --no-create-home --disabled-password --disabled-login couchdb

3. Get the latest source code from the trunk:

svn co http://svn.apache.org/repos/asf/couchdb/trunk couchdb

4. Build and install: 

cd couchdb
./bootstrap
./configure --bindir=/usr/bin --sbindir=/usr/sbin \
    --localstatedir=/var --sysconfdir=/etc
make && sudo make install

5. Make the CouchDB user that was created in step 2 the owner of the CouchDB data and log directories:

sudo chown couchdb:couchdb -R /var/lib/couchdb /var/log/couchdb

6. Now we're all set to launch the CouchDB server:

sudo -i -u couchdb couchdb

If everything goes well, you should see output similar to the following:

sudo: unable to change directory to /home/couchdb: No such file or directory
Apache CouchDB 0.10.0a780291 (LogLevel=info) is starting.
Apache CouchDB has started. Time to relax.
[info] [<0.1.0>] Apache CouchDB has started.

7. In order to verify that CouchDB is up and running, you can install cURL and issue the following command:

sudo apt-get install curl
curl http://localhost:5984

This should give you the following response:

{"couchdb":"Welcome","version":"0.10.0a780291"}
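If you would rather do this check from a script than from the command line, here is a minimal Python sketch. The function names `parse_welcome` and `fetch_version` are my own, and `fetch_version` assumes a CouchDB server is listening on http://localhost:5984, so it is defined here but not called.

```python
import json
from urllib.request import urlopen


def parse_welcome(body):
    """Return the version from a CouchDB welcome response, or None if it doesn't look like one."""
    info = json.loads(body)
    if info.get("couchdb") != "Welcome":
        return None
    return info.get("version")


def fetch_version(url="http://localhost:5984"):
    """Fetch the root URL and return the reported version (needs a running server)."""
    with urlopen(url, timeout=5) as response:
        return parse_welcome(response.read().decode("utf-8"))


# The response shown above, parsed without touching the network.
sample = '{"couchdb":"Welcome","version":"0.10.0a780291"}'
print(parse_welcome(sample))
```

With the server running, calling `fetch_version()` should return the same version string that curl shows.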

8. [Optional] Register CouchDB as a service:

sudo update-rc.d couchdb defaults
sudo /etc/init.d/couchdb start

So far in our setup, we can only access CouchDB from the local Linux (virtual) machine. My personal goal was to access CouchDB from a sample application that I'm developing on another Windows virtual machine that hosts my development environment. In order to accomplish that, some additional steps need to be taken:

1. If a firewall is enabled, open the CouchDB port (5984) on it. You can do this by issuing the following:

sudo iptables -I INPUT 3 -p tcp --dport 5984 -j ACCEPT

2. Go to the local.ini file in the /etc/couchdb/ directory and uncomment the line that specifies the bind_address setting (in the httpd section) by removing the semicolon in front of it (figuring out this one kept me busy for quite some time ;-) ).

3. Change the bind_address setting so that it specifies 0.0.0.0 instead of 127.0.0.1. After restarting CouchDB, it will bind to all addresses.

4. We're done.
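Since the commented-out line is what tripped me up, here is a small Python sketch that checks whether an ini file really has an active bind_address in its httpd section. The sample file contents and the `read_bind_address` helper are made up for illustration; your real local.ini will contain other settings. Python's configparser treats lines starting with a semicolon as comments, which matches the way the setting is disabled in local.ini.

```python
import configparser

# Hypothetical excerpt of /etc/couchdb/local.ini after the edit.
SAMPLE_LOCAL_INI = """\
[httpd]
;port = 5984
bind_address = 0.0.0.0
"""


def read_bind_address(ini_text):
    """Return the active httpd bind_address, or None when it is missing or commented out."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return parser.get("httpd", "bind_address", fallback=None)


print(read_bind_address(SAMPLE_LOCAL_INI))
```

If the function returns None for your file, the semicolon is probably still in front of the setting.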

As I mentioned before, I'm quite a Linux rookie so it was definitely a challenge figuring this out.

If this all seems too much for you, then you might be interested in taking a look at Interactive CouchDB. This is an online CouchDB emulator/visualizer that was written entirely in JavaScript. It provides a sample DB where you can play around with some map/reduce functions, which I'm going to discuss in one of my next posts.

Now I need some relaxation. Till next time.

Sunday, May 24, 2009

Relaxing on the Couch(DB)

Over the last couple of months, I heard some buzz around CouchDB at several user groups. Listening to this podcast really got me interested, so I decided to learn more about it in order to find out what all the fuss is about.

Is CouchDB yet another relational database? Heck no! CouchDB is an open-source document-oriented database. It was originally created by Damien Katz and is developed in Erlang. The data stored in CouchDB cannot be accessed using traditional SQL but through a RESTful API. CouchDB can be installed on most POSIX systems, including Linux and Mac OS. It can also be installed on Windows, but this isn't officially supported and, according to this page on the wiki, not all IO-related features are fully functional. But don't let that scare you away. It can be fun to spend some time with Ubuntu :-). The final goal of CouchDB is to provide a scalable, reliable and fault-tolerant document database that can run on failure-prone systems.

So, what's the difference between a relational database and a document-oriented database, and why would you care? As you probably know, the center of the RDBMS world is the row. It may not come as a surprise that the centerpiece of a document-oriented database is the document.

A document in a document-oriented database is completely self-contained. There are no tables, rows, foreign keys, joins and more importantly, there is no database schema at all! You can store whatever data you'd like for a particular document and use a completely different set of data for another. Once you have a set of documents stored, you can add/remove data of one or more documents without affecting others.

For example, when your application wants to store customer information like the first and last name, then these attributes can be stored in a document for every customer. As documents in CouchDB are stored as JSON, a customer document would look like this:

{ "_id" : "...", "_rev" : "...", "FirstName" : "Chuck", "LastName" : "Norris" }

Suppose your application has been in production for almost a year now and, to celebrate this fact, the shoe size of the customer must be stored as well.

Simple. You don't have to change any schema or anything (and face those nasty DBAs). You just start saving the valuable shoe size information for every new/existing customer from now on.

{ "_id" : "...", "_rev" : "...", "FirstName" : "Chuck", "LastName" : "Norris", "ShoeSize" : "52" }

You probably want to update the already existing documents as soon as you can get a hold of this extra information, but the point is that you don't have to.
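To make that a bit more concrete, here is a small Python sketch of application code that copes with both kinds of documents. The `_id` values and the `shoe_size` helper are hypothetical; the point is only that old documents without the new field keep working.

```python
# Two customer documents as plain dictionaries: one saved before the
# ShoeSize field existed, one saved afterwards. The ids are made up.
customers = [
    {"_id": "customer-1", "FirstName": "Chuck", "LastName": "Norris"},
    {"_id": "customer-2", "FirstName": "Chuck", "LastName": "Norris", "ShoeSize": "52"},
]


def shoe_size(doc, default="unknown"):
    """Read the optional ShoeSize field without assuming every document has it."""
    return doc.get("ShoeSize", default)


sizes = {doc["_id"]: shoe_size(doc) for doc in customers}
print(sizes)
```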

Another major difference between a relational database and a document-oriented database is the way rows and documents are identified. A table in an RDBMS typically has a primary key that identifies a row. The value of the primary key column is usually generated by some sequence generator, which means that the identifier of a row is only unique within the table itself. CouchDB, on the other hand, needs a document identifier that is unique across multiple databases. It has to be unique on a global scale because, in order to achieve high availability, CouchDB has a replication feature built in that makes it possible to replicate between multiple database instances across multiple machines. The safest way of uniquely identifying a document is to provide a GUID yourself or to let CouchDB generate a random unique identifier for you.
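If you want to supply the identifier yourself, generating a random GUID on the client is easy. The following Python sketch uses the standard uuid module; the helper names `new_document_id` and `new_customer` are mine and not part of any CouchDB client library.

```python
import uuid


def new_document_id():
    """Return a random 32-character hexadecimal identifier (a version 4 UUID)."""
    return uuid.uuid4().hex


def new_customer(first_name, last_name):
    """Build a customer document that carries its own globally unique _id."""
    return {"_id": new_document_id(), "FirstName": first_name, "LastName": last_name}


customer = new_customer("Chuck", "Norris")
print(customer["_id"])
```

Because the identifier does not depend on a per-table sequence, two machines can create documents independently without coordinating first.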

Another interesting feature of CouchDB is the way it handles versioning. But first, let's talk a bit about how you would traditionally solve this using a relational database.

In order to achieve optimistic concurrency in an RDBMS, you have to include a version column that you update with every row change and use in the WHERE clause of the update statement involved. But what if you want to know what changed in a particular row last month? In order to store that kind of information, you have to create a separate table where you insert a snapshot of the row for every update statement that gets executed on the table (probably using a trigger). The point that I'm trying to make here is that a relational database is not going to help you with that. You have to provide all this plumbing yourself.
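For readers who haven't seen the version-column trick before, here is a rough Python sketch using an in-memory SQLite database. The table and column names are invented for the example; the only important part is the version check in the WHERE clause.

```python
import sqlite3


def update_last_name(conn, customer_id, new_last_name, expected_version):
    """Update the row only if nobody changed it since we read expected_version."""
    cursor = conn.execute(
        "UPDATE customer SET last_name = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_last_name, customer_id, expected_version),
    )
    conn.commit()
    # No affected row means the version no longer matched: a concurrency conflict.
    return cursor.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, last_name TEXT, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Norris', 1)")

# Both clients read version 1; only the first update succeeds.
first = update_last_name(conn, 1, "Norris Jr.", expected_version=1)
second = update_last_name(conn, 1, "Norris Sr.", expected_version=1)
print(first, second)
```

Note that this only covers the concurrency part. Keeping a history of old row values would still need the extra snapshot table described above.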

As you may have noticed from the customer example above, there are two extra fields, _id and _rev. Every document in CouchDB contains this extra metadata. We just discussed that the document identifier is stored in the _id field. The revision of the document is stored in the _rev field. When you create a new document, this revision field is filled in by CouchDB and returned to you. Every change from that moment on is not made to the existing document; instead, a new version of the entire document is created. This means that the history of a document is automatically captured and maintained by the database (although older revisions are discarded when the database is compacted, so don't rely on them as a permanent archive). The revision is also used for implementing concurrency. When two clients want to make changes to the same document, the first client will be able to store its changes while the second one will receive a notification about the conflict.
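The conflict behavior can be illustrated with a toy in-memory store in Python. This is not how CouchDB is implemented and it doesn't talk to a server; the `ToyDocumentStore` class and `ConflictError` exception are invented, and revisions are simplified to a counter, whereas real CouchDB revisions should be treated as opaque strings.

```python
class ConflictError(Exception):
    """Raised when a document is saved with a stale revision."""


class ToyDocumentStore:
    """A deliberately simplified, in-memory imitation of _rev checking."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        # Return a copy so that clients cannot modify the stored document directly.
        return dict(self._docs[doc_id])

    def save(self, doc):
        current = self._docs.get(doc["_id"])
        if current is not None and doc.get("_rev") != current["_rev"]:
            raise ConflictError(doc["_id"])
        next_rev = "1" if current is None else str(int(current["_rev"]) + 1)
        self._docs[doc["_id"]] = dict(doc, _rev=next_rev)
        return next_rev


store = ToyDocumentStore()
store.save({"_id": "chuck", "FirstName": "Chuck", "LastName": "Norris"})

# Two clients read the same revision and both try to write.
client_a = store.get("chuck")
client_b = store.get("chuck")
client_a["ShoeSize"] = "52"
store.save(client_a)

try:
    client_b["ShoeSize"] = "48"
    store.save(client_b)
    outcome = "saved"
except ConflictError:
    outcome = "conflict"
print(outcome)
```

The second client would have to fetch the latest revision, reapply its change and save again.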

That's all fine and sweet. But what's the point of having a document database filled with all the information I want, but not having SQL at my disposal to get it back out?

CouchDB relies on views to retrieve all the information you want out of your documents. These views are computed using Map/Reduce, a way of dealing with large sets of data using parallelism. The concept originally comes from functional programming languages such as Lisp, but was popularized by Google. You can read the paper here.
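To give a feel for the idea, here is a tiny Python sketch that counts customers per last name in the Map/Reduce style. Real CouchDB views are written as JavaScript functions that run inside the database, so treat this purely as an illustration; `map_customer`, `reduce_count` and `run_view` are names I made up, and the sample documents are invented.

```python
from collections import defaultdict


def map_customer(doc):
    """Map step: emit a (key, value) pair for every document that has a LastName."""
    if "LastName" in doc:
        yield doc["LastName"], 1


def reduce_count(values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)


def run_view(docs, map_fn, reduce_fn):
    """Group the mapped pairs by key and reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}


docs = [
    {"_id": "1", "FirstName": "Chuck", "LastName": "Norris"},
    {"_id": "2", "FirstName": "Aaron", "LastName": "Norris"},
    {"_id": "3", "FirstName": "Bruce", "LastName": "Lee"},
    {"_id": "4", "Note": "not a customer"},
]
print(run_view(docs, map_customer, reduce_count))
```

Because the map step looks at one document at a time, the work can be spread over many documents in parallel, which is where the scalability comes from.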

I'm planning to write a couple of blog posts in the near future about the things I've learned so far and the new things I'm going to learn about CouchDB along the way. In the meantime, if you can't wait to learn more, then check out these forthcoming books:

I've only read the MEAP of CouchDB in Action, which is only about 30 pages so far but looks very promising.

For my next post on CouchDB, I will try to make a summary of the steps I have gone through for installing CouchDB on a clean Ubuntu Linux installation.

Monday, May 18, 2009

Half-a-Book Review: IronRuby in Action

Earlier this year, Ivan Porto Carrero gave a quite enjoyable presentation on IronRuby for the Dutch ALT.NET user group. I recently picked up the early access edition of his book IronRuby in Action, which was quite a challenging read for me.

I must admit, I'm a complete noob when it comes to the Ruby programming language. I've been very interested in everything it has to offer, but so far I have never learned or used it. I'm firmly determined to do something about this, and that's why I read this book.

At the time of writing, only five chapters of the book are available. The first chapter provides an overall view of dynamic languages, explaining the differences, advantages and disadvantages of static languages versus dynamic languages. This opening chapter also provides a step-by-step guide on how to get IronRuby up and running on your machine, as well as an explanation of the importance of unit testing for code written in a dynamic language. This is something that can't be emphasized enough during talks or discussions about dynamic languages, and it is nicely covered by the author.

The second chapter provides a crash course in the Ruby programming language, where the major aspects of the language itself are briefly touched upon and explained. Although the author tried very hard to explain as much as possible in this single chapter, I felt that I needed to pick up another introductory text on Ruby and that trying things out at a slower pace would be more beneficial for me (remember, I'm a Ruby noob).

The third chapter provides a nice overview of how to host the Dynamic Language Runtime in your own applications. This chapter also provides a quick overview of how the hot technologies du jour (WPF, Silverlight and ASP.NET MVC) can be used with IronRuby. The author even got me interested in WPF for a moment there (don't worry, it's over now ;-) ). Take a look at the following WPF code using meta-programming in IronRuby:

# build the window
obj = Wpf.build Window, :title => title, :height => 500, :width => 826, :name => "Biffy" do
  add DockPanel, :name => "dock_panel" do
    add TextBlock, :text => title, :font_size => 36, :background => :alice_blue, :dock => :top, :name => "text_block"
    add StackPanel, :orientation => :horizontal, :dock => :top, :name => "stack_panel" do
      add TextBox, :text => start_url, :width => 750, :name => "web_url"
      add Button, :content => "Show site", :name => "get_url_button"
    end
    add Frame, :source => start_url, :name => 'web_page_display'
  end
end

Look ma, no XAML! Looks good, doesn't it?

Chapter 4 provides a detailed step-by-step implementation of a Twitter client that is built using WPF. Chapter 10 talks about using the Rails framework with IronRuby. First a detailed explanation is given on how to install Rails on your machine, after which another step-by-step implementation is provided for a server side Twitter application.

After I finished reading these chapters, I realized that I need to pick up some more introductory material on Ruby, like the Pickaxe book or Why's (Poignant) Guide to Ruby. But one thing's for sure: the author has a lot of experience with Ruby, and that is reflected throughout this book. I'm already looking forward to reading the other upcoming chapters.

Sunday, May 17, 2009

Next European VAN on 1st June 2009

This time, Udi Dahan will be on the next European VAN answering all your questions about DDD, SOA, CQS, Messaging, NServiceBus, ... If you have questions, please post them on these threads that Colin started on the DDD or ALT.NET groups. Make sure that you post your questions before 25 May, because then Colin and yours truly will aggregate them (and chime in with our own ;-) ) and send them to Udi so that he can get things prepared on his end.

Here are the details:

Start Time: Monday, June 01, 2009 07:00 PM GMT

End Time: Monday, June 01, 2009 08:30 PM GMT

Attendee URL: http://snipr.com/virtualaltnet (Live Meeting)

VAN Calendar: http://www.virtualaltnet.com/van/Home/Calendar

Needless to say, I'm already looking forward to this. Make sure you don't miss this excellent opportunity to learn more about scalable and maintainable architectures.

Thursday, May 07, 2009

Recording of Mark Nijhof on Fubu MVC @ E-VAN 06 May 2009

Mark Nijhof did a great job explaining some of the concepts of FubuMVC and showed some of the code from the sample applications of FubuMVC Contrib (including his own blog). If you missed this great session, you can now watch the recording.

Mark Nijhof on Fubu MVC @ E-VAN 06 May 2009

The details of the next E-VAN meeting will be posted soon.