Saturday, July 25, 2009

Generic Expression Builder

I've blogged about fluent interfaces and expression builders a couple of times before. For this post, I want to share a base class that I've been using to take away some of the burden of creating expression builders for domain classes.

Here's the expression I'm after:

var document = DocumentBuilder.BuildDocument()
    .AuthoredBy("Stephen Hawking")
    .Titled("The Universe in a Nutshell")
    .TaggedWith(tag => tag.Named("Physics"))
    .Build();

This creates an instance of a domain class named Document with the name of its author, its title and an associated tag. Let me first show the code of the expression builders that make this happen.

public interface IDocumentAuthorBuilder
{
    IDocumentTitleBuilder AuthoredBy(String author);
}

public interface IDocumentTitleBuilder
{
    IDocumentTagBuilder Titled(String title);
}

public interface IDocumentTagBuilder : IBuilder<Document>
{
    IDocumentTagBuilder TaggedWith(Action<ITagBuilder> 
                                                buildUsing);
}

public class DocumentBuilder : Builder<Document>,
                               IDocumentAuthorBuilder,
                               IDocumentTitleBuilder,
                               IDocumentTagBuilder
{
    private DocumentBuilder()
    {}

    public static IDocumentAuthorBuilder BuildDocument()
    {
        return new DocumentBuilder();
    }

    public IDocumentTitleBuilder AuthoredBy(String author)
    {
        ProvideValueFor(document => document.Author, author);
        return this;
    }

    public IDocumentTagBuilder Titled(String title)
    {
        ProvideValueFor(document => document.Title, title);
        return this;
    }

    public IDocumentTagBuilder TaggedWith(
        Action<ITagBuilder> buildUsing)
    {
        var tagBuilder = new TagBuilder(tag => 
            ProvideValueFor(document => document.Tags, tag));

        buildUsing(tagBuilder);
        return this;
    }
}

public interface ITagBuilder
{
    void Named(String name);
}

public class TagBuilder : ITagBuilder
{
    private readonly Action<Tag> _afterBuildAction;

    public TagBuilder(Action<Tag> afterBuildAction)
    {
        _afterBuildAction = afterBuildAction;
    }

    public void Named(String name)
    {
        var tag = new Tag(name);
        _afterBuildAction(tag);
    }
}

The expression builders all expose progressive interfaces, guiding the user from one step to the next. Also notice that DocumentBuilder derives from a base class named Builder. This class provides a ProvideValueFor method that feeds the base class the name of a property and a corresponding value. Collections are supported as well. Here's the code for the Builder class.

public interface IBuilder<T>
{
    T Build();
}

public abstract class Builder<T> : IBuilder<T>
{
    private Dictionary<PropertyInfo, Object> PropertiesAndValues 
    { get; set; }

    protected Builder()
    {
        PropertiesAndValues = 
            new Dictionary<PropertyInfo, Object>();
    }

    public static implicit operator T(Builder<T> builder)
    {
        return builder.Build();
    }

    protected void ProvideValueFor(Expression<Func<T, Object>> expression, 
                                   Object value)
    {
        var property = ReflectionHelper.GetProperty(expression);
        
        if(false == PropertiesAndValues.ContainsKey(property))
            RegisterPropertyAndValue(property, value);
        else
            SetPropertyAndValue(property, value);
    }

    private void SetPropertyAndValue(PropertyInfo property, 
                                     Object value)
    {
        if(IsCollection(property))
        {
            var values = (List<Object>) PropertiesAndValues[property];
            values.Add(value);
        }
        else
        {
            PropertiesAndValues[property] = value;   
        }
    }
    
    private void RegisterPropertyAndValue(PropertyInfo property, 
                                          Object value)
    {
        if(IsCollection(property))
            PropertiesAndValues.Add(property, 
                                    new List<Object>() { value });          
        else
            PropertiesAndValues.Add(property, value);   
    }

    private static Boolean IsCollection(PropertyInfo property)
    {
        if(property.PropertyType == typeof(String))
            return false;

        var collectionType = typeof(IEnumerable<>);
        return IsCollectionOfType(collectionType, 
                                  property.PropertyType);
    }

    private static Boolean IsCollection(FieldInfo field)
    {
        var collectionType = typeof(ICollection<>);
        return IsCollectionOfType(collectionType, field.FieldType);
    }
    
    private static Boolean IsCollectionOfType(Type collectionType, 
                                              Type type)
    {
        if(collectionType.Name == type.Name)
            return true;

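        // Note: Has is presumably a custom extension method,
        // equivalent to LINQ's Any.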
        var interfaces = type.GetInterfaces();
        return interfaces.Has(@interface => 
            @interface.Name == collectionType.Name);    
    }

    public T Build()
    {
        var typeToBuild = typeof(T);
        if(false == HasParameterlessConstructor(typeToBuild))
            throw new InvalidOperationException(
                "No parameterless constructor.");

        var instance = (T)Activator.CreateInstance(typeToBuild, true);
        foreach(var entry in PropertiesAndValues)
        {
            var property = entry.Key;
            if(IsCollection(property))
                SetCollectionValuesFor(property, instance, 
                                       (List<Object>) entry.Value);
            else
                SetValueFor(property, instance, entry.Value);
        }

        return instance;
    }

    private static Boolean HasParameterlessConstructor(Type type)
    {
        const BindingFlags bindingFlags = 
            BindingFlags.Public | 
            BindingFlags.NonPublic | 
            BindingFlags.Instance;
                                          
        var defaultConstructor = 
            type.GetConstructor(bindingFlags, null, 
                                new Type[0], null);
        return null != defaultConstructor;
    }

    private static void SetValueFor(PropertyInfo property, T instance, 
                                    Object value)
    {
        property.SetValue(instance, value, null);    
    }

    private static void SetCollectionValuesFor(PropertyInfo property, 
                                               T instance, 
                                               List<Object> values)
    {
        var backingField = BackingFieldResolver.GetBackingField(property);
        if(false == IsCollection(backingField))
        {
            var message = String.Format(
                ResourceLoader<Builder<T>>
                    .GetString("InvalidCollectionType"), property.Name);
                
            throw new InvalidOperationException(message);    
        }

        var collection = property.GetValue(instance, null);
        foreach(var value in values)
        {
            const BindingFlags bindingFlags = 
                BindingFlags.Public | 
                BindingFlags.Instance | 
                BindingFlags.InvokeMethod;
                                              
            backingField.FieldType
                .InvokeMember("Add", bindingFlags, null, 
                              collection, new[] { value });
        }
    }
}

Using this approach, it's no longer necessary to compromise by exposing property setters or a dedicated constructor just for serving the expression builders. The Builder class uses reflection to set the value of a property or to fill a collection. The BackingFieldResolver is a class I picked up from this post. Very cool stuff!
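
For reference, here's a sketch of what the Document and Tag domain classes could look like on the receiving end. This is just my assumption of their shape; the essential parts are the non-public parameterless constructor, the private property setters and a tag collection backed by a private field that the BackingFieldResolver can find:

public class Document
{
    // Build() resolves this backing field and invokes Add on it
    // for every tag that was provided.
    private readonly ICollection<Tag> _tags = new List<Tag>();

    // Satisfies the parameterless constructor check in Build();
    // Activator.CreateInstance(typeToBuild, true) can call it.
    private Document()
    {}

    public String Author { get; private set; }
    public String Title { get; private set; }

    public IEnumerable<Tag> Tags
    {
        get { return _tags; }
    }
}

public class Tag
{
    public String Name { get; private set; }

    public Tag(String name)
    {
        Name = name;
    }
}

Also note that the implicit conversion operator on Builder<T> even lets you drop the final call to Build() when assigning the expression to a variable of type Document.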

I have only used this approach in a couple of side projects so far, but let me know what you think.

Till next time.

Monday, July 20, 2009

NHibernate 2.1 and Collection Event Listeners

In a previous post, I talked about cascading deletes, a feature introduced by NHibernate 2.0. If you haven't heard about it before, you'll probably want to read that first.

Cascading deletes are all great if your database of choice supports CASCADE DELETE foreign key constraints. But what if it doesn't provide this feature or, as in my case, the database in question does support it but the DBAs don't want anything to do with it? When a parent domain object has a collection of many child objects, you might still want a one-shot delete instead of a separate DELETE statement for each child record.

The newly released NHibernate 2.1 (congratulations to the entire team for their efforts and hard work) comes to the rescue: it introduces a couple of new event listeners that deal with collections.

First we need an example. Suppose we are building an auction web site and the domain has a class called Item which in turn has a collection of Bids.

public class Item
{
    private ISet<Bid> _bids;    
    public Int64 Id { get; private set; }
    
    ...
}

public class Bid
{
    public Double Amount { get; private set; }
    public String Code { get; private set; }

    ...
}

The mapping for these classes looks something like this:

<class name="MyAuction.Item, MyAuction" 
       table="Item">
    <id name="Id" type="Int64" unsaved-value="-1">
        <column name="Id" sql-type="integer"/>
        <generator class="native"/>
    </id>
    
    ...
    
    <set name="Bids" 
         access="field.camelcase-underscore" 
         lazy="false" 
         cascade="all-delete-orphan" 
         inverse="true" 
         optimistic-lock="false">
        <key>
            <column name="ItemId"/>
        </key>
        <one-to-many class="MyAuction.Bid, MyAuction"/>
    </set>
</class>

<class name="MyAuction.Bid, MyAuction" 
       table="Bid">
    <composite-id>
        <key-property name="ItemId" 
                      column="ItemId" 
                      type="Int64"/>
        <key-property name="Code" 
                      column="Code" 
                      type="String"/>
    </composite-id>
    
    ...
    
</class>

This should give you a general idea of the situation. Now suppose we want to delete a quite popular Item object which has numerous Bids. Because the collection of Bids is mapped as inverse, NHibernate will remove the Bid records with a separate DELETE statement for each row.

DELETE FROM Bid WHERE ItemId=@p0 AND Code=@p1 ;@p0 = 2, @p1 = 'F1001'
DELETE FROM Bid WHERE ItemId=@p0 AND Code=@p1 ;@p0 = 2, @p1 = 'F1002'
DELETE FROM Bid WHERE ItemId=@p0 AND Code=@p1 ;@p0 = 2, @p1 = 'F1003'
...
DELETE FROM Item WHERE Id = @p0; @p0 = 2

We could solve this by creating a collection event listener. The first thing we have to do is figure out how to issue a one-shot delete instead of those separate DELETE statements.

public interface IOneShotDeleteHandler
{   
    Type ForEntity();
    Type[] ForChildEntities();
    void GiveItAShot(ISession session, Object entity);
}

public class OneShotDeleteHandlerForItem : IOneShotDeleteHandler
{
    public Type ForEntity()
    {
        return typeof(Item);
    }

    public Type[] ForChildEntities()
    {
        return new[] { typeof(Bid) };
    }

    public void GiveItAShot(ISession session, Object entity)
    {
        var item = (Item)entity;

        session.CreateQuery("delete Bid where ItemId = :itemId")
            .SetInt64("itemId", item.Id)
            .ExecuteUpdate();
    }
}

We created an IOneShotDeleteHandler interface with one implementation for the Item class. The most notable aspect of this implementation is the HQL delete statement that removes all Bids for a particular Item in one go.

Next step is to create a collection event listener that implements the IPreCollectionRemoveEventListener interface.

public interface IEventListener
{
    void ConfigureFor(Configuration configuration);
}

public class CollectionRemoveEventListener 
    : IPreCollectionRemoveEventListener,
      IEventListener
{
    private readonly IEnumerable<IOneShotDeleteHandler> 
        _oneShotDeleteHandlers;
    
    public CollectionRemoveEventListener(
        IEnumerable<IOneShotDeleteHandler> oneShotDeleteHandlers)
    {
        _oneShotDeleteHandlers = oneShotDeleteHandlers;
    }

    public void OnPreRemoveCollection(
        PreCollectionRemoveEvent @event)
    {
        var affectedOwner = @event.AffectedOwnerOrNull;
        if(null == affectedOwner)
            return;

        var oneShotDeleteHandler = 
            _oneShotDeleteHandlers.SingleOrDefault(handler =>
            handler.ForEntity() == affectedOwner.GetType());

        if(null == oneShotDeleteHandler)
            return;

        oneShotDeleteHandler
            .GiveItAShot(@event.Session, affectedOwner);
    }

    public void ConfigureFor(Configuration configuration)
    {
        configuration
            .SetListener(ListenerType.PreCollectionRemove, this);
    }
}

Don't worry about the IEventListener interface. It's just there for registering all NHibernate event listeners in an IoC container. This also enables us to inject a collection of IOneShotDeleteHandler objects into the constructor of our event listener. When the OnPreRemoveCollection method is called, we simply look up whether there's a handler available for the type of entity that's about to be deleted and give it a shot at removing its child collection in one sweep.

Now we only have to register this event listener:

var eventListeners = _dependencyContainer
    .ResolveAll<IEventListener>();
eventListeners.ForEach(eventListener => eventListener
    .ConfigureFor(configuration));

Now, if we used this 'as is', NHibernate would give us the following error:

Row was updated or deleted by another transaction (or unsaved-value mapping was incorrect)

This is due to the fact that the collection event listener does its work and deletes all Bids by executing the HQL statement, but NHibernate still tries to issue a DELETE statement for each Bid. This is the result of the all-delete-orphan cascading rule we imposed in the mapping. We could reduce it to save-update, but then no individual DELETE statement would be executed when a single Bid is removed from the collection. Now what?

Well, we could provide a regular delete event listener that allows individual DELETE statements for Bid entities as long as their parent Item is not removed.

public class DeleteEventListener : 
    DefaultDeleteEventListener,
    IEventListener
{
    private readonly IEnumerable<IOneShotDeleteHandler> 
        _oneShotDeleteHandlers;

    public DeleteEventListener(
        IEnumerable<IOneShotDeleteHandler> oneShotDeleteHandlers)
    {
        _oneShotDeleteHandlers = oneShotDeleteHandlers;
    }

    protected override void DeleteEntity(
        IEventSource session, object entity, 
        EntityEntry entityEntry, Boolean isCascadeDeleteEnabled, 
        IEntityPersister persister, ISet transientEntities)
    {
        var oneShotDeleteHandler = _oneShotDeleteHandlers
            .SingleOrDefault(handler =>
                handler.ForChildEntities()
                    .Contains(entity.GetType()));

        if(null == oneShotDeleteHandler ||
           !IsParentAlsoDeletedIn(
                session.PersistenceContext, 
                oneShotDeleteHandler.ForEntity())
            )
        {
            base.DeleteEntity(session, entity, entityEntry, 
                              isCascadeDeleteEnabled, persister, 
                              transientEntities);
            return;
        }

        CascadeBeforeDelete(session, persister, entity, 
                            entityEntry, transientEntities);
        CascadeAfterDelete(session, persister, entity, 
                           transientEntities);
    }

    public void ConfigureFor(Configuration configuration)
    {
        configuration.SetListener(ListenerType.Delete, this);
    }

    private static Boolean IsParentAlsoDeletedIn(
        IPersistenceContext persistenceContext, Type typeOfParent)
    {
        foreach(DictionaryEntry entry in 
                                    persistenceContext.EntityEntries)
        {
            if(typeOfParent != entry.Key.GetType())
                continue;
            
            var entityEntry = (EntityEntry)entry.Value;
            if(Status.Deleted == entityEntry.Status)
                return true;
        }

        return false;
    }
}

With both event listeners registered, deleting a single Bid results in a single DELETE statement as one would expect:

DELETE FROM Bid WHERE ItemId=@p0 AND Code=@p1 ;@p0 = 2, @p1 = 'F1002'

and removing an entire Item now results in a one-shot delete for all Bids:

DELETE FROM Bid WHERE ItemId=@p0; @p0 = 2
DELETE FROM Item WHERE Id = @p0; @p0 = 2

Make sure you use this solution for one-shot deletes wisely and only if you have to. If you can use CASCADE DELETE foreign key constraints, then by all means do so; that's the preferred option. If not, resort to this kind of solution only if you can prove that it's going to give you a tremendous performance benefit. Also take a look at the batching support that NHibernate provides (at the moment only SQL Server and Oracle are supported).
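
As a side note, turning on that batching support is just a matter of setting the batch size on the NHibernate configuration. A minimal sketch, assuming the configuration instance from the registration snippet above (the value of 20 is an arbitrary pick):

// Let NHibernate group multiple INSERT/UPDATE/DELETE statements
// into fewer round trips to the database.
configuration.SetProperty(NHibernate.Cfg.Environment.BatchSize, "20");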

Till next time

Sunday, July 19, 2009

Design Documents in CouchDB and Validation

To start off, here are the links to my previous posts about CouchDB:

  1. Relaxing on the Couch(DB)
  2. Installing the Couch(DB)
  3. PUTting the Couch(DB) in Your Living Room
  4. GETting Documents From CouchDB
  5. DELETE Documents From CouchDB
  6. Adding Attachments to a Document in CouchDB
  7. Views into CouchDB

I briefly mentioned 'design documents' in my previous post as the way views are stored in CouchDB. This is probably the most common use of design documents, but there's more. One interesting capability is performing validation. Now before anyone starts raving like a madman, I'm not implying that all validation or, to make matters even worse, all business rules should now be handled by CouchDB, let alone that they should all be written in JavaScript from now on. In fact, quite the contrary.

But one concern that comes to mind as an interesting use is validating the structure of a document.

As you might have noticed from my previous posts, we always dealt with one type of object. But a hello-real-world application typically has a domain where multiple types of objects need to be persisted. With an RDBMS, we usually have separate tables to store these different kinds of objects. CouchDB on the other hand only has the notion of storing documents. Now suppose, we need to store both Customer objects and Order objects (every application needs those, right?). How would we deal with that in CouchDB?

Well, very simple. We'll just add a Type field to every document.

{
    "Type" : "Customer",
    "FirstName" : "Homer",
    "LastName" : "Simpson"
}

{
    "Type" : "Order",
    "Supplier" : "Duff",
    "Subject" : "Sweet, sweet beer" 
}

Just common sense. In order to get a list of all orders, we could provide the following map function:

function(doc) 
{
  if(doc.Type != "Order") 
    return;

  emit(null, doc);
}

Piece of cake. Now we can use the validation capabilities of CouchDB to ensure that every document that gets stored contains a Type attribute. The way to handle this is to create a design document with an attribute named 'validate_doc_update' that specifies a validation function with the following signature:

function(newDoc, oldDoc, userCtx)
{}

Now in order to reject documents that specify no Type attribute, we could write the following function:

function(newDoc, oldDoc, userCtx)
{
    if(!newDoc.Type)
    {
        throw(
        {
            "Error" : "Documents need a type around here."
        });
    }
}

Every time a document is saved or updated, CouchDB calls the validation function stored under the 'validate_doc_update' key of every design document that has one. If every function passes, the document is stored. Otherwise, CouchDB returns HTTP status 403 (Forbidden) with a JSON response that contains the error we specified in the validation function.
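
To see this behavior from the .NET side, here's a minimal sketch that PUTs an invalid document and prints the error, assuming the host and 'documentstore' database used in these posts and the validation function from above:

using System;
using System.IO;
using System.Net;

public static class ValidationExample
{
    public static void Main()
    {
        var client = new WebClient();
        client.Headers[HttpRequestHeader.ContentType] = "application/json";

        try
        {
            // No Type attribute, so the validation function rejects it.
            client.UploadString(
                "http://myhost:5984/documentstore/homer",
                "PUT",
                @"{""FirstName"":""Homer"",""LastName"":""Simpson""}");
        }
        catch(WebException exception)
        {
            // CouchDB answers with HTTP 403 (Forbidden); the response
            // body contains the error thrown by the validation function.
            using(var reader = new StreamReader(
                exception.Response.GetResponseStream()))
            {
                Console.WriteLine(reader.ReadToEnd());
            }
        }
    }
}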

As you can see, CouchDB provides a nice and easy-to-use validation mechanism that can be useful in a couple of scenarios.

Till next time

Monday, July 13, 2009

Using NHibernate for Legacy Databases

One of the downsides of being confronted with a shared legacy database day in and day out is that you have to map your domain objects to database tables that are also used by other applications. A typical scenario in this case is that those database tables contain more columns than those that are required for your application. These extra columns are specifically there to serve those other legacy applications. Heck, to make matters even worse, there are probably some new columns added specifically for your application as well. This is the fairy tale of shared legacy databases.

Using NHibernate in these scenarios can be challenging at times, but its built-in flexibility and extensibility really help you deal with those cases. The problem I ran into last week was that we needed to store a domain object in a table that had a lot more columns than were actually required for our application. If it were possible to store null values in these columns, or if they had default values configured in the schema, this would not be a problem. Instead, these unnecessary columns could not store null values and had no default values associated with them.

The first option would be to make some changes to the schema of the table. Alas, no luck there because the other legacy applications using the same table would break. Now what?

We needed to insert the default values ourselves, but those columns are not known by NHibernate because they are not mapped to any members of the domain object. One way to solve this is to pollute the domain object with private fields that are initialized to the required default values.

public class SomeDomainEntity
{
    // Legacy fields with no purpose for the domain but required 
    // by the database.
    private Int32 _legacyField1 = 2;
    private Boolean _legacyField2 = false;
    private String _legacyField3 = "";
}

This is probably the simplest option, but it imposes a broken window as infrastructure concerns bleed into the domain. In other words, this is not a viable solution. To keep the legacy stuff isolated as much as possible, we can turn to the extensive extensibility model that NHibernate provides instead.

After some snooping around in the source code of NHibernate, the solution we chose for dealing with this issue was to create a custom access strategy. The built-in property access strategies are probably already well known, but it's also possible to write your own access strategy by implementing the IPropertyAccessor interface.

public class SomeDomainObjectAccessor : IPropertyAccessor
{
    private readonly IEnumerable<IGetter> _defaultValueGetters;
    
    public SomeDomainObjectAccessor()
    {
        _defaultValueGetters = new List<IGetter>()
        {
            new DefaultValueGetter<Int32>("LegacyColumn1", 2),
            new DefaultValueGetter<Boolean>("LegacyColumn2", false),
            new DefaultValueGetter<String>("LegacyColumn3", "")
        };
    }

    public IGetter GetGetter(Type type, String propertyName)
    {
        return _defaultValueGetters        
            .Where(getter => getter.PropertyName == propertyName)        
            .SingleOrDefault();
    }
    
    public ISetter GetSetter(Type type, String propertyName)
    {
        return new NoopSetter();
    }

    public Boolean CanAccessThroughReflectionOptimizer
    {
        get { return true; }
    }
}

internal class DefaultValueGetter<T> : IGetter
{
    private readonly String _propertyName;
    private T Value { get; set; }

    public DefaultValueGetter(String propertyName, T value)
    {
        _propertyName = propertyName;
        Value = value;
    }

    public Object Get(Object target)
    {
        return Value;
    }

    public Type ReturnType
    {
        get { return typeof(T); }
    }

    public String PropertyName
    {
        get { return _propertyName; }
    }

    public MethodInfo Method
    {
        get
        {
            var method = typeof(BasicPropertyAccessor)              
                .GetMethod("GetGetterOrNull",
                           BindingFlags.Static | BindingFlags.NonPublic);

            var result = (BasicPropertyAccessor.BasicGetter)method              
                .Invoke(null, new Object[] { GetType(), "Value" });

            return result.Method;
        }
    }

    public object GetForInsert(Object owner, IDictionary mergeMap,
                               ISessionImplementor session)
    {
        return Get(owner);
    }
}

internal sealed class NoopSetter : ISetter
{
    public void Set(Object target, Object value)
    {}

    public String PropertyName
    {
        get { return null; }
    }

    public MethodInfo Method
    {
        get { return null; }
    }
}

This simply involves a getter for providing default values and a dummy setter as we're not interested in setting any values on the domain objects. The DefaultValueGetter class uses a trick so that we can keep using the reflection optimizer of NHibernate. This also seems to be necessary when using NHibernate Profiler.

Now we only have to provide some properties in the mapping of the domain object using our custom access strategy:

<property 
    name="LegacyColumn1"
    column="LegacyColumn1"
    not-null="true"
    type="Int32"
    access="SomeNamespace.SomeDomainObjectAccessor, SomeAssembly"/>
          
<property 
    name="LegacyColumn2"
    column="LegacyColumn2"
    not-null="true"
    type="Boolean"
    access="SomeNamespace.SomeDomainObjectAccessor, SomeAssembly"/>
          
<property 
    name="LegacyColumn3"
    column="LegacyColumn3"
    not-null="true"
    type="String"
    access="SomeNamespace.SomeDomainObjectAccessor, SomeAssembly"/>

This is probably not the best solution, but it does the job and prevents polluting the domain objects with database quirks like these. I'm interested in hearing feedback or any better approaches.

Anyway, the easy extensibility of NHibernate makes it the best data access solution around: it lets you deal with edge cases that weren't anticipated by the framework builders.

Till next time  

Friday, July 10, 2009

Views into CouchDB

To start off, here are the links to my previous posts about CouchDB:

  1. Relaxing on the Couch(DB)
  2. Installing the Couch(DB)
  3. PUTting the Couch(DB) in Your Living Room
  4. GETting Documents From CouchDB
  5. DELETE Documents From CouchDB
  6. Adding Attachments to a Document in CouchDB

So far, we've mostly talked about managing documents in CouchDB. Now I want to discuss another important concept of CouchDB, namely views.

Views are the primary means for querying and searching documents stored by CouchDB. As mentioned in one of my previous posts, CouchDB doesn't support SQL for querying documents. Consequently, views are to CouchDB as SQL is to an RDBMS. They are defined as MapReduce functions written in JavaScript. If you've never heard of MapReduce, take a look at the Introduction to MapReduce for .NET Developers for a quick dip into the concepts behind it.

CouchDB supports two kinds of views: permanent views and temporary views. A permanent view is stored as a special kind of document among the other regular documents. These special documents are called 'design documents'. A permanent view can be executed by performing a GET operation with the name of the view. A temporary view, as its name implies, is not stored by CouchDB. Instead, the code for the view is posted to CouchDB where it is executed once.

For me, permanent views are the most interesting, so we will use them for the example in this post. Recall that in the examples used in previous posts we've had documents like the following:

{
"_id":"96f49e5a-6b5b-47ed-9234-9a98d600013e",
"_rev":"2-1534297415",
"Author":"Stephen Hawking",
"Title":"The Universe in a Nutshell",
"Tags":[{"Name":"Physics"},{"Name":"Universe"}]
}

Let's create a view that we can use for retrieving the titles of all documents with a particular tag. For creating a permanent view, there are again two options: using Futon (the web-based user interface of CouchDB) or the HTTP View API. To keep things simple, let's use Futon for creating our view in CouchDB.


When creating a new view, CouchDB provides a map function with a default implementation and an optional reduce function. The purpose of a map function is to perform a number of computations using arbitrary JavaScript and to emit key/value pairs into the view. If the view also has a reduce function, it's used for aggregating the results. We'll ignore the reduce function for now and focus our attention on the map function.

The basic anatomy of the map function as provided by Futon looks like this:

function(doc) {
  emit(null, doc);
}

As already mentioned, the emit function takes care of inserting a key/value pair of your own choosing into the view. For our example, we'll emit the name of a tag as the key and the title of a document as the value.

function(doc) 
{
  for(var i = 0; i < doc.Tags.length; i++)
  {
    emit(doc.Tags[i].Name, doc.Title);
  }
}


Using Futon it's possible to code up a map and reduce function and try it out on the documents that are stored in CouchDB. Executing the map function as is using an HTTP GET operation yields the following results:

GET /documentstore/_design/documents_by_tag/
_view/documents_by_tag HTTP/1.1

{"total_rows":5,"offset":0,"rows":[
{
 "id":"0afc1fc2-7b39-461f-87cf-1ed1e21d2f34",
 "key":"Universe","value":"The Universe in a Nutshell"
},
{
 "id":"7b287e5d-c467-46ad-a10e-66d3c0696743",
 "key":"Universe","value":"The Theory of Everything"
},
{
 "id":"0afc1fc2-7b39-461f-87cf-1ed1e21d2f34",
 "key":"Physics","value":"The Universe in a Nutshell"
},
{
 "id":"7b287e5d-c467-46ad-a10e-66d3c0696743",
 "key":"Physics","value":"The Theory of Everything"
},
{
 "id":"0afc1fc2-7b39-461f-87cf-1ed1e21d2f34",
 "key":"Space","value":"The Universe in a Nutshell"
}
]}

We call the view by using its name in the URL. The result is a key for every tag we encounter and the title of the document as value. CouchDB also provides the document identifier for each key/value pair. This way we can see that we have five tags for two documents.

Now in order to get all document titles for a particular tag, we add the tag name as an extra query parameter called 'key':

GET /documentstore/_design/documents_by_tag/
_view/documents_by_tag?key=%22Universe%22 HTTP/1.1

This yields the following result from our view:

{"total_rows":5,"offset":0,"rows":[
{
 "id":"0afc1fc2-7b39-461f-87cf-1ed1e21d2f34",
 "key":"Universe",
 "value":"The Universe in a Nutshell"},
{
 "id":"7b287e5d-c467-46ad-a10e-66d3c0696743",
 "key":"Universe",
 "value":"The Theory of Everything"}
]}

This way we can reuse our view for different tag names. Notice that the total_rows attribute still indicates that the complete result set of our view contains five rows.
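
To give you an idea of how such a view could be consumed from .NET, here's a sketch of a hypothetical helper that returns all document titles for a given tag, assuming Json.NET for parsing the response and the host and database used in these posts:

using System;
using System.Collections.Generic;
using System.Net;
using Newtonsoft.Json.Linq;

public static class DocumentsByTag
{
    public static IEnumerable<String> GetTitlesFor(String tagName)
    {
        // The key parameter is JSON, hence the encoded quotes around it.
        var url = String.Format(
            "http://myhost:5984/documentstore/_design/documents_by_tag" +
            "/_view/documents_by_tag?key=%22{0}%22",
            Uri.EscapeDataString(tagName));

        var json = new WebClient().DownloadString(url);

        // Every row carries the tag as key and the document title as value.
        foreach(var row in JObject.Parse(json)["rows"])
            yield return (String)row["value"];
    }
}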

Doesn't seem very hard now, does it? Do mind that I've barely scratched the surface here. For more information, you can take a look at the CouchDB wiki or check out the forthcoming books on CouchDB.

If you're interested in distributed, non-relational databases in general, you might want to check out the recordings from the NOSQL meetup, which hosted a talk on CouchDB as well.

Till next time

Adding Attachments to a Document in CouchDB

To start off, here are the links to my previous posts about CouchDB:

  1. Relaxing on the Couch(DB)
  2. Installing the Couch(DB)
  3. PUTting the Couch(DB) in Your Living Room
  4. GETting Documents From CouchDB
  5. DELETE Documents From CouchDB

Today, I want to talk about how to create attachments for a document. Documents in CouchDB can have attachments just like an email. CouchDB has two ways for dealing with attachments:

  1. Inline Attachments
  2. Standalone Attachments

Inline Attachments

Inline attachments can be added to a document by using the dedicated _attachments attribute while PUTting the document into CouchDB.

PUT /documentstore/4f754d4b-540d-4b77-8507-6b7243ef8325 HTTP/1.1

{
"Author":"Stephen Hawking",
"Title":"The Universe in a Nutshell",
"Tags":[{"Name":"Physics"},{"Name":"Universe"}],
"_attachments":
{
  "The Universe in a Nutshell.pdf": 
  {
    "content_type": "application/pdf",
    "data": "JVBERi0xLjUNCiW1tbW1DQox ... "
  }
}}

For creating an attachment, we need to provide a file name, the MIME type and the base64-encoded binary data. It's even possible to have multiple attachments for a single document.
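
Producing that base64 payload from .NET is a one-liner; a minimal sketch (the path is made up for the example):

using System;
using System.IO;

public static class AttachmentEncoder
{
    // Read the file and base64-encode it for the 'data' field
    // of the _attachments attribute.
    public static String Encode(String path)
    {
        return Convert.ToBase64String(File.ReadAllBytes(path));
    }
}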

Standalone Attachments

Standalone attachments are a fairly recent feature of CouchDB, added in version 0.9. As the name implies, this involves adding, updating and removing attachments without the document itself being involved. 

PUT documentstore/4f754d4b-540d-4b77-8507-6b7243ef8325/
The%20Universe%20in%20a%20Nutshell.pdf?rev=1-1437623276 HTTP/1.0
Content-Length: 911240
Content-Type: application/pdf

JVBERi0xLjUNCiW1tbW1DQox ...

The major difference and advantage of this approach is that the binary data is sent directly to CouchDB without the need for a base64 conversion on both the client and the server. This yields a significant performance improvement when storing attachments in CouchDB. Notice that you still need to provide a MIME type using the Content-Type header.
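
A sketch of such a standalone upload from .NET, using the document identifier, revision and file name from the example above (the local path is made up):

using System;
using System.IO;
using System.Net;

public static class AttachmentUploader
{
    public static void Upload()
    {
        var request = (HttpWebRequest)WebRequest.Create(
            "http://myhost:5984/documentstore/" +
            "4f754d4b-540d-4b77-8507-6b7243ef8325/" +
            "The%20Universe%20in%20a%20Nutshell.pdf?rev=1-1437623276");

        request.Method = "PUT";
        request.ContentType = "application/pdf";

        var bytes = File.ReadAllBytes(
            @"C:\Books\The Universe in a Nutshell.pdf");
        request.ContentLength = bytes.Length;

        // The binary data goes over the wire as-is, no base64 involved.
        using(var stream = request.GetRequestStream())
            stream.Write(bytes, 0, bytes.Length);

        using(var response = request.GetResponse())
        using(var reader = new StreamReader(response.GetResponseStream()))
            Console.WriteLine(reader.ReadToEnd());
    }
}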

GETting Documents with Attachments

When retrieving a document with either an inline or standalone attachment, the actual binary data is not returned. Instead, CouchDB returns a stub to indicate that the requested document has an attachment associated with it. 

GET /documentstore/4f754d4b-540d-4b77-8507-6b7243ef8325 HTTP/1.1

{
"_id":"4f754d4b-540d-4b77-8507-6b7243ef8325",
"_rev":"1-1969924333",
"Author":"Stephen Hawking",
"Title":"The Universe in a Nutshell",
"Tags":[{"Name":"Physics"},{"Name":"Universe"}],
"_attachments":
{
    "The Universe in a Nutshell.pdf":
    {
        "stub":true,
        "content_type":"application/pdf",
        "length":911240
    }
}
}

In order to retrieve the binary data of the attachment itself (yes please!), we have to issue a second GET but now using both the document identifier and the file name of the attachment.

GET /documentstore/4f754d4b-540d-4b77-8507-6b7243ef8325/
The%20Universe%20in%20a%20Nutshell.pdf HTTP/1.1

CouchDB responds to this request by returning the binary data of the attachment. When an inline attachment is used, the binary data automatically gets decoded.   

For my next post, I will talk about MapReduce functions, providing a simple example of a map function in particular. 

Till next time.

Wednesday, July 08, 2009

The Europe Virtual ALT.NET Blog

Colin has set up a dedicated blog for the Europe Virtual ALT.NET gatherings. We'll be posting all announcements, details of recordings and any related stuff to this blog.

If you're disgusted with your RSS reader, then maybe you'll want to follow the announcements and other news via Twitter? We've created an account there as well, so if you'd like, you can follow us here.

Saturday, July 04, 2009

DELETE Documents from CouchDB

To start off, here are the links to my previous posts about CouchDB:

  1. Relaxing on the Couch(DB)
  2. Installing the Couch(DB)
  3. PUTting the Couch(DB) in Your Living Room
  4. GETting Documents From CouchDB

Today, I want to talk about how to delete a document from CouchDB. In order to do that, we have to use the HTTP DELETE operation (how convenient). Removing a document from CouchDB can be done using the following request:

DELETE /documentstore/13ce4780-62c8-4074-9955-8c99966b84bb
?rev=1-2901013762

This makes CouchDB return the following response:

{
"ok":true,
"id":"13ce4780-62c8-4074-9955-8c99966b84bb",
"rev":"2-3500205450"
}
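
Issuing this request from .NET is straightforward; here's a sketch of a hypothetical helper using HttpWebRequest against the host and database from the previous posts:

using System;
using System.IO;
using System.Net;

public static class DocumentRemover
{
    public static String Delete(String id, String revision)
    {
        // The revision number is mandatory; without it CouchDB
        // refuses to delete the document.
        var request = (HttpWebRequest)WebRequest.Create(
            "http://myhost:5984/documentstore/" + id +
            "?rev=" + revision);
        request.Method = "DELETE";

        using(var response = request.GetResponse())
        using(var reader = new StreamReader(response.GetResponseStream()))
            return reader.ReadToEnd();
    }
}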

The JSON response confirms that the document has been removed by CouchDB. But is it really gone? Not exactly. We might be able to retrieve it using its revision number. Sending the following request:

GET /documentstore/13ce4780-62c8-4074-9955-8c99966b84bb
?rev=2-3500205450

indicates that the document we just deleted still exists:

{
"_id":"13ce4780-62c8-4074-9955-8c99966b84bb",
"_rev":"2-3500205450",
"_deleted":true
}

Notice that the _deleted attribute indicates that the document has indeed been deleted. It's even possible to resurrect the document at hand by performing an update, bringing the document back amongst the living documents.

However, I don't consider that a good idea in all scenarios. In fact, as pointed out by this comment on my previous post, one should not blindly rely on the MVCC tokens for version control of documents, something that I neglected to point out in that post. When CouchDB is configured for compaction or for replication with other instances of CouchDB, this approach is not recommended. In case of compaction, CouchDB will actively purge old versions of documents as well as deleted documents. In case of replication, a particular node in a clustered environment doesn't necessarily have the complete version history of a document. Bottom line: use this feature very carefully.

For my next post, I will talk about how to create attachments for a document.

Till next time

Thursday, July 02, 2009

GETting Documents From CouchDB

To start off, here are the links to my previous posts about CouchDB:

  1. Relaxing on the Couch(DB)
  2. Installing the Couch(DB)
  3. PUTting the Couch(DB) in Your Living Room

In my latest post, I explained how easy it is to create and manage documents in CouchDB. Today, I want to talk about how to retrieve a document from CouchDB.

In order to retrieve a document from CouchDB, we make use of the HTTP GET operation (duh). Using the GET method for retrieving a document takes the following general form:

http://myhost:5984/my_database/my_id

It's as simple as that. So, for example, sending the following request:

GET /documentstore/96f49e5a-6b5b-47ed-9234-9a98d600013e

makes CouchDB return the following response:

{
"_id":"96f49e5a-6b5b-47ed-9234-9a98d600013e",
"_rev":"2-1534297415",
"Author":"Stephen Hawking",
"Title":"The Universe in a Nutshell",
"Tags":[{"Name":"Physics"},{"Name":"Universe"},{"Name":"Space"}]
}

As already mentioned, CouchDB provides MVCC (multi-version concurrency control) for the documents it stores. Using the GET operation makes it possible to take advantage of this built-in feature. Notice that the first digit of the revision number in the example above indicates that this is the second version of the document we retrieved. This means that CouchDB possibly has an earlier version of the document. In order to find out what revisions are available, we can issue the following request:

GET /documentstore/96f49e5a-6b5b-47ed-9234-9a98d600013e
?revs=true

CouchDB returns the current revision of the document as before, but now with an additional field named _revisions:

{
"_id":"96f49e5a-6b5b-47ed-9234-9a98d600013e",
"_rev":"2-1534297415",
"Author":"Stephen Hawking",
"Title":"The Universe in a Nutshell",
"Tags":[{"Name":"Physics"},{"Name":"Universe"},{"Name":"Space"}],
"_revisions":{"start":2,"ids":["1534297415","3158114761"]}
}

In order to retrieve the first revision of our document, we can send the following request to retrieve it:

GET /documentstore/96f49e5a-6b5b-47ed-9234-9a98d600013e
?rev=1-3158114761

As expected, CouchDB now returns the first revision of the document:

{
"_id":"96f49e5a-6b5b-47ed-9234-9a98d600013e",
"_rev":"1-3158114761",
"Author":"Stephen Hawking",
"Title":"The Universe in a Nutshell",
"Tags":[{"Name":"Physics"},{"Name":"Universe"}]
}

It seems that for the second revision of the document, we have added an extra tag. Having this kind of automatic versioning for your data is a really nice feature in my book.
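
Fetching a specific revision from .NET is equally simple; a minimal sketch, assuming the host and database used in these posts:

using System;
using System.Net;

public static class RevisionFetcher
{
    // Retrieve a particular revision by passing the rev query parameter.
    public static String GetRevision(String id, String revision)
    {
        var url = String.Format(
            "http://myhost:5984/documentstore/{0}?rev={1}",
            id, revision);

        return new WebClient().DownloadString(url);
    }
}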

For my next post, I will talk about how to delete a document from CouchDB.

Till next time

PUTting the Couch(DB) in Your Living Room

In my previous posts, I provided a shallow introduction to CouchDB and how to get it installed on a Linux box. Here are the links to these posts:

  1. Relaxing on the Couch(DB)
  2. Installing the Couch(DB)

Now, for this post I want to talk about how to create and manage documents. As I already mentioned in my introductory blog post, CouchDB is accessible through a RESTful HTTP/JSON API. This means that documents are stored as JSON objects. Let me show you an example of how to store a document in CouchDB. 

Suppose we have a class named Document (how creative of me). A Document has a title, an author and a collection of Tag objects. A Tag is a value object that has a name.

public class Document
{
    private readonly ISet<Tag> _tags;

    public String Author { get; private set; }
    public String Title { get; private set; }
    public Guid Id { get; private set; }
    public String Version { get; private set; }

    public IEnumerable<Tag> Tags
    {
        get { return _tags; }
    }
}

public class Tag : ValueObject<Tag>
{
    public String Name { get; private set; }

    public Tag(String name)
    {    
        Name = name;
        RegisterProperty(value => value.Name);
    }
} 

Nothing fancy so far. Now in order to save an instance of this class, we have to serialize it to its JSON representation. I've been using Json.NET for this purpose, but you might as well use the JavaScriptSerializer class from the .NET framework.

Now, in order to save a document, we make use of the PUT HTTP method. This implies that we have to provide our own document identifier, so we initialize the Id property with a new GUID for a new Document. We could also make use of the POST HTTP method, in which case you don't specify your own document identifier but let CouchDB generate one for you when the document is initially stored. This is discouraged, however, because proxies and other network intermediaries are able to resend POST requests, resulting in duplicate documents.

Creating a new document in CouchDB using the PUT method takes the following general form:

http://myhost:5984/my_database/my_id
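
From C#, such a request could be sent with just a few lines of code; a sketch, assuming Json.NET for the serialization:

using System;
using System.Net;
using Newtonsoft.Json;

public static class DocumentStore
{
    public static String Save(Object document)
    {
        // Using PUT implies that we pick the document identifier ourselves.
        var id = Guid.NewGuid().ToString();
        var json = JsonConvert.SerializeObject(document);

        var client = new WebClient();
        client.Headers[HttpRequestHeader.ContentType] = "application/json";

        return client.UploadString(
            "http://myhost:5984/documentstore/" + id, "PUT", json);
    }
}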

So for our example, sending the following request:

PUT /documentstore/80c8675d-2015-45f5-a3c7-098226e15ce3 HTTP/1.1
Content-Type: application/json
Host: 146.135.16.100:5984
Content-Length: 1215256
Expect: 100-continue

{
"Author":"Stephen Hawking",
"Title":"The Universe in a Nutshell",
"Tags":[{"Name":"Physics"},{"Name":"Universe"}]
}

makes CouchDB return the following response:

{
"ok":true,
"id":"80c8675d-2015-45f5-a3c7-098226e15ce3",
"rev":"1-1437623276"
}

You probably figured out by now that this response means that the document has successfully been stored by CouchDB. Notice that it also returns a revision number that was generated upon creating the document in the database. This is important when we try to save changes to the document later on.

If you're as suspicious as I am, you probably want to verify whether CouchDB correctly stored the document we provided. You can do this using Futon, the web-based user interface of CouchDB. You can access Futon through the following URL:

http://myhost:5984/_utils

Say that we already want to make some changes to the document we just created. Let's add a new tag named 'Space' to the document. We add a new Tag object to the instance of our Document, serialize it back to its JSON representation and send the following request, asking CouchDB to save the changes:

PUT /documentstore/80c8675d-2015-45f5-a3c7-098226e15ce3 HTTP/1.1
Content-Type: application/json
Host: 146.135.16.100:5984
Content-Length: 1705803
Expect: 100-continue

{
"_rev":"1-1437623276",
"Author":"Stephen Hawking",
"Title":"The Universe in a Nutshell",
"Tags":[{"Name":"Physics"},{"Name":"Universe"},{"Name":"Space"}]
}

Notice that we are using the PUT method again, but now we also provided the revision number we got back from our first response when we initially created the document. Also notice that we are sending the complete document again, but now with the extra tag.

CouchDB returns the following response that contains a new revision number for the document:

{
"ok":true,
"id":"80c8675d-2015-45f5-a3c7-098226e15ce3",
"rev":"2-3213552600"
}

Again, you can use Futon to verify whether CouchDB stored a new version of the document. For my next post on CouchDB, I will talk about how to retrieve a document from CouchDB.