Saturday, September 27, 2008

Properties - A False Sense of Encapsulation

I'm having more and more mixed feelings about properties in .NET, or accessor methods in general, as properties compile to get_xxx/set_xxx methods behind the covers. I've never thought about it that much, but now they seem to have become a warning flag of bad design.

Everyone knows by now that public fields are evil. I'm not going to elaborate on that anymore. However, I've come to believe that properties are not the answer either. When a class has a particular number of private fields, then the general malpractice is to also include corresponding properties for these private fields. Whether these properties are read-only or include both getters and setters is not important. The main point is that this habit should be questioned. To me, the use of properties should be applied with great care.

But first, let me make a clear distinction between objects and data structures. An object hides its data and exposes methods to operate on that data. These methods provide the behavior of an object. A data structure, also known as a Data Transfer Object (DTO), only exposes its data and has no methods. They can be seen as data carriers that have no behavior whatsoever. Objects are best used in domain models, while DTO's, as their names imply, are best used for transferring data from one layer to another.

Why this distinction? For starters, because I know a lot of people who talk about DTO's as if they were full blown objects. They use DTO's as their primary domain inhabitants which leads to the Anemic Domain Model anti-pattern. And secondly, because the use of properties mentioned in this post obviously applies to objects and not DTO's.

As I already stated in the definition above, objects hide their data. Relentlessly exposing this data through properties breaks encapsulation of these objects. You might say that this isn't the case because properties 'hide' the private member variables of the class. But properties are just another intricate way of exposing private data.

Let's go through some examples of bad code, shall we.

Inconsistent objects

Suppose we have a class named Company in our domain. Something we see very often is something like the following:

public class Company { private String _city; private String _country; public String City { get { return _city; } set { _city = value; } } public String Country { get { return _country; } set { _country = value; } } }

Suppose we have the requirement that the user of our system can change the location of our company in case it spontaneously settles itself in a different country. The calling code would look something like this:

company.City = "Erps-Kwerps"; company.Country = "Belgium";

As you can see, this operation is clearly not atomic. It allows you to have inconsistent objects. Suppose the location was previously Hong Kong, China. The object becomes inconsistent after having the City property modified to 'Erps-Kwerps', which definitely a long way from China.

The better approach would be the following:

public class Company { private String _city; private String _country; public void ChangeLocationTo(String city, String country) { _city = city; _country = country; } }

Removing the properties guarantees encapsulation. Providing the ChangeLocationTo method not only makes the requirements explicit in code, but it also ensures that the operation is performed atomically.

The calling code gets really easy now:

company.ChangeLocation("Erps-Kwerps", "Belgium");

The most important thing here is that code that belongs to the Company class has now moved to where it belongs.

Procedural code

This is probably the anti-pattern I see he most: procedural programming in an OO language aka Cobol in C#. Behold the following application service:

public class BeerService { public void OrderDrink(Int64 customerId) { Customer customer = Repository<Customer>.Get(customerId); if(customer.Age > 16) { // Cheers! } } } public class Customer { private Int32 _age; public Int32 Age { get { return _age; } set { _age = value; } } }

This very simple example is quite obvious. The logic in the application service extracts the data from the object. The OO way would be to put the logic where the data is, namely the Customer class:

public class Customer { private Int32 _age; public Boolean IsOlderThanMinimumAge() { return customer.Age > 16; } public Boolean CanHaveBeer { if(IsOlderThanMinimumAge()) { return true; } return false; } }

See, no more properties and most importantly, we now adhere to Tell, Don't Ask principle as described in The Pragmatic Programmer. We don't ask for the data, we tell the object that has the data to do the work for us.

Hello Real World

I know these examples are very simplistic and that real-life isn't always as shiny. There are a lot of cases where adding properties is the right thing to do. The first thing that comes to mind are unit tests on domain objects. Suppose I write a unit test that calls a particular method on an object. The unit test probably wants to verify the state (data) of the domain object. I've seen solutions from some developers who let the domain object handle the assertions, which implies that the assemblies of the unit test framework of choice would be deployed in the production environment.

Another problematic scenario would be a mapping class, which is responsible for mapping the data of a domain object to a DTO. One could argue that this could be done by the domain object itself, but what if there are multiple DTO's (different views) for the same domain object. Also smells like SRP violation to me. Another option would be to use reflection.

A while ago, I was exploring the code of the TimeAndMoney DDD example (yes, Java code). While I was reading the code of the Money class, I found this intention-revealing approach for adding getters to a class: (Mayday, Mayday, Java code coming up ;-) )

/** * How best to handle access to the internals? It is needed for * database mapping, UI presentation, and perhaps a few other * uses. Yet giving public access invites people to do the * real work of the Money object elsewhere. * Here is an experimental approach, giving access with a * warning label of sorts. Let us know how you like it. */ public BigDecimal breachEncapsulationOfAmount() { return amount; } public Currency breachEncapsulationOfCurrency() { return currency; }

Maybe role interfaces described by Martin Fowler and popularized by Udi Dahan can provide a clean solution here. A domain object that implements two types of interfaces:

  • interfaces containing nothing but property getters (query).
  • interfaces containing a single method (command).

If you, my dear reader, have any thoughts about this, I'm glad to hear them.

Conclusion

As always, there are two camps here: on the left are the people who don't believe that properties violate encapsulation and on the right are the people who despise properties and never ever use them in their code. I'm currently sitting somewhere in between (I keep seeing gray ;-)). I do believe that we should quit with the habit of providing properties for private member variables by default. We should only provide access to private data when there is no other clean alternative. As always, being dogmatic is not a good thing. Reducing visibility of data as much as possible leads to more reliable and maintainable code. We should beware of code that calls more than a single method on the same object. Properties are not evil, but in a lot of cases inappropriate. We should be able to resist to this temptation.

To round of this long post, let me just point you to an article I found on Java World (yes, again Java) written in 2003 by Allen Holub, named Why getter and setter methods are evil (don't ask how I found it) which explains the case I'm trying to make with this post in a much clearer way.

Update 09/27/2008: It seems that this topic is also being discussed on the ALT.NET user group.

2 comments:

Sven said...

Hello Jan,

Nice post you made here and I think you make a valid point. Makes me think about some recent code I wrote.
I was just thinking: Is it a good idea to let the object that has the information decide what to do?
I can imagine some situations where I wouldn't like the object itself to decide, but an external source, that also takes in some context information. So I would say that an object can decide for itself when it has all the information it needs.
A subtle difference I think, but it could lead us to another interesting topic: coupling :)

Jan Van Ryswyck said...

Good point and I'm glad you mention this. There are cases where a seperate object is more appropriate. In DDD, Eric Evans calls this a domain service. It's an integral part of the domain model (e.g. transferring an amount of money between account objects).

There are no golden rules here. It mostly boils down to some gut feeling. Getting your feet wet and making mistakes is probably the best tutor. Every example I gave in this post are also mistakes I've made myself.