The Essence of a List

I want to talk about the cost of rich public interfaces. By this I mean classes with a lot of public methods. One design guideline is to put the code where the data is. I think this leads to a lot of unnecessary complexity and I'll explain why below.

I'll use Java lists as my main example. I don't want to change the library, or even write an alternative library but you (the reader) and I are both familiar with lists so they are a convenient vehicle for my message. I'm really talking about problems I've noticed in my own code.

A New List

Let's say you have an awesome new idea for a List implementation in Java. How do you code it up? Maybe you start with something that looks like this:

import java.util.*;

public class AwesomeList<E> implements List<E> {

}

That's a good start. Now all that is left to do is implement the 23 methods from the List interface. That seems like quite a lot but most of them are simple. You breeze through get() and set() and size() and isEmpty() and all the rest of the one-liners. Then you do addAll(), contains(), containsAll(), retainAll() and all the rest that require a loop. Then you get your IDE to generate equals() and you copy and paste hashCode() because that's what the Javadoc says to do.

That's a lot of boring boilerplate but this is Java so you plough through. All that's left is iterator().

Now, I don't know about you but whenever I've implemented Iterable, I've taken the easy way out. I've copied all the objects from my class into a new list and returned that list's iterator. Maybe I'm a bad person but it's 4 quick lines of code and I've yet to see it cause a performance problem. However, this time the whole point is to program a bespoke list so you should probably write a proper ListIterator for the job.

Implementing Just the Essence

This is crazy. Nobody wants to do all that work and the designers of the Java Collections Framework knew it. Instead of all that, you can just extend AbstractList. Then all you have to implement is:

2 methods for an immutable list: get(), size()
3 methods for a mutable list: get(), size(), set()
5 methods for a variable size list: get(), size(), set(), add(int, E), remove()

Look at that! 2-5 methods!

The essence of a list in Java is 2-5 methods. They are the only essential ones. All the others can be described in terms of this small handful. In fact, there is no technical reason to include them as part of the List interface. They could all be turned into static utility methods and moved over to the Collections class. After all, the difference between the following two methods is mainly a philosophical one:

public boolean addAll(Collection<? extends E> c) {
	boolean modified = false;
	for (E e : c)
		if (add(e))
			modified = true;
	return modified;
}

public static <E> boolean addAll(
	Collection<? extends E> source, 
	Collection<E> dest)
{
    boolean modified = false;
    for (E e : source)
        if (dest.add(e))
            modified = true;
    return modified;
}

Pros and Cons

Is it really just a philosophical issue or is one better than the other? Let's consider what would happen if we really could move all these List methods to the Collections class as static utility methods.

First of all, the AbstractList class would be empty. So would AbstractCollection, and if you extended the process so would AbstractSequentialList, AbstractQueue, AbstractSet, AbstractMap and AbstractDeque. These 6 classes (maybe most abstract classes) would disappear altogether. The inheritance hierarchy for most collections would become one level deep and all of the current functionality would remain. The interfaces themselves would become very simple.

So the upside is a much simpler system. What's the downside? Well, it seems like it would break polymorphism. Organising your code this way would make it awkward to override the methods for specific types of list. You could still do it but you would have to remember to call the correct one.

Polymorphism in Java is often used for dependency inversion. This is still possible with static methods. You can pass them as lambdas. There are pros and cons to this approach when compared to passing an object that already has all the methods that could possibly be required. It's trivial to write decorators or adapters for individual methods but objects are more convenient when several methods need to be injected.

Wrapping Up

While adding methods to public interfaces is sometimes the correct solution, I think that maybe it shouldn't be the default. Large interfaces are tedious to implement which can be a barrier to providing alternative implementations. If an interface only contains the essential elements then it's easier to implement and communicates better what that the object is for.

There are several alternative places to put methods. They can be static methods of the interface or a utility class. They can be default methods or even part of mixin-style interfaces. Inspiration can be taken from other languages. C# has extension methods and Ruby has mixins. Perhaps the ultimate inspiration comes from function languages. Instead of objects, you have a data structure and a set of functions that operate on it.

09 August 2015

The Essence of a List

A New List

Implementing Just the Essence

Pros and Cons

Wrapping Up

Comments