Monday, June 2, 2014

If OO languages don't need variants, they don't need objects either

The usual story is that object-oriented languages don't need sum types (a.k.a. variants) because they can use the Visitor pattern instead. For example, this Haskell type:

data Either a b = Left a | Right b

can be converted to a couple of Java interfaces:

interface EitherVisitor<A, B, Result> {
   Result visitLeft(A a);
   Result visitRight(B b);
}

interface Either<A, B> {
  <Result> Result accept(EitherVisitor<A, B, Result> visitor);
}

(Note that I'm using logji's corrected Visitor pattern, which is generic over the Result type. That's the right way to implement visitors and should be in every textbook.)

I think the above argument is right, and visitors can be used to replace sum types in most situations, but I also think it proves too much. Namely, you can make a similar argument to prove that object-oriented languages don't need product types (a.k.a. objects with fields) either! For example, this Haskell type:

data Pair a b = Pair a b

can be expressed with Java interfaces that contain only one method each (a.k.a. function types):

interface PairVisitor<A, B, Result> {
  Result visit(A a, B b);
}

interface Pair<A, B> {
  <Result> Result accept(PairVisitor<A, B, Result> visitor);
}

(With currying you could go even further, and obtain interfaces that contain a single method receiving a single argument each. I won't do it here because it makes the code less clear.)

More generally, if you have only function types, you can use Church encoding to simulate pairs, lists, numbers and just about everything else. The resulting code will have reasonable types in System F, which is a subset of many existing type systems, including Java's and Haskell's.

There is, however, a subtlety. The Java interfaces shown above don't actually contain any data of types A and B, they just have methods to return that data on demand. In other words, they correspond to lazy Pair and Either. That has a couple of nontrivial consequences.

First, the thing that promises to be a Pair or Either can go into an infinite loop or throw an unchecked exception when asked to return the data. That corresponds to Haskell's bottom value, which is an inhabitant of every type.

Second, lazy types can lead to infinite data structures. For example, this Haskell type:

data List a = Empty | Append a (List a)

can be expressed with Java interfaces like this:

interface ListVisitor<A, Result> {
  Result visitEmpty();
  Result visitAppend(A a, List<A> tail);
}

interface List<A> {
  <Result> Result accept(ListVisitor<A, Result> visitor);
}

If we encode lists in this way, there's nothing stopping a list from passing itself as its own tail into visitAppend. That means the type system cannot guarantee that the list is finite, as it can in eager functional languages like ML. Robert Harper has a well-known post arguing against lazy data structures, but Haskell programmers seem to like them.

On the other hand, using interfaces for sum and product types leads to an unexpected benefit for Java programmers, because they can write their own implementations that will be compatible with existing libraries. If we had a language where all types are interfaces "under the hood", we could do nice things like replacing fields with getter methods transparently, though the current Java style seems to be too clunky for that.

I guess this post turned out a little more rambling and educational than I intended. Oh well. A short summary is that both sum types and product types can be viewed as a layer of syntactic sugar over function types, and peeking below that layer can have some benefits for understanding programs.

No comments:

Post a Comment