3 Things Java Programmers Can Steal from Clojure

The other day I wrote about some principles that programming in Clojure makes very clear. Those principles could be applied just as well in Java, and often are. However, there are some things that make Clojure distinct.

Three of those distinctions are the way it deals with state change (using an STM), the Persistent Data Structures, and the literal syntax for data with a reader (now called edn). Diving into the source code for Clojure, I realized that these three bits were written in Java. And that means that they can be used from Java. It is certainly not as easy as using them in Clojure, but all three are powerful enough to warrant using them if you are using Java. You simply need to add one more JAR to your project (or add a Maven dependency). I have constructed a few minimal examples of their use.

1. Persistent Data Structures

Clojure comes with several powerful and fast collection classes. The interesting thing about them is that they are immutable. If you want to add an object to a list, you actually create a new list containing the old elements and the new element. Instead of copying everything on each write, the new list reuses most of the internal structure of the original list, so only a small number of objects need to be allocated. It turns out that this can be done very quickly, comparable to using an ArrayList.

The following example illustrates three of the more useful data structures: Vector, HashMap, and HashSet.

package persistent;

import clojure.lang.IPersistentMap;
import clojure.lang.IPersistentSet;
import clojure.lang.IPersistentVector;
import clojure.lang.PersistentHashMap;
import clojure.lang.PersistentHashSet;
import clojure.lang.PersistentVector;

public class PersistentTest {
  public static void main(String[] args) {
    IPersistentMap m = PersistentHashMap.create("abc", "xyz");
    m = m.assoc(1, 4); // add a new key/value pair
    m = m.assoc("key", "value");
    m = m.without("abc"); // remove key "abc" 
    System.out.println(m);

    IPersistentVector v = PersistentVector.create(1, 2, 3);
    v = v.assocN(0, "a string"); // change index 0
    v = v.cons("should be last"); // add a string at the end
    System.out.println(v);

    IPersistentSet s = PersistentHashSet.create("a", "b", "c");
    s = (IPersistentSet) s.cons("d"); // add d to the set
    s = s.disjoin("a"); // remove an element (a set is not an IPersistentMap, so use disjoin)
    s.contains("g"); // should return false
    System.out.println(s);
  }
}

Now, it ain't pretty. But it's actually no worse than quite a few native Java libraries I've seen. There may be a better way to do this, but this one works.
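If the idea of "reusing the internal structure" sounds abstract, here is a minimal sketch of it in plain Java. This is not how Clojure's collections are built (they use much cleverer tree structures); PList and PListDemo are hypothetical names I made up for illustration. It only shows that adding to an immutable list can allocate a single node and share everything else.

```java
// A hypothetical, minimal persistent list: "adding" returns a new list
// that shares every existing node with the old one.
final class PList<T> {
  final T head;        // null only for the empty list
  final PList<T> tail; // shared with older versions, never copied
  final int size;

  private static final PList<Object> EMPTY = new PList<Object>(null, null, 0);

  private PList(T head, PList<T> tail, int size) {
    this.head = head;
    this.tail = tail;
    this.size = size;
  }

  @SuppressWarnings("unchecked")
  static <T> PList<T> empty() {
    return (PList<T>) EMPTY;
  }

  // O(1): allocates exactly one node; the old list is untouched.
  PList<T> cons(T value) {
    return new PList<T>(value, this, size + 1);
  }
}

final class PListDemo {
  public static void main(String[] args) {
    PList<String> one = PList.<String>empty().cons("a");
    PList<String> two = one.cons("b");
    System.out.println(one.size);        // 1: the original is unchanged
    System.out.println(two.size);        // 2
    System.out.println(two.tail == one); // true: structure is shared
  }
}
```

Anyone still holding a reference to the old list keeps seeing the old list, which is exactly the property the Clojure classes give you.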

2. Software Transactional Memory

Clojure uses multiversion concurrency control to provide a safe way to manage concurrent access to state shared between threads. The shared references it manages are called refs. I won't go very deep into how it works. Suffice it to say that Clojure refs give you non-blocking reads and transactional updates without having to do locking yourself. There are two caveats. The first is that the value you give to the ref has to be immutable. The second is that you should not perform IO (or any other mutation) inside of the transaction, because a transaction may be retried and so its body can run more than once.

package stm;

import java.util.concurrent.Callable;

import clojure.lang.LockingTransaction;
import clojure.lang.Ref;

public class STMTest {
  public static void main(String[] args) {
    // final needed to be used in anonymous class
    final Ref r = new Ref(1);
    final Ref s = new Ref(5);

    try {
      // run this in a transaction
      // don't do IO inside
      LockingTransaction.runInTransaction(
        new Callable<Object>() {
          public Object call(){
            s.set((Integer)r.deref() + 10);
            r.set(2);
            return null;
          }
        }
      );
    } catch (Exception e) {
      e.printStackTrace();
    }

    System.out.println(r.deref());
    System.out.println(s.deref());
  }
}
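To see why the two caveats matter, it can help to look at a much simpler cousin of the idea using only the JDK. The sketch below is not Clojure's STM and not how refs are implemented. It is a single-reference analogy built on AtomicReference, with the hypothetical names Balances and RetryDemo. The update is computed from an immutable snapshot and committed with compareAndSet. If another thread got there first, the computation simply runs again, which is why the body must be free of IO and other side effects.

```java
import java.util.concurrent.atomic.AtomicReference;

// Immutable snapshot of both values; every change creates a new instance.
final class Balances {
  final int r;
  final int s;

  Balances(int r, int s) {
    this.r = r;
    this.s = s;
  }
}

final class RetryDemo {
  // Mirrors the transaction above: s = r + 10, then r = 2.
  static void update(AtomicReference<Balances> state) {
    while (true) {
      Balances old = state.get();                  // non-blocking read
      Balances next = new Balances(2, old.r + 10); // pure computation, may run more than once
      if (state.compareAndSet(old, next)) {
        return;                                    // committed
      }
      // Another thread changed the state in the meantime: retry.
    }
  }

  public static void main(String[] args) {
    AtomicReference<Balances> state =
        new AtomicReference<Balances>(new Balances(1, 5));
    update(state);
    System.out.println(state.get().r); // 2
    System.out.println(state.get().s); // 11
  }
}
```

The real thing coordinates several refs in one transaction, which this analogy cannot do. That is the part worth pulling in the Clojure JAR for.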

3. Extensible Data Notation

With Clojure 1.5, edn has become a standard part of the language. Edn is like an extensible JSON where the keys of objects can be any value (not just strings). It is based on the Clojure literal syntax, much in the same way that JSON is based on JavaScript literal syntax. It is a nice way to serialize data. And since you already have the JAR in your project, it's a no-brainer to use it.

package edn;

import java.io.PushbackReader;
import java.io.StringReader;

import clojure.lang.EdnReader;
import clojure.lang.PersistentHashMap;

public class EDNTest {
  public static void main(String[] args) {
    // reading from a string
    System.out.println(
      EdnReader.readString("{\"x\" 1 \"y\" 2}", PersistentHashMap.EMPTY));

    // reading from a Reader
    // really, you can use any Reader wrapped in a PushbackReader
    System.out.println(
      EdnReader.read(new PushbackReader(new StringReader("#{10 2 3}")),
                     PersistentHashMap.EMPTY));
  }
}

You may be interested in my Weekly Clojure Newsletter.

4 Things Java Programmers Can Learn from Clojure (without learning Clojure)

I was trained in Java at University. The OOP matrix was firmly implanted in my thinking. I wanted to share some things that I have learned from Clojure that were certainly possible in Java but never became fundamental to my programming practice.

Clojure certainly has learned a lot from Java. It might be cool if the learning went both ways. These are universal principles. In fact, these principles are actually well known in the OOP world. You probably already know them, so learning Clojure is not required (but it is recommended!).

1. Use immutable values

One of Clojure's claims to fame is its immutable data structures. But immutable values were appreciated even in the very early days of Java. String is immutable and was a bit controversial when Java came out. Back then, C and C++ strings were simply arrays which could be changed. Immutable Strings were seen as inefficient.

However, looking back, immutable Strings seem to have been the right choice. Many of the mutable Java classes are now seen as mistakes. Take, for example, java.util.Date. What does it mean to change the month of a date?

Let's go a little further. Let's imagine that I'm an object. You ask me when my birthday is. I hand you a piece of paper with July 18, 1981. You take that home, store it somewhere, and even let other people access that piece of paper.

One of those people says "cool, a date!" And changes it to his birthday, April 2, 1976 using setTime. Now the next person who asks for my birthday actually gets that guy's birthday. What a disaster! Why did I give away that magic paper that changes my birthday?

By making values mutable, this magical-changing-at-a-distance is always a possibility. One way to look at the reason it is actually wrong to use mutable values is that it breaks the information hiding principle. My birthdate is part of the state of my object. By giving direct access to the month, day, and year, I'm actually letting any class have direct access to my internal state.

The answer, of course, is to not have any setters on an object. After construction, the object can't change. That way, my internal state remains encapsulated.
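Here is a small sketch of the birthday story done safely. ImmutablePerson and ImmutablePersonDemo are hypothetical names for illustration. Since java.util.Date is itself mutable, dropping the setters is not quite enough: the class also copies the Date on the way in and on the way out, so nobody ever holds the instance that lives inside the object.

```java
import java.util.Date;

final class ImmutablePerson {
  private final String name;
  private final Date birthday; // Date is mutable, so this instance is never shared

  ImmutablePerson(String name, Date birthday) {
    this.name = name;
    this.birthday = new Date(birthday.getTime()); // copy on the way in
  }

  String getName() {
    return name;
  }

  Date getBirthday() {
    return new Date(birthday.getTime()); // copy on the way out
  }
}

final class ImmutablePersonDemo {
  public static void main(String[] args) {
    ImmutablePerson me = new ImmutablePerson("me", new Date(0L));

    Date paper = me.getBirthday();
    paper.setTime(12345L); // someone scribbles on their copy of the paper

    System.out.println(me.getBirthday().getTime()); // 0: my birthday is unchanged
  }
}
```

With a truly immutable date type the copies would be unnecessary, which is the whole point of preferring immutable values.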

This applies to collections as well. Have you ever read the docs for Iterator? Can you tell what happens when the underlying list changes? Neither can I. An immutable list would not have such a complicated interface.

Solution: Don't write setter methods. For collections, you have a couple of options. One easy thing to do is use the Google Guava Immutable... classes. If using Guava is not an option, whenever you are returning a collection, make a copy, wrap it in a java.util.Collections.unmodifiable...(), and throw away the reference to the copy.

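// Copy first, then wrap: nobody else holds a reference to the mutable copy.
public static <K, V> Map<K, V> immutableMap(Map<? extends K, ? extends V> m) {
  return Collections.unmodifiableMap(new HashMap<K, V>(m));
}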

To learn more about immutable values, I suggest watching The Value of Values, a talk by Rich Hickey, the creator of Clojure.

2. Do no work in the constructor

Imagine this situation. Your Person class has a constructor that takes a bunch of information (first name, last name, address, etc.) and stores it in the object's state. Someone on your team needs to store that data to a file, so they store it as JSON. For convenience in creating a Person, you add a constructor that takes an InputStream and parses it as JSON, then sets up the state. Just because, you also add one that takes a File, reads in the file, then parses it. And then one that reads in a web request given a URL. Great! You've got a very convenient class.

But wait! What is the responsibility of the Person class? Originally, it was "represent personal information about a person". Now it is also responsible for:

  • Parsing JSON
  • Making web requests
  • Reading files
  • Handling errors

What's more is that the class is now harder to test. How can you test the File constructor? First, you write a temporary file to the file system. Not too bad. How do you test the web request? Set up a web server, configure it to serve the file, then call the constructor.

The problem is that Person violates the single responsibility principle. Person is about keeping bits of information together, not permanent storage or serialization. It should be a data object, no more.

UPDATE: Solution: Keep your constructor free of logic. Separate out convenience constructors (like the one that parses JSON) into static factory methods.
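As a rough sketch of what that separation might look like: the constructor below only assigns fields, and the parsing lives in a static factory method. To keep the example free of third-party libraries, a trivial "first,last" text format stands in for JSON. That format, the fromReader name, and the PersonFactoryDemo class are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

final class Person {
  final String firstName;
  final String lastName;

  // The constructor only assigns fields: no parsing, no IO.
  Person(String firstName, String lastName) {
    this.firstName = firstName;
    this.lastName = lastName;
  }

  // Convenience factory. Parsing and error handling live here,
  // outside the constructor.
  static Person fromReader(Reader in) throws IOException {
    String line = new BufferedReader(in).readLine();
    if (line == null) {
      throw new IOException("empty input");
    }
    String[] parts = line.split(",", 2);
    if (parts.length != 2) {
      throw new IOException("expected \"first,last\"");
    }
    return new Person(parts[0].trim(), parts[1].trim());
  }
}

final class PersonFactoryDemo {
  public static void main(String[] args) throws IOException {
    // No file system or web server needed to test the parsing.
    Person p = Person.fromReader(new StringReader("Ada, Lovelace"));
    System.out.println(p.firstName + " " + p.lastName); // Ada Lovelace
  }
}
```

Because the factory takes a Reader, a test can hand it a StringReader. Going one step further and moving the factory into its own class would take parsing out of Person entirely.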

To learn more about this idea (and more!), I suggest you watch OO Design for Testability by Miško Hevery, the creator of AngularJS.

3. Program to small interfaces

One thing that Clojure has done very well is to define a set of very powerful, small interfaces which abstract a pattern of access. The interface allows many different types to participate in an "ecosystem". Any function which applies to the interface can act on any type that implements that interface. Any new type can take advantage of all of the existing functionality already built in.

Take for instance the Iterable interface. It generalizes anything that can be accessed sequentially (such as a list or set). If all a method needs to do is operate on something sequentially, it only needs to know that it implements this interface. That means it can operate on types that were not known to the programmer when the method was written.
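A quick sketch of that point: the sum method below knows nothing except Iterable, so it works on a List, a Set, and a type invented long after sum was written. Countdown, Sums, and IterableDemo are hypothetical names for illustration.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.NoSuchElementException;

// A new type that joins the "ecosystem" just by implementing Iterable.
final class Countdown implements Iterable<Integer> {
  private final int start;

  Countdown(int start) {
    this.start = start;
  }

  public Iterator<Integer> iterator() {
    return new Iterator<Integer>() {
      private int current = start;

      public boolean hasNext() {
        return current > 0;
      }

      public Integer next() {
        if (current <= 0) {
          throw new NoSuchElementException();
        }
        return current--;
      }

      public void remove() {
        throw new UnsupportedOperationException();
      }
    };
  }
}

final class Sums {
  // Written against the small interface only.
  static int sum(Iterable<Integer> xs) {
    int total = 0;
    for (int x : xs) {
      total += x;
    }
    return total;
  }
}

final class IterableDemo {
  public static void main(String[] args) {
    System.out.println(Sums.sum(Arrays.asList(1, 2, 3)));                            // 6
    System.out.println(Sums.sum(new LinkedHashSet<Integer>(Arrays.asList(4, 5)))); // 9
    System.out.println(Sums.sum(new Countdown(3)));                                  // 6
  }
}
```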

This aspect follows from the dependency inversion principle which states that high-level logic should be written in terms of abstractions instead of the details of the lower-level logic. Interfaces capture this principle well. High-level logic should operate on interfaces which are implemented by the lower-level logic.

Solution: Think hard about the access patterns for classes and see if you can't abstract out small interfaces which pinpoint those access patterns. Then program to those interfaces. Remember, it takes two to use an interface: the implementor and the client. Make sure you use them from both sides as much as possible.

Nothing improves maintainability or reduces the future cost of code more than good interfaces. To learn more about this, I suggest watching How To Design A Good API and Why It Matters. It's an older (but good) talk by Joshua Bloch.

4. Represent computation, not the world

When I was in college, the teacher taught us that you should use classes to model objects in the world. The quintessential modeling problem was students registering for courses.

A course can have many students and a student can be registered in many courses. A many-to-many relationship.

The obvious choice is to make a Student class and a Course class. Each has a list of the other. Inclusion in that list represents registration. Methods like register and listCourses let a Student register or list the courses he's registered in.

Professors would present this problem in order to discuss the tradeoffs of different design choices. None of the configurations of Student and Course were ideal. An astute data modeler would see the pattern of the many-to-many relationship and abstract that out. You can create a class called ManyToMany<X,Y> that manages the relationship. You can create a ManyToMany<CourseID, StudentID> and it solves your problem exactly.

The issue is that this directly contradicts the teacher's lesson. A relationship is not an object in the real world. At best it is an abstract concept.

What's more is that it solves the more general abstract problem as well. The ManyToMany class can be reused anywhere it is suited. Even better would be to make ManyToMany an interface with many possible implementations.
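For the curious, here is one possible sketch of such a class. The method names (relate, rightOf, leftOf) are my own invention, and plain Strings stand in for CourseID and StudentID. It keeps an index in each direction so that both "students in a course" and "courses for a student" are cheap lookups.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// A purely computational class: it models a relationship, not a thing.
final class ManyToMany<X, Y> {
  private final Map<X, Set<Y>> forward = new HashMap<X, Set<Y>>();
  private final Map<Y, Set<X>> backward = new HashMap<Y, Set<X>>();

  void relate(X x, Y y) {
    link(forward, x, y);
    link(backward, y, x);
  }

  // Everything on the Y side related to x (read-only view).
  Set<Y> rightOf(X x) {
    return view(forward.get(x));
  }

  // Everything on the X side related to y (read-only view).
  Set<X> leftOf(Y y) {
    return view(backward.get(y));
  }

  private static <A, B> void link(Map<A, Set<B>> index, A a, B b) {
    Set<B> set = index.get(a);
    if (set == null) {
      set = new HashSet<B>();
      index.put(a, set);
    }
    set.add(b);
  }

  private static <T> Set<T> view(Set<T> s) {
    return s == null ? Collections.<T>emptySet() : Collections.unmodifiableSet(s);
  }
}

final class RegistrationDemo {
  public static void main(String[] args) {
    // Course IDs on the left, student IDs on the right.
    ManyToMany<String, String> registrations = new ManyToMany<String, String>();
    registrations.relate("CS101", "alice");
    registrations.relate("CS101", "bob");
    registrations.relate("MATH200", "alice");

    System.out.println(registrations.rightOf("CS101").size()); // 2 students in CS101
    System.out.println(registrations.leftOf("alice").size());  // 2 courses for alice
  }
}
```

Neither Student nor Course needs to hold a list of the other anymore, and the same class would serve for authors and books or tags and posts.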

I think my professor was wrong. The Java standard library contains many classes that are purely computational. Why can't application programmers write them as well? Further, look at the GOF Design Patterns Book. Most (if not all) of the patterns are about abstracting computation, not "objects in the real world". Take, for instance, Chain-of-responsibility, which Wikipedia describes as "Avoid coupling the sender of a request to its receiver by giving more than one object a chance to handle the request."

Solution: look for repetitive patterns of code and build classes that take care of those patterns. Use those classes instead of repeating the pattern over and over in code.



Existing Clojure Introductory Videos

As you probably know, I am running a Kickstarter project to create Introduction to Clojure videos. The project is still going (it runs until March 14, 2013). Please support it if you want the videos.

I know that there are many great videos out there that teach the basics of Clojure. Allow me to present my selection.

Jim Slaterry is creating video walkthroughs of the Clojure Koans.

Full Disclojure was a long series of screencasts by Sean Devlin that dives deep into Clojure. There are many lessons about Clojure and functional programming, screencasts explaining how to install editors, and a few explorations of katas. Most of the material should still be current.

Brian Will created a series of videos introducing Clojure back in 2009 which explains the basics of the language. The basics have changed somewhat, but this is still a good way to get into the language.

PeepCode has a video for $12 for Clojure beginners.

Rich Hickey gave some talks a long time ago teaching Clojure. Again, a little old, but I learned a lot from them when they were new.



Monads and Objects

If you went back to 1990 and asked a random programmer what an object was (as in OOP) what would he say? I bet he'd say something like "I don't know. I want to learn them but I don't know where to start." If you told him "Well, they're easy. Each object has a class. The class defines methods. The object encapsulates state . . .", then his eyes would glaze over. He receives a maelstrom of concepts and their relationships. In the end, he would learn nothing. It actually took a long time for OO to become the norm.

This is analogous to what is happening currently with monads. People want to learn, they get some high-concept explanation, they are unsatisfied. People try to explain. They really do. I still remember when I did not get monads, so I have a lot of sympathy.

Perhaps, like with OOP, it will take a generation to become understandable to the mainstream.

The best way I can think of to teach monads to a programmer is jQuery. You have to know jQuery to understand this example. And frankly, if you don't know jQuery, it is easier to go learn it and come back than to go read a monad tutorial.

Here's a (not-so-)secret: jQuery objects are a monad. When you do $('div'), you get an object which "contains" all of the div elements in the document. There are many methods on the jQuery object which modify the set of elements contained in the object and return it. That's what makes it a monad: the methods return a value of the same type.

Without the jQuery object, you have to follow (and re-follow--potentially repeating code) the logic of collections. The jQuery object controls the logic by which operations on the contained set of elements get executed. You call a method, it changes all of the elements. If there is no element, nothing happens. If there is one element, it alone is changed. If there are many elements, they are all changed. And you don't need to know. If you call $('<div />'), even though you are constructing an object with only one element, jQuery internally turns it into a list. jQuery is making sure you don't have to know if you have 0, 1, or many elements. And this turns out to be a useful abstraction.

This logic is simple. You write this code all the time when you are dealing with collections of objects. But you have to write it all the time, over and over. And you have to remember to write it. The jQuery object does that for you.

That is the job of the monad. It gives you a single place to express the logic for access (in this case, a kind of collection logic). Now we are going to make a monad and show how we can write this logic in only one place.

To make a monad, you actually need two things. One is typically called return, but in OOP it's the constructor. We know that the jQuery constructor is the $ function.

The second thing you need is a way to chain methods. All you need is to return a value constructed with that constructor. Then you can chain. In Haskell, this is called bind.

But I promised that monads got rid of repetition. If all of your jQuery methods have to contain this collection logic and return logic, how does that get rid of repetition? Well, if you did it the way it's done in Haskell, you'd get rid of the repetition.

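Let's do it that way. Let's define a method on the jQuery object called bind. (jQuery already ships an event-handling method with that name, so this overwrites it. Treat it as a demonstration only.)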

jQuery.prototype.bind = function (f) {
  return $(f(this.get()));
};

We construct a new jQuery object with the elements transformed by f. So I can write:

function filter_divs(els) {
  var arr = [];
  var i;
  for(i = 0; i < els.length; i++)
    // tagName is upper case ('DIV') in HTML documents, so normalize it
    if(els[i].tagName.toLowerCase() === 'div')
      arr.push(els[i]);
  return arr;
}

$('.x').bind(filter_divs).addClass('hello');

Let's take it further. We can bind the function and store it in the prototype:

// defined on jQuery itself (not the prototype) so it can be called as $.bind_def
jQuery.bind_def = function (name, f) {
  jQuery.prototype[name] = function () {
    return $(f(this.get()));
  };
};

Now we can call it like a method.

$.bind_def('filter_divs', filter_divs);

$('.x').filter_divs().addClass('hello');

So, now we can chain. We only have to write the logic once (in bind or bind_def). And we have a constructor. That's a monad.

Now, imagine other bits of logic you could build into a method chain. What happens if a method returns null? If you call a method on it, it will throw an exception. You can capture the null check in a monad. That is called the Maybe Monad.
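For the Java programmers reading along, java.util.Optional (Java 8 and later) is a close relative of the Maybe Monad: Optional.ofNullable plays the part of the constructor and flatMap plays the part of bind. The sketch below uses a made-up manager lookup (MaybeDemo, managerOf, and grandManagerOf are hypothetical names) to show a two-step chain in which either step may come up empty, with no null check written by hand.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

final class MaybeDemo {
  static final Map<String, String> MANAGER_OF = new HashMap<String, String>();
  static {
    MANAGER_OF.put("alice", "bob");
    MANAGER_OF.put("bob", "carol");
  }

  // A step that may find nothing; Optional carries that possibility.
  static Optional<String> managerOf(String name) {
    return Optional.ofNullable(MANAGER_OF.get(name));
  }

  // flatMap is the bind: the "is it missing?" logic lives inside Optional,
  // written once, instead of in every caller.
  static Optional<String> grandManagerOf(String name) {
    return managerOf(name).flatMap(MaybeDemo::managerOf);
  }

  public static void main(String[] args) {
    System.out.println(grandManagerOf("alice").orElse("nobody")); // carol
    System.out.println(grandManagerOf("bob").orElse("nobody"));   // nobody
    System.out.println(grandManagerOf("zed").orElse("nobody"));   // nobody
  }
}
```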

There are other useful monads, but they all are simply structured ways of chaining by giving controlled access to the internals of the object.

Thanks to Douglas Crockford for some of these ideas.



Monads & Gonads <λ>

Douglas Crockford gives his version of the obligatory monad tutorial. His version is in JavaScript, in the hope of providing some much-needed demystification.

I have to appreciate the freshness of his approach. He says a monad is just an object with certain properties (the monad laws). That's a great way to explain it in an object-oriented context. Objects are the unit of encapsulation. Monads serve the purpose of encapsulating state and allowing structured access to that state.

There is one thing that bothers me about his presentation. He uses mutable objects. I guess that is to be expected in JavaScript. But mutating state in a monad seriously limits its possibilities, as mutation does in most uses.

Either way, it is an entertaining watch and a good explanation of monads. Perhaps the most down-to-earth I have seen.

