Clojure Hashmaps

March 17, 2016

Summary: Hashmaps are one of the workhorse data structures in Clojure. They are commonly used in two main patterns. We also discuss some of their interesting properties.

Let's talk about the Clojure hashmap. It's one of the workhorses of Clojure. It's used for so many things, including records and indexes, two important patterns in any Clojure code. Hashmaps also have some superpowers in Clojure that make them interesting.

Hashmaps are pretty common in languages these days. Sometimes they're called maps, dictionaries, or associative arrays. They associate keys to values. But that's a confusing way to put it because everything is a "value", even the keys. That's what programmers say, though, and the APIs call it that, so let's just continue with the common nomenclature.

Hashmaps are called dictionaries because they're kind of like a vocabulary dictionary. You have a word and you want to look up the definition. The dictionary is arranged in alphabetical order so it's fast to find the definition. It's similar in a hashmap. It's really fast to look up the key and get the value back. Going the other way is slow. If you know the definition, how do you find the word? You'd basically have to read the whole dictionary. It's the same for hashmaps. Finding the key given a value is super slow. You have to go through everything one at a time.
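To make the two directions concrete, here is a minimal sketch. The dictionary entries and the find-word helper are made up for illustration; they are not part of Clojure itself.

```clojure
;; A small word -> definition map (illustrative entries only).
(def dictionary {"apple"  "a round fruit"
                 "banana" "a long yellow fruit"
                 "cherry" "a small red fruit"})

;; Key -> value: a direct lookup.
(get dictionary "banana")
;; => "a long yellow fruit"

;; Value -> key: there is no shortcut, so we walk the entries
;; one at a time until a definition matches.
(defn find-word [dict definition]
  (some (fn [[word d]] (when (= d definition) word))
        dict))

(find-word dictionary "a small red fruit")
;; => "cherry"
```

The reverse lookup has to visit entries until it finds a match, so its cost grows with the size of the map.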

In Clojure, hashmaps are used mainly in two different ways. The first way I'm going to call the Record pattern. The hashmap in the Record pattern is like a form you fill out at a doctor's office. Each form field has a name (the key) and a blank (where you put the value). When you fill it out, it becomes a record of your information. The doctor gets every patient to fill one out. They all have the same keys but different information.

In Clojure, you could do something like this:

;; def names this value; the keys are keywords
(def john {:first-name "John"                    ; key/value pair
           :last-name "Lennon"                   ; key/value pair
           :date-of-birth #inst "1940-10-09"})   ; key/value pair

There are three important features to note. 1. The hashmap uses keywords as keys. This is typical but not required. Keywords are just convenient for both humans and computers to use. You will also see strings commonly used. 2. The values can vary in type (in this case strings and dates) depending on the key. 3. If we use the same keys in other maps, we can access them the same way. We would probably start calling hashmaps with this shape "patients". These features are all typical of the Record pattern.
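As a rough sketch of the third feature, here are two records that share the same keys. The full-name helper is hypothetical, written only to show that one function works on any map of this shape.

```clojure
(def john {:first-name "John"
           :last-name "Lennon"
           :date-of-birth #inst "1940-10-09"})

(def paul {:first-name "Paul"
           :last-name "McCartney"
           :date-of-birth #inst "1942-06-18"})

;; Hypothetical helper: works on any hashmap with these two keys.
(defn full-name [patient]
  (str (:first-name patient) " " (:last-name patient)))

(map full-name [john paul])
;; => ("John Lennon" "Paul McCartney")
```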

If we want to get the date of birth of a patient, we can say (get patient :date-of-birth), where patient is any hashmap of this shape. You can also write (:date-of-birth patient) to mean the same thing. This uses the ability of keywords to act like functions.
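Here is a small sketch of both lookup styles, including what happens when the key is missing. The :phone key is an assumption added for the example; it is not in the record above.

```clojure
(def patient {:first-name "John"
              :last-name "Lennon"
              :date-of-birth #inst "1940-10-09"})

;; Both forms return the same value.
(= (get patient :date-of-birth)
   (:date-of-birth patient))
;; => true

;; A missing key gives nil rather than an error.
(get patient :phone)
;; => nil

;; Both forms accept an optional default for missing keys.
(get patient :phone "unknown")
;; => "unknown"
(:phone patient "unknown")
;; => "unknown"
```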

Another pattern we see a lot is to use a hashmap to hold onto things you want to look up later. We'll call it the Index pattern.

In the Index pattern, instead of repeating the same keys in many similar hashmaps, you've got one hashmap with keys representing the identity of the value. Let's say we wanted to be able to look up patients by last name. Assuming john, paul, george, and ringo are patient records like the one above, we can make an index map like this:

(def by-last-name {"Lennon" john
                   "McCartney" paul
                   "Harrison" george
                   "Starkey" ringo})

Now we can look up a patient by last name like so: (get by-last-name "Harrison"). If we have a collection of patients and we want to index them, we can do it like this:

;; zipmap makes a hashmap from a list of keys and a list of values
(zipmap (map :last-name patients)   ; list of keys
        patients)                   ; list of values

See zipmap for more details.
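A runnable sketch of building the index, assuming patients is a vector of records with a :last-name key. The extra "Julian" record is made up to show one caveat: when two records share a key, zipmap keeps only the last one, while group-by keeps them all.

```clojure
(def patients
  [{:first-name "John"   :last-name "Lennon"}
   {:first-name "Paul"   :last-name "McCartney"}
   {:first-name "George" :last-name "Harrison"}
   {:first-name "Ringo"  :last-name "Starkey"}])

(def by-last-name
  (zipmap (map :last-name patients) patients))

(get by-last-name "Harrison")
;; => {:first-name "George", :last-name "Harrison"}

;; Caveat: with a repeated last name, zipmap keeps the later record.
(def with-duplicate
  (conj patients {:first-name "Julian" :last-name "Lennon"}))

(:first-name (get (zipmap (map :last-name with-duplicate) with-duplicate)
                  "Lennon"))
;; => "Julian"

;; group-by keeps every record under its key, as a vector.
(count (get (group-by :last-name with-duplicate) "Lennon"))
;; => 2
```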

The Index pattern is a lot like a filing cabinet for patient records. The records themselves are filed by some identifier (in this case, last name). There are two things to note. 1. The types of keys are consistent (here, last names) and 2. the types of values are consistent (patient records). The notion of type, however, must be taken flexibly, as it is in Clojure. The nice thing in Clojure is that the same record can be indexed in multiple places (it's just data, after all).
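For instance, the same record can sit in two indexes at once. This is only a sketch; the :patient-id key and the by-id index are assumptions added for the example.

```clojure
(def george {:patient-id 3
             :first-name "George"
             :last-name "Harrison"})

;; Two indexes over the same record, keyed differently.
(def by-last-name {"Harrison" george})
(def by-id        {3 george})

;; Both lookups return the very same object, not a copy.
(identical? (get by-last-name "Harrison")
            (get by-id 3))
;; => true
```

Because the record is immutable, sharing it between indexes is safe: neither index can change what the other one sees.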

Right, so those are the two main usage patterns for hashmaps in Clojure. Hashmaps are so flexible that you will also see hybrids where the two patterns are mixed and blended. I'm not a big fan of that, but it does have its uses.

Clojure hashmaps have some nice properties that you might want to rely on. Hashmaps, like other Clojure data structures, are immutable. That means they cannot change after they are created. When you want to add a new key/value pair, you make a new hashmap. That might seem expensive, but Clojure hashmaps are persistent, which means a hashmap derived from another hashmap shares most of the underlying data with the old one. That makes it very cheap to make modified copies. Finally, hashmaps can be called like functions. You pass the key as an argument and you get the value in return. This is sometimes useful.
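A short sketch of immutability and of calling a hashmap as a function. The :band key is made up for the example. Note that this shows the observable behavior only; the structural sharing happens inside the implementation and is not visible here.

```clojure
(def john {:first-name "John" :last-name "Lennon"})

;; assoc returns a new hashmap; it does not change john.
(def john-with-band (assoc john :band "The Beatles"))

john
;; => {:first-name "John", :last-name "Lennon"}

(:band john-with-band)
;; => "The Beatles"

;; A hashmap called as a function looks up its argument as a key.
(john :first-name)
;; => "John"

;; Missing keys give nil, or an optional default if you pass one.
(john :band)
;; => nil
(john :band "none")
;; => "none"
```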

Hashmaps also implement some of the core Clojure abstractions. They're counted (you can get the number of key/value pairs), associative (adding and looking up by key), seqable (iterate through key/value pairs), and IFn (callable like functions).
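Each of these abstractions can be tried at the REPL. A minimal sketch follows; note that seqable? is only available in Clojure 1.9 and later.

```clojure
(def john {:first-name "John" :last-name "Lennon"})

;; Counted: the number of key/value pairs.
(count john)
;; => 2

;; Associative: adding and looking up by key.
(get (assoc john :band "The Beatles") :band)
;; => "The Beatles"
(contains? john :last-name)
;; => true

;; Seqable: iterate through key/value pairs.
(seq john)
;; => ([:first-name "John"] [:last-name "Lennon"])

;; IFn: callable like a function.
(john :last-name)
;; => "Lennon"

;; The matching predicates all answer true for a hashmap.
[(counted? john) (associative? john) (seqable? john) (ifn? john)]
;; => [true true true true]
```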

Hashmaps are used so much in Clojure that knowing them is required. And it's sometimes hard to know how hashmaps should be used. I hope identifying these two patterns helps you on your Clojure journey. Clojure's hashmaps are immutable and fast, they integrate well into idiomatic usage, and there are plenty of core library functions that use them. Learn hashmaps well and you'll be on your way to mastering Clojure.
