What is an abstraction?

Summary: We explore some of the background behind the meaning of the word abstraction and why we do it.

For a term we use so much in our field, there are very few definitions of abstraction. When I gave my talk called Building Composable Abstractions, one question kept coming up: what did I mean by abstraction? I'm going to be talking about abstractions a lot, both on this site and in verbal discussions. I'd like to know what I'm talking about. Instead of searching for a precise definition, I'd like to expand on the background ideas that make it so difficult to define.

There are two uses of abstraction that I think will serve well to begin this discussion. The first is from one of my favorite programming books, Structure and Interpretation of Computer Programs by Abelson and Sussman (Section 1.1).

means of abstraction, by which compound elements can be named and manipulated as units.

In their "definition", abstraction is about naming. Naming is a funny thing. It's used for identity (as in the name of a person) and also to impart a meaning (as in naming an idea). We have a tendency as humans to come up with new terms. They are new names for perhaps new meanings. It is part of our natural linguistic abilities. When we program, we do this all the time. Whenever we create a variable or name a function, we're inventing a new term and assigning it a meaning.

Its meaning, in our program, is its behavior. What does this thing do? When should I use it? How do I use it? A function calculates a return value based on its arguments. You use the function by calling it or passing it as an argument to something that will call it. In Clojure, you call a function by putting its name in the first position in parens. In Clojure, functions can also have effects.
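Here is a minimal sketch of those ideas, written in Python rather than Clojure so that it runs as-is. The names area_of_circle and apply_to_all are invented for illustration; they are not from any library.

```python
import math


def area_of_circle(radius: float) -> float:
    """The name states the meaning; the body supplies the behavior."""
    return math.pi * radius ** 2


def apply_to_all(fn, values):
    """Takes a function as an argument and calls it on each value."""
    return [fn(v) for v in values]


# Use the function by calling it directly...
single = area_of_circle(1.0)

# ...or by passing it to something that will call it.
several = apply_to_all(area_of_circle, [1.0, 2.0])
```

Once the name exists, the caller can work with "area of a circle" as a unit and stop thinking about pi and exponents.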

Programming language theorists call enumerating all of this meaning the semantics of the language. But it's just another word for meaning. Giving a descriptive name to a thing is all about giving it a clear meaning.

How are Clojure functions implemented by the compiler? We mostly don't care. We write functions and call them without much regard for the things they compile to. Insofar as we can ignore those implementation details, we call the abstraction robust. When the implementation becomes important and we can't trust the abstraction to work as intended, we call the abstraction leaky. A robust abstraction is able to hide the details from us and give us a new foundation on which to build other abstractions.

That brings me to the second "definition", this one by Edsger Dijkstra.

The purpose of abstraction is not to be vague, but to create a new semantic level in which one can be absolutely precise.

Dijkstra's quote goes right to the purpose of abstraction, which we have yet to go into. We want to create a new semantic level (meaning again) where we can be precise. My reading of the word precise is that we need to be able to say exactly what we mean and no more. We want our compound elements to be exactly suited to their purpose.

I believe this is the hard part of programming that we always talk about. We are simultaneously inventing a new purpose and the thing which is supposed to be suited to it. Then we have to name it to make that purpose clear. Three things we're doing simultaneously. That's a lot of degrees of freedom that can lead us to an imprecise abstraction.

There is a lot of distrust of abstraction in our industry, and I think rightly so. We have been burned time and again by abstraction for abstraction's sake and abstractions that hide problems. These are abstractions that don't fulfill their purpose and should be distrusted. But we should not distrust all abstractions. Part of our job is to learn what constitutes a good one.

Here's a great example of the distrust of abstractions. Joel Spolsky, internet entrepreneur, coined an aphorism he called "The Law of Leaky Abstractions":

All non-trivial abstractions, to some degree, are leaky.

He gives the example of TCP: the abstraction is "make a TCP connection and send data reliably, despite lost packets and other vagaries of networks". It's a great abstraction except it has a leak: what if you pull the network cable? Nothing will get through! According to the law, the mechanism will always "leak" through eventually. It reminds me that metaphors can only be stretched so far and that there will be some true things we cannot prove.

I think the law has a lot of truth in it, but his examples are a bit of a strawman. Why? Well, sockets handle a lot for you, and they also have well-defined errors. If the connection is severed, your socket will raise an error. So the socket abstraction is not actually hiding that network cable problem from you. On the contrary, it's building it into its fabric. Perhaps the real problem is that errors can so easily be ignored in our programming languages. So I like to think about what I call an "Inverse Spolsky's Law":
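To make the point about well-defined errors concrete, here is a small Python sketch. It is an illustration under assumptions, not a model of real network failure: a local socket pair stands in for a TCP connection, and closing one end stands in for the severed link. The function name send_after_peer_closed is made up for this example. A physically pulled cable usually shows up later, typically as a timeout, but it still arrives as an error on the socket.

```python
import socket


def send_after_peer_closed() -> str:
    """Send on a socket whose peer is gone and report what happens."""
    ours, theirs = socket.socketpair()
    try:
        theirs.close()  # stand-in for the severed connection
        try:
            ours.sendall(b"hello")
        except OSError as exc:
            # The failure is part of the socket's contract, not hidden by it.
            return type(exc).__name__
        return "no error"
    finally:
        ours.close()


outcome = send_after_peer_closed()
```

The abstraction does its part by raising. Whether the program handles that error or swallows it is up to the caller.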

All too-trivial abstractions, to some degree, are leaky.

The idea is that the abstractions that are leaky are the ones that are not precise in the way Dijkstra specified. They're hiding something they really can't hide. To make such an abstraction more precise, you need to hoist more into the higher abstraction level than you would ideally want to. And this is the number one sin I find in abstractions: they hide too much.

Abstraction is something we also see in algebra. We give a value a name, though we don't know what that value might be. We call these names variables. We can manipulate them like values and arrive at sensible answers. This shows a very beautiful relationship between mechanical symbolic manipulation, meaning representation, and mechanical calculation. Abstractions have laws in themselves. Hence I can manipulate a program algebraically and talk about its properties, all without running it. The abstraction becomes a thing to talk about.
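One familiar law of this kind: mapping one function over a list and then mapping a second over the result gives the same answer as mapping their composition once, provided both functions are free of side effects. The Python sketch below uses two made-up functions, double and increment, to show the rewrite. The comparison at the end is only a spot check on one sample; it is the law that justifies the rewrite, not the check.

```python
def double(x: int) -> int:
    return x * 2


def increment(x: int) -> int:
    return x + 1


def two_passes(xs):
    # Map double over the list, then map increment over the result.
    return [increment(y) for y in [double(x) for x in xs]]


def one_pass(xs):
    # Rewritten by the law: map f after map g equals map of (f after g).
    return [increment(double(x)) for x in xs]


sample = [0, 1, 2, 3]
laws_agree = two_passes(sample) == one_pass(sample)
```

The step from two_passes to one_pass is made by reasoning about the symbols alone, the same way one simplifies an equation.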

Something in there points to the roots of intelligence, and many books have been written about this. The reason it is so hard to talk about is that we don't have very good introspection into how we think. We're still understanding it after trying for thousands of years. We're better at doing it than knowing what we're doing.

The thing I love about programming is that we deal so directly in meaning, like an artist or a philosopher does. We know that, in the end, we are building a mechanism to control electron flow in our computers. Yet we work in the world of ideas. My brain wants nothing more than to escape from the mundanity of logic gates and build sculptures of thought.

Conclusions

We deal in abstractions every day as programmers. We're either using them, creating them, or debugging them. Abstractions are a natural extension of our linguistic abilities. They let us name concepts so that we can use them to form bigger ideas. Much of language design, and many an industry programming book, is about exactly how to make these abstractions. How do we go about making these things? Are there better and worse ways? These are some of the things we must understand better as software takes over more and more of our lives.