How to write software

A guide for the solo programmer

I have created a planning and organization process for developing software. The process works for me. I need to organize my work ahead of time or I get stuck working on unimportant details.

I have built minimalism into the process. I tend to work alone. Solo programming imposes very real limitations on the scope and completeness of software. One man can only do so much.

My process includes nothing that cannot be found in Worse is Better, The Cathedral and the Bazaar, Paul Graham’s writing, and Getting Real. These are well-written and well-known works. I thought I understood them when I read them, but I was wrong. It wasn’t until recently, after projects stalled and failed due to feature creep, that I understood the essence of the release early, release often idea. I have attempted to formalize the idea to be more actionable instead of philosophical.

Here is my hard-earned process for writing software. Note that this is not a strict algorithm. Sometimes I will apply filters from Step 2 when I am doing experiments in Step 3.

  1. Choose a purpose.

    The software must serve a single purpose. Clearly defining it will illuminate the entire software process.

  2. For each release:

    1. List features.

      List out all of the features you want in the software. Not all of these will make it into the release, but if it is not on this list, it will not make it into the release.

    2. Apply the following filters and transformations, in any order, repeatedly, until there is nothing further to filter out from the list of features.

      • Break features into subfeatures.

        If a feature can be broken into two features, do it. All features must be atomic. For instance, an easy one would be if I want to send messages by email and through a web API, that’s two features. Rule of thumb: if it can break, it’s a separate feature.

      • Eliminate unnecessary features.

        Features that are not essential to the purpose are unnecessary. Features that can be put off until the next version are unnecessary. In the above example, maybe email can wait until the next release. If it can wait, it will wait. You are trying to make a B line to release.

      • Fill in “hidden features”.

        Remember, if it can break, it’s a feature. Sometimes there are features that need to be listed because of the existence of other features. If you support two different communication channels, you need a way to choose between them. Whatever mechanism presents that choice is a feature, too. List it.

      • Simplify features.

        If a feature can be done more simply or more easily than what you originally intended, replace it. The primary way to release quickly is to simplify the release. Do you really need it to be customizable?

    3. Design and carry out experiments.

      Within your minimal set of features, there are often questions as to the best means of implementation. The experiments determine the answers to those questions.

    4. Order the features in order of dependency.

      The first feature is the one that does not depend on any others. Dependency is a complex and sneaky issue. You might say that a function needs a button to activate it. But the button is useless without the function. Which depends on which? Personally, dependency means dependent on another feature for usefulness. If I can activate a function in a very simple way (like immediately when the program starts), then that function does not depend on the button. The button depends on the function for its usefulness.

    5. Sketch out the first remaining feature on paper.

      Sketch out the code. The experiments should have instructed you in how to implement the feature. Rewrite the code repeatedly, eliminating fluff and abstraction until the code is minimal and perfect.

    6. Type the code in.

      Build it and test it.

    7. Make the code rock solid.

      Clean it up. As a solo programmer, you are never going to come back to this code and clean it up. You won’t have time. You will never have the time to hunt for bugs in this code. Make sure it works in every possible case. The code must catch every exception. It must handle the null case. The feature is not done until it is unbreakable. If you are doubting whether it’s worth your time to work this much on one feature, maybe you should eliminate the feature.

    8. Loop to Step 5 until you run out of features.

    9. Release.

The essence of this process is to follow the release early, release often philosophy by reducing the amount of development between releases. I mistakenly thought that release often meant “write code really fast” in the past. Now I understand that it means write less but more important code between releases but write it correctly.

Clojure News March 26

This is something I’m trying out. Probably best to watch in full screen.

News from this week in the world of Clojure.

Keyword arguments in Clojure.

Disclojure episode 15.

LabREPL announced.

Automatically create wrapper object in Clojure

Clojure has made it easy to do something which should be really simple and straightforward, but in Java is not.

How many times have you wanted to create a wrapper class in Java? You have to write all of the methods to call the same method on the wrapped object. It must be a common frustration, since Eclipse lets you generate one pretty easily through a menu.

But in Clojure, it’s simple.

Tim Brool shared a macro for creating a proxy that doesn’t throw exceptions for methods that aren’t define. It is called autoproxy. I cleaned it up a bit to work in a few more cases. I also created a macro called autowrapper that will generate methods that automatically call the wrapped object’s methods.

You can find them on github.

I was working on a web project that needed to return a page fetched using Apache HttpClient. The documentation says that all of the data from the InputStream must be read, then the HttpMethod must be closed. But since I was using Compojure, I could not control what happens after the data from the stream is sent to the client. Compojure funnels the InputStream data to the client, then closes the InputStream. That would be the perfect time to close the connection.

What I needed was a way to wrap the InputStream in an object that does everything exactly the same as the InputStream, but also closes the connection after the InputStream is closed.

So that’s what I did.

auto-wrapper takes an object as its first parameter. The rest works like proxy. It will generate a proxy with all of the methods of the given classes and interfaces, minus those that are final, because those cannot be overridden. Here is the macroexpansion of the above auto-wrapper form.

Well, there you have it. Let me know what you think, if you have any improvements, or if you use it in a project.

Graph Reasoner for Clojure

I’ve been working on a graph reasoner I’ve called clj-reasoner.

Graph reasoning is a way to do inference over a relationship graph.

Let’s look at an example. Imagine we have gathered friendship data from various sites on the web. We basically have an impartial social graph:


friendshipgraph -- Generated by EHT-Graphviz

Clojure code to generate this with clj-reasoner

So we can query this graph into table form with a graph pattern with variables.

hasFriend(x, y)

This query, over the friendship graph above, gives these results:

x y
Andrew John
Chris Andrew
Jane Linda
Jane Chris

This data is just the original data used in the graph. This is not as useful as it could be. First of all, the graph is directed. But is friendship ever one-sided? Debatable. But in many cases, when we say Jane is John’s friend, we mean also that John is Jane’s friend. This states that friendship is symmetrical. We can use this query pattern to make a rule of symmetry. Basically, the rule is:

hasFriend(x, y) \rightarrow hasFriend(y, x)

Clojure code for this query

This basically doubles our edges.


friendshipgraphsymmetrical -- Generated by EHT-Graphviz

Clojure code to generate this graph

This graph contains more information. That information is inferred from the original data and the rule. More useful.

There’s another rule that is commonly used in friendship graphs. That rule states that a friend of a friend is a friend. It states the transitivity of friendship. That rule can be defined like this:

hasFriend(x, y) \wedge hasFriend(y, z) \rightarrow hasFriend(x, z)

Clojure code for this query

If we apply this rule to the symmetrical graph above, we get:


friendshipsymmtrans1 -- Generated by EHT-Graphviz

Clojure code to generate this graph

When we applied the rule once, we infer a certain number of friendships. But there’s something missing. The graph says that Andrew has Chris as a friend, but it’s not reciprocated. Also, if Chris is Andrew’s friend, and Linda is Chris’s friend, by transitivity, we know that Andrew is Linda’s friend. That also is not on the graph.

By applying first symmetry and then transitivity, we don’t get symmetry on the new transitive edges. Also, since we only applied transitivity once, we don’t get transitivity on the transitive edges, either. What we need to do is repeatedly apply our rules (both symmetry and transitivity) to the graph until there are no new edges produced. This is called the transitive closure of the graph. It is not the same kind of closure as in functional programming. Here’s the transitive closure of the graph using our two rules:


transitiveclosure -- Generated by EHT-Graphviz

Clojure code to infer this graph

That’s a big mess. It’s a good thing we can infer all of that from our original graph without typing it ourselves. This is a simplistic example. This last graph basically says that everybody is everybody’s friend. There are more sophisticated things you can do with this, especially if you have more than one relationship type and more interesting rules. But I think this example shows that graph reasoning can be useful.

This is the current state of the the clj-reasoner system I’ve written. You can find this example in that code.

I’d like to extend it in two simple ways: one, distinguish between original edges and inferred edges; and two, allow rules with validation expressions.

For instance, in the above graph, the system inferred that everybody was their own friend. But this does not jive with our idea of what a friend is. Friendship is not symmetric. The problem results from an unexpected combination of the rules. It would be better to write the transitivity rule like this:

hasFriend(x, y) \wedge hasFriend(y, z) \wedge x\neq z \rightarrow hasFriend(x, z)

So you can’t be your own friend. I’ll add that soon.

Link Grammar and Graph Reasoner Projects

In my last post, I said I’d talk about what I’ve been working on.

I started working on cleaning up my version of OMeta, but I found fnparse and that looks way better than my parser. It’s different two important ways: OMeta was a DSL for writing parsers, not a monadic parser; and OMeta could not combine parsers. I think in the end, having parsers defined in the host language is important. I needed to define a way to distinguish variables from keywords and other parts of the language. Not fun, especially when it’s already in Clojure.

So go use fnparse. I’m not developing OMeta anymore.

Then I started cleaning up my Link Grammar parser. It’s not exactly the same as theirs, and it’s still immature. I worked on it all day yesterday and today, trying out different strategies for structuring the code.

I was having trouble with stack overflow errors on larger sentences. After trying all of these things, I must have hit on something, because the problem just disappeared. I don’t know why.

Here are some of the ways I tried:

  • Continuation passing with trampolines to make it lazy. I thought this was necessary, but it turned out not to be, surprisingly.
  • Adding lazy-seq before everything to make it lazy. This worked faster.
  • Using sets instead of lists to build up the solutions. I was surprised to learn that this was the fastest.
  • Memoizing two key functions that get called repeatedly. Fastest, but uses lots of memory.
  • Trying to avoid making a globally memoized function (to save memory) by caching some things locally in a function. Not quite as fast.

I’ve got all of those versions pushed to the clj-link-grammar repository.

I also looked around my ~/clojure directory and remembered I had done a cool little graph reasoner in Clojure. It’s really simple. It is not nearly OWL complete, but it does most of the interesting stuff. It doesn’t do math or anything. It just matches strings. Given an appropriate graph, it can answer questions like “Who has a girlfriend who likes apples?” And it’s less than 100 lines. Please check it out at the clj-reasoner repository.

Looking forward to posting again soon

I’ve been working on some cool projects in Clojure, though I haven’t posted much about them.

I hope to talk about them soon. Thanks for being patient in my absence!

I’m also really excited by the activity in the Clojure world. It was good and now it’s becoming great!

Programming Language Religions

We have all experienced it: you make an innocent statement about a programming language online, and suddenly you are mired in a flamewar. You are labeled a fanboy and flamebait.

Why is choice of programming language such a touchy issue? It’s like once you enter into certain areas of the topic of programming languages*, you are instantly transported to a land of Black and White, good and bad, binary logic, of bitter conflict and rivalries between factions.

You can explain why people are so fighty with cognitive dissonance. Learning the intricacies of a programming language takes time and effort. It requires a lot of trial and error, along with reframing your fundamental understanding of how to approach a problem. In order to justify such an struggle, programmers raise the level of importance of the particular features of that programming language, particularly the foundational ones. So any statement that seems to abase the language is seen as an insult to your own effort and thought process. Programming language is a personal issue.

If you compound this with the impersonal and anonymous magnifier of the internet, where each past statement is indelibly marked alongside other comments, and people tending to post before they read all of them, and those incendiary one-line comments tend to be read more, and all of the other incitements of the expanding spiral of discussion threads, you get something like you see online all the time.

Unfortunately, there is nothing to do about it. It appears that, like religious or political disputes, they are here to stay. It’s the price you pay for being online.


Notes

*Comparative analysis is one. Typing style (static versus dynamic) is another.

Investigations into Terracotta

Clojure + Terracotta = Yeah, Baby!

I just stumbled across this effort to get Clojure running on Terracotta, a JVM clustering infrastructure. Terracotta gives your distributed JVM’s a shared heap, so the possibilities of distributing computation are immense. This kind of thing really gets me going. All I need now is a room full of server racks.

Writing map in Haskell

Recently, I posted an article about why I didn’t like Haskell’s type checker (or really, any of the other type checkers I know of) in theory. I sensed that it got a lot of flack. It was an idea I had been pinging around in my head for a while. It was not meant as a criticism against Haskell, but a statement of my preference.

Someone suggested to me that I post specific instances of things I would like to be able to say in Haskell, but cannot. It appealed to me. I don’t know why. Perhaps it was his patience, which seems to be a rare commodity on the internet. Instead of insulting me for my mistaken views, he actually tried to explain my errors to me. Perhaps, if I gave the examples he requested, I would be shown that I really could do what I wanted. Maybe he wouldn’t make me feel like a douche for not reading a book on Haskell just to write one blog post about it. This post is for him.

I will give a couple of examples of things in Haskell that violate my own personal taste. They are both examples of failures of the type system to encompass my intent.

So, here is the first thing that I remember trying to do in Haskell, before realizing it was impossible. It is something that is explicitly disallowed (having values of different types in the same list). My understanding of what Haskell meant by type was faulty. I did not get the difference between a type and a type class.

Download listofnumbers.hs

--List Polymorphism
--Note that I have been told that this is possible in some
--controversial extension
sum [1.5, 2, 3, 5.9]

I thought that the type of the list would be [Num] , not Num a => [a] . I was wrong, but I don’t think I should have been. Addition is blind to what type of number it is, and summing should be, too.

There are some standard workarounds for the previous example. The next example shows perhaps a more practical reason for wanting lists of different types. Unfortunately, I can’t even express the broken Haskell code. I’ve tried. In Haskell, how do you express a function of n arguments, where n is unknown? So I’ll do it in Clojure. I apologize if it makes the conceptual translation difficult.

Download mymap.clj

(defn mymap [fn & lsts]
(let [ls (map first lsts)]
(if (some nil? ls)
nil
(lazy-seq
(cons (apply fn ls) (apply mymap fn (map rest lsts)))))))

It’s a function that takes a function fn and n lists of arguments. fn takes n arguments. A new list is created whose members are the application of fn to the corresponding elements of the lists of arguments.*

The best way I found in Haskell was to define several different map functions. There’s map1, for functions of one argument, map2 for functions of 2 arguments, etc. Each carries basically the same functionality. If Haskell’s type system is so advanced, why can it not generalize the pattern? If it is not that advanced, why would I want to use it?

Maybe you can do this in Haskell. I don’t know. I tried a few approaches and decided it was fundamentally impossible. I don’t know Haskell very well, so please correct me if I’m wrong.

But in the end, it’s just an example. Can Haskell represent a generalized function type? Maybe it can. But I can always find something that Haskell’s type checker (or any type system) can’t deal with. Without a trapdoor, I will always hit a wall of what I can express.

Addendum

I’ve gotten two very thoughtful comments from my readers. And, well, it turns out you can do both of the things I described above. My mistake.

I also got a pretty good explanation of why you don’t want to do it in Haskell. Currying is so common, you need some way of saying how many arguments to consume. So variable arguments are generally avoided.

I am starting to reshape my ideas about Haskell. I have always considered it at the forefront of functional programming. I just never wanted to be at the forefront of functional programming.

Lisps still have something over pure functional languages! You probably know what it is. I daresay that linguistic dynamism, that is, being able to compile new code from arbitrary locations is pretty darn cool. But the way things are going, you might be able to do that in Haskell, too.


Notes

*There is absolutely no static type safety here.

Nice article on small software

The Virtues of Small Software

Nowadays we have thousands of times the processing power, memory and storage yet, from the user’s perspective, software for the desktop, web and mobile seems to run slower than it should, or used to.

While it does start a bit backwards-looking*, this article wraps it up with a great positive, futurist spin: mobile devices are exploding in Africa and will revive a focus on space and time requirements.

I think smaller memory footprint and faster speed are truly needed right now, though I think looking back to assembly is the wrong answer. Temporarily, assembly will be very useful in the developing world until a better solution is found. Speed and power consumption are critical concerns. But we know so much more about computer science than we used to. We should look for solutions that enable aggressive computational abstraction without the bloat. And we might find (as some already have) that computers can write faster, smaller code than any person ever could.


Notes

*Remember the good-old-days when we had 6k of memory? Yeah, that was great.