It seems to me that if you wanted to really put an end to the old "C is faster than X" (where X is your favorite High Level Language) snickers and arguments you'd start taking advantage of the vast amounts of information contained in a High Level Language program as opposed to a Low Level Language program.
What do I mean? Well, because C programs are so low-level---they're all about byte manipulation, basically---the compiler doesn't really know anything more than byte operations. There is no high-level semantics. The constraints and meanings embedded in the semantics of a high-level language allow for lots and lots of optimizations.
Let's look at some successes.
By far, I think the one that comes to mind the most easily is Haskell. Part of the semantics defined by the language is that there are no visible side-effects (at least in a large portion of the code). This means that the compiler can optimize out lots of unnecessary computation. How does it do it? No side effects means that you can do lazy evaluation. Basically, that means don't calculate anything you don't need to, and if you do calculate it, you might as well remember it so you won't have to calculate it ever again.
So, automatically, all of this code you wrote that would, using a naive version, create billions of conses now only creates a few hundred. Automatic optimization. You can't do that in C because there's no way to know what the program is trying to do---because anything is allowed. And all of that information is used at run-time. Haskell does some other cool optimizations, too. I just don't know enough about them.
Here's another example: tail-call removal in Scheme. Most Common Lisp implementations do it, too. But in Scheme it's part of the standard. What does it mean? When you write a recursive function in a certain format, it is automatically converted to an iterative function that generates the same value.
Steve Yegge has a whole talk on how run-time information trumps compile-time information. He says there are techniques now that can let a VM adapt to dynamic typing information so that it approaches the speed of static typing. But only if you have some kind of runtime environment.
These are all great. But they're not enough. So many optimizations are done at a low level. But they only gain so much. Here's a question: would you rather optimize a function so that it takes 10 instructions instead of 15 OR optimize the whole algorithm to call the function half the number of times? High-level, algorithmic optimizations trump low-level optimizations.
Here's an optimization I wish my Lisp implementation did (if it does do this, let me know. I'll jump for joy): convert (= 10 (length list)) to (length= 10 list).
Huh?
This is what I mean: it should rearrange the AST at compile-time to semantically equivalent programs that are more efficient.
If I write (length list), it will recurse down the list and add 1 for each cons in it. But if I only want to know if there are exactly ten items in the list, the LENGTH function is going to go to the end of the list anyway. Even if there are 10 billion items in the list. That is not smart. What most people say is that you should do it yourself, changing (= 10 (length list)) to (length= 10 list), which only recurses 10 times at most.
But what I say is that the compiler can do it. LENGTH is part of the standard language, and so is =. They should get compiled away. Further, I think there should be an entire framework that lets you define and manage your own optimizations. Since you know the semantics of your own programs, you should be able to define all sorts of great optimizations without having to mess up your clean code.
Oh, and why is tail-call removal the only kind of recursion-to-iteration techniques used? There are lots more! Read the paper called "A system which automatically improves programs". I love old papers! So full of untapped goodness.
That paper is so chock full of good stuff that I'll reserve giving it a thorough review here. However, this quote will explain nearly everything:
A programmer is able to present his algorithms to the system in a clear and abstract language. The system converts them to efficient but probably not transparent versions.
For example, here are two versions of one program which reverses lists.
[note: I have converted the pseudo-code in the paper to Common Lisp]
(defun reverse (x) (if (null x) nil (append (reverse (rest x)) (list (first x)))))
(defun reverse (x) (let ((result nil) (temp nil)) (loop while (not (null x)) do (setf temp (rest x)) (setf (rest x) result) (setf result x) (setf x temp)) result))
One is clear and abstract, the other more tortuous but efficient. Given the first as a definition, a competent programmer should be able to produce the second. Our system can do this for him.
The system is built around the concept of abstract programming, and we hope to encourage a user to formulate his algorithms in abstract terms appropriate to his problem domain and leave the system the task of implementing them efficiently.
So the code it transforms it into in the quote above is destructive: it modifies the list. But in the paper it shows that it's optional. The Lisp implementation they wrote makes that an interactive step. But the conversion of the recursion to iteration is just as safe as tail-call elimination---and clearly defined in the paper. Why aren't we doing this?
I would love to see a system where you can define your own optimizations as transformations of the AST. Transformations that are applied automatically or optionally turned on or off if you like. Simple pattern matching would do. The paper I cited is not a simple pattern matching operation. It's more complex---but definitely doable.
While the C coders are busy digging through their optimized spaghetti code, the Common Lisp compiler is intelligently rewriting your highly abstract program for you. I love it!
Comments
Unnecessarily harsh
While I react negatively to the oft-required use of C/C++, I can't help but notice that your claims against C are largely in error.
Every operator in C can be thought of as a lambda expression of finite arity. Indeed, the output of a C compiler's parser is an abstract syntax tree -- expressed usually as a list of lists, just like Lisp code. Well-factored C code is largely identical in semantics to similarly factored Lisp code. OK, the syntax sucks, and you have to type more, but the same information is present in both systems.
Moreover, I do have some experience with commercial Forth compilers which produces very optimized code, comparable to C, even though C is significantly "higher level" than Forth is, at least out of the box. Don't knock data-flow analysis, continuation-passing style, and static single-assignment intermediate representations. With these, you can also perform dead-code elimination, just like Haskell does. In fact, with CPS, you can even perform return optimizations (e.g., if you notice that a function return value is used immediately by a switch/case-like construct, then you can just compile the function so that it jumps directly into the relevant cases -- this is an optimization which Haskell performs, but is in no way limited to just Haskell, nor is it dependent on Haskell's purity).
I believe the GHC Haskell compiler provides a framework that lets you define and manage your own optimizations. I've never used it myself, so I don't know the full details, but I have seen other programs which defined "hints" to the compiler that, basically, allows you to say, "Whenever you see X, it is equivalent to Y". Then there is template Haskell, which if I recall correctly, extends this concept even further, including support for macros (which is a crude form of what you're looking for anyway).
But, I shy away from all these automatic systems, because there is no one-to-one correspondence with what I write, and what the computer executes. I've been burned by this more times than I can remember. No, I prefer instead to write simple code, with a simple compiler, producing simple binaries. I don't care how long they run, at least at first. If, and only if, performance becomes an issue, then I'll look at profiling and optimizations. (You do profile first, right?)
Finally, what you're describing cannot solve the largest problem of all -- the recognition that you're writing a simple linked list implementation when you meant a B-tree. But, then again, what kind of B-tree? Would a skip-list be sufficient? You want real 2-3-4 trees, or are you happy with an AVL-tree implementation?
I vote no to automatic anything. The computer is stupid, able to recognize, and optimize based on, only the most basic of arithmetic truths. There are a number of them, for sure; but, what you're calling for above makes no economic sense. Why write one implementation of something, when you write rules on how to replace said something with something else. You might as well just delete and re-write the relevant code into "something else" from the start. You'll save time, money, energy, frustration, nerves, and electricity.
Enlightening
> Every operator in C can be thought of as a lambda expression of finite arity. Indeed, the output of a C compiler's parser is an abstract syntax tree -- expressed usually as a list of lists, just like Lisp code. Well-factored C code is largely identical in semantics to similarly factored Lisp code.
The problem with the semantics of C is that anything is allowed. If you write good, clean C, I am sure you can perform a lot of these operations. But C allows you to read from and write to any memory location in the program at any point. Some correct programs take advantage of that. Some programs are incorrect because of that. But how can the compiler tell the difference?
> But, I shy away from all these automatic systems, because there is no one-to-one correspondence with what I write, and what the computer executes. I've been burned by this more times than I can remember.
Have you been burned by tail-call elimination?
> No, I prefer instead to write simple code, with a simple compiler, producing simple binaries.
This is what I am suggesting. Write the simplest, abstract representation of a function. The compiler will produce a simple binary. The simplest abstract representation very often is a recursive definition. The simplest binary is usually iterative.
> I don't care how long they run, at least at first.
I don't care, either. But when it's time to optimize, I don't want to complicate my code any more than I have to. Just like I want to say (memoize 'fib), I want to be able to say (iteratize 'fib). But I don't want to have to drastically change my code when it can be left the way it is and simply advised how to make it faster.
> Finally, what you're describing cannot solve the largest problem of all -- the recognition that you're writing a simple linked list implementation when you meant a B-tree. But, then again, what kind of B-tree? Would a skip-list be sufficient? You want real 2-3-4 trees, or are you happy with an AVL-tree implementation?
It's true. No computer will be able to recognize that your algorithm is inefficient for the problem.
But the point of a high-level language is to allow you to quickly change from one implementation to another. Painlessly. That means with as few changes as possible.
For instance, many high-level languages support polymorphism. This allows you to trade out a Vector for a Linked List if you choose with minimal change in the code. If it makes it difficult to change, you'll be less likely to do it. And I think the same should be true of recursive/iterative functions.
> I vote no to automatic anything.
Perhaps you would want to control when it happens? Certainly it could be built as a compiler advice system.
> The computer is stupid, able to recognize, and optimize based on, only the most basic of arithmetic truths. There are a number of them, for sure; but, what you're calling for above makes no economic sense. Why write one implementation of something, when you write rules on how to replace said something with something else. You might as well just delete and re-write the relevant code into "something else" from the start. You'll save time, money, energy, frustration, nerves, and electricity.
The whole reason for this is that there is the difference between what I think is simple to read, write, and maintain (ie, simple recursive functions) and what the machine thinks is easy to run (ie, iterative functions). Why implement one version and write rules to translate it? Because the first version is usually simpler. The rules can be used repeatedly. So I wind up with code that's smaller. Do Not Repeat Yourself.
When I write C, the simpler functions are usually faster. There's a much more direct correlation between what I write and what the compiler writes. In C, fewer lines of code, in general, really does seem faster. Basically, I would say that fewer lines of code means you're doing less unnecessary computation.
The same happens with Forth. Though I've never written in Forth, my understanding is that the low-level control you have over what happens tends to push you toward simple implementations.
The reason to have automated optimization is very simple: when you are exploring your problem space, like you said, you don't care how fast it runs. But as you understand your performance bottlenecks and you need to make it faster, how much of the code do you want to have to rewrite? How willing are you to trade simple code for longer, more optimized code?
In assembly, after the first version you understand the problem a bit better. Then you throw it away and start over. This is actually recommended by some of the giants of programming. But in higher-level languages, people tend to do more refactoring instead of starting over. Write code then make it cleaner. Encapsulate complexity. Leverage the language. Abstract code into higher-level constructs.
So the code tends toward high-level, more abstract notions. In contrast, many optimizations require low-level control. So your simple, abstract representations will have to be broken.
When you optimize for low-level control, you set down a path that is hard to recover from without rewriting a lot of code. High-level languages let you encapsulate the change (through polymorphism, functional abstraction, etc) so that you don't have to rewrite everything. Of course you give up low-level control. You trade it for the ability to not have to think about it.
For instance, I would like to see a collections interface where you give it constraints and it gives you an implementation of the optimal collection. You can say "I need O(1) random read access" and it might give you a Vector. But if you say "I need O(1) push" it would give you a linked list. Abstraction abstraction abstraction.
> Have you been burned by
> Have you been burned by tail-call elimination?
Actually, yes, in rare circumstances, I have. :-)
> Perhaps you would want to control when it happens? Certainly it could be built as a compiler advice system.
Yes, absolutely. That's why I really like how CL's (declare) form works, particularly with optimization settings.
> Why implement one version and write rules to translate it? Because the first version is usually simpler. The rules can be used repeatedly. So I wind up with code that's smaller. Do Not Repeat Yourself.
To me, this is doing the exact opposite. Well-factored code should have only one instance of a concept's implementation. With good factoring, both in terms of functions and macros, you should find writing translation rules redundant.
> How willing are you to trade simple code for longer, more optimized code?
Extremely, if warranted. When rewriting code to use greater performance optimizations, the original code should remain available, as a drop-in replacement for the optimized code. E.g., you might have files like:
foo-knowngood.c
foo-optimized.c
In your makefile (or equivalent build system), selection of which foo to compile and link in ought to be a simple matter of setting an environment variable. If left unspecified, -knowngood is the default suffix used.
That way, the same set of unit tests can detect discrepencies between foo-knowngood and foo-optimized. It would additionally help if foo-optimized included comments indicating the logic behind the optimizations. Literate programming would come in handy here, since it is the tricky code that requires the greatest amount of documentation detail.
> Abstraction abstraction abstraction.
As a professional software engineer, I can also make the claim that overly abstract code is every bit as difficult to read as alien hieroglyphics. If it takes me longer to read a chunk of highly abstract code than it'd take me to just rewrite it from scratch at a more reasonable level of abstraction, I will. You can have spaghetti code with high-level languages just as easily as you can with low-level code.
Thanks for the response. Very interesting subject matter!
>> How willing are you to
>> How willing are you to trade simple code for longer, more optimized code?
>Extremely, if warranted. When rewriting code to use greater performance optimizations, the original code should remain available, as a drop-in replacement for the optimized code. E.g., you might have files like:
>foo-knowngood.c
>foo-optimized.c
Pardon me if I am naively misunderstanding you, but are you suggesting that we should maintain two separate code-bases? That seems tedious, but I'm not a professional software engineer, so maybe this is standard practice for you guys.
High Level Optimizations
I write a lot of c/c++ for work. I'm also doing some embedded firmware development for GPS as a hobby.
Most things I write I don't need to do low level optimizations. But where I cringe is when I implement somethings from a paper (most recent example fuzzy cmeans) .... the math describes it as O(n*c^2) n being samples c being classes. If you look carefully you can reduce it to O(n*c). This is more important than any low level optimization, other than I/O. I want to have to worry less if a sparse array would be better, and more about the algorithm design as a whole. I want a language where I see the forest and dig down for the trees.
I see lots of bad code in the name of being "efficient". Usually it's not even correct. I'd much rather trust the compiler than most programmers I've met.
Optimize Algorithm
I think we're all in agreement about that.
Naive, recursive Fibonacci is exponential in time and constant in space.
Memoized Fibonacci is linear in time in the worst case, but nearly constant in the case of recomputing a previous value. It's linear in space requirements.
Automatically iteratized Fibonacci is linear in time and constant in space.
The last two can be automatically created from the first. I want to be able to tell the compiler that I want it to replace my Fibonacci with one of the more efficient ones.
Advantages:
- My original implementation of Fibonacci is clear, simple, and minimal. It follows very closely the standard mathematical formulation. It remains clear.
- Only one more line of code to go from O(2^n) -> O(n). Little extra investment.
- I can switch between optimizations easily if I need to.
Disadvantages:
- Complexity of having to learn what optimizations are possible.
Note: You still have complete control over your code. You can choose not to optimize. You can always optimize it by hand.
Post new comment