Lisp Quote of the Day

Lisp is used by weirdos who do weirdo science.
– Richard Gabriel
Image above sourced from Wikimedia Commons.
Popularity: 5% [?]
Filed under Lisp Fun | Comment (1)Human-Order Sorting
Have you ever wanted to more intelligently sort strings that contain digits?
My motivation
Coding Horror had an article about what it calls “Natural” sort order. The article talks about sorting strings that contain digits. He calls it “natural” sort order, but I think that’s misleading (as I explain later). I think “human” ordering is the best way to describe it. In ASCII ordering, digits are just ordinary characters like “A” and “%”. They don’t mean anything special. But they are special to humans. When we see digits, we think numbers. And numbers have their own ordering.
When sorting character strings, you start at the beginning and keep comparing until the end.
You can stop comparing at the caret.
Redundant Rebound ^ b < d means Rebound < Redundant
You don’t care about the rest of the string, so length doesn’t always matter. If you did the same thing for integers, it would look like this.
989898832 9899 ^ 8 < 9 means 989898832 < 9899
That’s wrong. You can compare integers like strings, but they need special treatment. Here’s one way:
989898832 000009899 ^ 0 < 9 means 9899 < 989898832
But an easier way is to parse them as integers and compare them directly.
Background
The Python code you can find in the Coding Horror article is simple and succinct. Being one to favor Lisp over Python, I wanted to see what I could come up with in Common Lisp and what I would find on the way there. I will use my PCOND library. It will come in handy for stripping apart strings that contain digits and non-digits.
Introduction
The code you’ll read here is written to be read sequentially. I find that it makes it easier to follow the thread of development. Normally, however, you might want you tests separate from your code. Imagine, then, that this is all being typed at the REPL as you read it.
Code demonstration
Let’s start with some example test data:
CL-USER> (defvar list-of-strings (list “abc1″ “abc10″ “abc11″ “abc2″ “43bf” “hf34fd” “hf340fd” “ABC9″)) LIST-OF-STRINGS
- We have four strings that will need to be sorted according to numbers mixed with letters.
- We have a string that starts with a number.
- We have two strings with a number between letters.
- We have an uppercase string.
Normally, to sort strings, you’d do this (note: SORT is destructive, so we copy the list):
CL-USER> (sort (copy-list list-of-strings) #’string<) (”43bf” “ABC9″ “abc1″ “abc10″ “abc11″ “abc2″ “hf340fd” “hf34fd”)
This is obviously not the result we want. Let’s analyse it a bit.
The SORT function takes two required arguments: the sequence to sort, and an ordering predicate.
STRING< sorts strings by comparing characters in the string using CHAR<. This sort is case sensitive.
We can make it case insensitive by using STRING-LESSP:
CL-USER> (sort (copy-list list-of-strings) #’string-lessp) (”43bf” “abc1″ “abc10″ “abc11″ “abc2″ “ABC9″ “hf340fd” “hf34fd”)
This might be closer to what we want, depending on if we want our filenames sorted in a case-sensitive manner. But we still have not solved the number sorting issue.
Let’s take a closer look at SORT:
SORT takes a keyword parameter called :KEY that is applied to each element of the list to be sorted. The comparison predicate then uses the values returned by KEY to sort on instead of the values in the list.
So we need to find (or write) a combination of PRED and KEY that gives us the results we want.
Here is a LISP-UNIT test that defines exactly what we want:
CL-USER> (lisp-unit:define-test done-test
(lisp-unit:assert-equal
'(“43bf” “ABC9″ “abc1″ “abc2″ “abc10″
“abc11″ “hf34fd” “hf340fd”)
(sort (copy-list list-of-strings) #’alpha-num-list<
:key #’split-alpha-num)))
DONE-TEST
When DONE-TEST passes, we are done!
Our sorter needs to treat letter sequences differently from digit sequences. So let’s define a function that breaks a string up into a list of strings (non-digit sequences) and integers (parsed from digit sequences).
CL-USER> (lisp-unit:define-test split-alpha-num-test
(lisp-unit:assert-equal '("a" 1 “b” 2)
(split-alpha-num “a1b2″))
(lisp-unit:assert-equal ‘(”ab” 12 “de” 23)
(split-alpha-num “ab12de23″))
(lisp-unit:assert-equal ‘(1 “asa” 333 “ere.txt”)
(split-alpha-num “1asa333ere.txt”)))
SPLIT-ALPHA-NUM-TEST
Let’s write a recursive function, with three cases:
- string is empty => nil
- string starts with a digit sequence (1 or more) => return the integer represented by the sequence and the application of the function to the rest of the string
- string starts with a non-digit sequence (1 or more) => return the sequence and the application of the function to the rest of the string
CL-USER> (defun split-alpha-num (string) (pcond:pcond ((equal “” string) nil) ((:re “^(\\d+)(.*)” string ((#’parse-integer num) rest)) (cons num (split-alpha-num rest))) ((:re “^(\\D+)(.*)” string (alpha rest)) (cons alpha (split-alpha-num rest))))) SPLIT-ALPHA-NUM
Now our test:
CL-USER> (lisp-unit:run-tests split-alpha-num-test)
SPLIT-ALPHA-NUM-TEST: 3 assertions passed, 0 failed.
; No value
Great! It works.
We still need our function ALPHA-NUM-LIST<
Since it has two parameters, and each can be either nil or non-nil, we have four possibilities. The last possibility has sub-possibilities.
Output of
(alpha-num-list< a b)

This table represents the function we’d like to define.
Let’s write a test and develop each one separately.
CL-USER> (lisp-unit:define-test alpha-num-list<-test00 (lisp-unit:assert-false (alpha-num-list< nil nil))) ALPHA-NUM-LIST<-TEST00
CL-USER> (defun alpha-num-list< (a b) nil) ALPHA-NUM-LIST<
CL-USER> (lisp-unit:run-tests alpha-num-list<-test00)
ALPHA-NUM-LIST<-TEST00: 1 assertions passed, 0 failed.
; No value
One test down. Now another:
CL-USER> (lisp-unit:define-test alpha-num-list<-test01 (lisp-unit:assert-true (alpha-num-list< nil '(1))))) ALPHA-NUM-LIST<-TEST01
We’ll see if the test happens to pass.
CL-USER> (lisp-unit:run-tests
alpha-num-list<-test00
alpha-num-list<-test01)
ALPHA-NUM-LIST<-TEST00: 1 assertions passed, 0 failed.
ALPHA-NUM-LIST<-TEST01: (ALPHA-NUM-LIST< NIL (LIST ‘(1))) failed:
Expected T but saw NIL
ALPHA-NUM-LIST<-TEST01: 0 assertions passed, 1 failed.
TOTAL: 1 assertions passed, 1 failed, 0 execution errors.
; No value
It doesn’t. So we need to redefine our function.
CL-USER> (defun alpha-num-list< (a b) (if (null b) nil t)) STYLE-WARNING: redefining ALPHA-NUM-LIST< in DEFUN ALPHA-NUM-LIST<
And now to test it:
CL-USER> (lisp-unit:run-tests
alpha-num-list<-test00
alpha-num-list<-test01)
ALPHA-NUM-LIST<-TEST00: 1 assertions passed, 0 failed.
ALPHA-NUM-LIST<-TEST01: 1 assertions passed, 0 failed.
TOTAL: 2 assertions passed, 0 failed, 0 execution errors.
; No value
Two tests down . . .
CL-USER> (lisp-unit:define-test alpha-num-list<-test10 (lisp-unit:assert-false (alpha-num-list< '(1) nil))) ALPHA-NUM-LIST<-TEST10
CL-USER> (lisp-unit:run-tests
alpha-num-list<-test00
alpha-num-list<-test01
alpha-num-list<-test10)
ALPHA-NUM-LIST<-TEST00: 1 assertions passed, 0 failed.
ALPHA-NUM-LIST<-TEST01: 1 assertions passed, 0 failed.
ALPHA-NUM-LIST<-TEST10: 1 assertions passed, 0 failed.
TOTAL: 3 assertions passed, 0 failed, 0 execution errors.
; No value
The tests pass, so no need to change anything.
The last one is more complicated.
We’ve defined the output in the lower right corner of the table in terms of another function. That function is the string/number comparison predicate. I’ve called it ALPHA-NUM<
Here’s the specification of that function
(alpha-num< a b)

Basically, it’s all about the types. Numbers come before strings. We’re ordering case-sensitively.
I’ll use methods that specialize on the given types. They will be one line functions, so I won’t bother to write tests.
CL-USER> (defmethod alpha-num< ((a string) (b string)) (string< a b)) #<STANDARD-METHOD ALPHA-NUM< (STRING STRING) {B1218D9}>
CL-USER> (defmethod alpha-num< ((a number) (b string)) t) #<STANDARD-METHOD ALPHA-NUM< (NUMBER STRING) {B1740F1}>
CL-USER> (defmethod alpha-num< ((a string) (b number)) nil) #<STANDARD-METHOD ALPHA-NUM< (STRING NUMBER) {B1B4489}>
CL-USER> (defmethod alpha-num< ((a number) (b number)) (< a b)) #<STANDARD-METHOD ALPHA-NUM< (NUMBER NUMBER) {B202741}>
Notice how we rely on CLOS methods to perform the logic of our function. Very nice.
Let’s get back to our other function.
We can test the two constant squares on the bottom right very easily. Let’s hit those first.
CL-USER> (lisp-unit:define-test alpha-num-list<-testa<b
(lisp-unit:assert-true
(alpha-num-list< '(1) '("a"))))
ALPHA-NUM-LIST<-TESTA<B
CL-USER> (lisp-unit:run-tests
alpha-num-list<-test00
alpha-num-list<-test01
alpha-num-list<-test10
alpha-num-list<-testa<b)
ALPHA-NUM-LIST<-TEST00: 1 assertions passed, 0 failed.
ALPHA-NUM-LIST<-TEST01: 1 assertions passed, 0 failed.
ALPHA-NUM-LIST<-TEST10: 1 assertions passed, 0 failed.
ALPHA-NUM-LIST<-TESTA<B: 1 assertions passed, 0 failed.
TOTAL: 4 assertions passed, 0 failed, 0 execution errors.
; No value
CL-USER> (lisp-unit:define-test alpha-num-list<-testb<a
(lisp-unit:assert-false
(alpha-num-list< '("a") '(1))))
ALPHA-NUM-LIST<-TESTB<A
CL-USER> (lisp-unit:run-tests
alpha-num-list<-test00
alpha-num-list<-test01
alpha-num-list<-test10
alpha-num-list<-testa<b
alpha-num-list<-testb<a)
ALPHA-NUM-LIST<-TEST00: 1 assertions passed, 0 failed.
ALPHA-NUM-LIST<-TEST01: 1 assertions passed, 0 failed.
ALPHA-NUM-LIST<-TEST10: 1 assertions passed, 0 failed.
ALPHA-NUM-LIST<-TESTA<B: 1 assertions passed, 0 failed.
ALPHA-NUM-LIST<-TESTB<A: (ALPHA-NUM-LIST< '("a") '(1)) failed:
Expected NIL but saw T
ALPHA-NUM-LIST<-TESTB<A: 0 assertions passed, 1 failed.
TOTAL: 4 assertions passed, 1 failed, 0 execution errors.
; No value
Ah! A failure. Now we can write some code!
CL-USER> (defun alpha-num-list< (a b) (if (null b) nil (if (null a) t (if (alpha-num< (first a) (first b)) t (if (alpha-num< (first b) (first a)) nil))))) STYLE-WARNING: redefining ALPHA-NUM-LIST< in DEFUN ALPHA-NUM-LIST<
CL-USER> (lisp-unit:run-tests
alpha-num-list<-test00
alpha-num-list<-test01
alpha-num-list<-test10
alpha-num-list<-testa<b
alpha-num-list<-testb<a)
ALPHA-NUM-LIST<-TEST00: 1 assertions passed, 0 failed.
ALPHA-NUM-LIST<-TEST01: 1 assertions passed, 0 failed.
ALPHA-NUM-LIST<-TEST10: 1 assertions passed, 0 failed.
ALPHA-NUM-LIST<-TESTA<B: 1 assertions passed, 0 failed.
ALPHA-NUM-LIST<-TESTB<A: 1 assertions passed, 0 failed.
TOTAL: 5 assertions passed, 0 failed, 0 execution errors.
; No value
No failures! But are we done? Let’s ask the test.
CL-USER> (lisp-unit:run-tests done-test) DONE-TEST: (SORT (COPY-LIST LIST-OF-STRINGS) #’ALPHA-NUM-LIST< :KEY #’SPLIT-ALPHA-NUM) failed: Expected (“43bf” “ABC9″ “abc1″ “abc2″ “abc10″ “abc11″ “hf34fd” “hf340fd”) but saw (“43bf” “ABC9″ “abc1″ “abc10″ “abc11″ “abc2″ “hf34fd” “hf340fd”) DONE-TEST: 0 assertions passed, 1 failed. ; No value
It failed. Why? We never recurse.
CL-USER> (defun alpha-num-list< (a b) (if (null b) nil (if (null a) t (if (alpha-num< (first a) (first b)) t (if (alpha-num< (first b) (first a)) nil (alpha-num-list< (rest a) (rest b))))))) STYLE-WARNING: redefining ALPHA-NUM-LIST< in DEFUN ALPHA-NUM-LIST<
CL-USER> (lisp-unit:run-tests done-test)
DONE-TEST: 1 assertions passed, 0 failed.
; No value
We’ve got complete functionality. But the code for ALPHA-NUM-LIST< is ugly. Let’s change the nested IF’s to a COND.
CL-USER> (defun alpha-num-list< (a b) (cond ((null b) nil) ((null a) t) ((alpha-num< (first a) (first b)) t) ((alpha-num< (first b) (first a)) nil) (t (alpha-num-list< (rest a) (rest b))))) STYLE-WARNING: redefining ALPHA-NUM-LIST< in DEFUN ALPHA-NUM-LIST<
That’s a lot better.
CL-USER> (lisp-unit:run-tests
alpha-num-list<-test00
alpha-num-list<-test01
alpha-num-list<-test10
alpha-num-list<-testa<b
alpha-num-list<-testb<a
done-test)
ALPHA-NUM-LIST<-TEST00: 1 assertions passed, 0 failed.
ALPHA-NUM-LIST<-TEST01: 1 assertions passed, 0 failed.
ALPHA-NUM-LIST<-TEST10: 1 assertions passed, 0 failed.
ALPHA-NUM-LIST<-TESTA<B: 1 assertions passed, 0 failed.
ALPHA-NUM-LIST<-TESTB<A: 1 assertions passed, 0 failed.
DONE-TEST: 1 assertions passed, 0 failed.
TOTAL: 6 assertions passed, 0 failed, 0 execution errors.
; No value
So, now that we’ve got it working, let me propose this change:
CL-USER> (lisp-unit:define-test done-test
(lisp-unit:assert-equal
'(“43bf” “ABC9″ “abc1″ “abc2″ “abc10″
“abc11″ “hf34fd” “hf340fd”)
(sort (copy-list list-of-strings)
(make-list-comp #’alpha-num<)
:key #’split-alpha-num)))
DONE-TEST
We’ll modularize it so that we can use any comparison function.
CL-USER> (defun make-list-comp (fun) (labels ((f (a b) (cond ((null b) nil) ((null a) t) ((funcall fun (first a) (first b)) t) ((funcall fun (first b) (first a)) nil) (t (f (rest a) (rest b)))))) #’f))
CL-USER> (lisp-unit:run-tests done-test)
DONE-TEST: 1 assertions passed, 0 failed.
; No value
Let’s do the same for our function ALPHA-NUM<. We want to generate a new function, passing in the appropriate string comparison and number comparison function. We never wrote a test for it, so let’s do that now.
CL-USER> (lisp-unit:define-test alpha-num<-test (lisp-unit:assert-true (alpha-num< 1 “a”)) (lisp-unit:assert-false (alpha-num< “a” 1)) (lisp-unit:assert-true (alpha-num< 1 2)) (lisp-unit:assert-false (alpha-num< 2 1)) (lisp-unit:assert-true (alpha-num< “a” “b”)) (lisp-unit:assert-false (alpha-num< “b” “a”))) ALPHA-NUM<-TEST
CL-USER> (lisp-unit:run-tests alpha-num<-test)
ALPHA-NUM<-TEST: 6 assertions passed, 0 failed.
; No value
Now we need to combine it back into a function instead of four methods.
CL-USER> (defun alpha-num< (a b) (cond ((and (stringp a) (stringp b)) (string< a b)) ((and (numberp a) (stringp b)) t) ((and (stringp a) (numberp b)) nil) ((and (numberp a) (numberp b)) (< a b)))) STYLE-WARNING: redefining ALPHA-NUM< in DEFUN ALPHA-NUM< CL-USER> (lisp-unit:run-tests alpha-num<-test) ALPHA-NUM<-TEST: 6 assertions passed, 0 failed. ; No value
Now we can change it.
CL-USER> (lisp-unit:define-test alpha-num<-test (let ((fun (make-alpha-num-comp #’string< #’< t))) (lisp-unit:assert-true (funcall fun 1 “a”)) (lisp-unit:assert-false (funcall fun “a” 1)) (lisp-unit:assert-true (funcall fun 1 2)) (lisp-unit:assert-false (funcall fun 2 1)) (lisp-unit:assert-true (funcall fun “a” “b”)) (lisp-unit:assert-false (funcall fun “b” “a”)))) ALPHA-NUM<-TEST
Our test will fail now until we define the function.
CL-USER> (defun make-alpha-num-comp (string-comp num-comp nums-before-strings-p) #’(lambda (a b) (cond ((and (stringp a) (stringp b)) (funcall string-comp a b)) ((and (numberp a) (stringp b)) nums-before-strings-p) ((and (stringp a) (numberp b)) (not nums-before-strings-p)) ((and (numberp a) (numberp b)) (funcall num-comp a b))))) MAKE-ALPHA-NUM-COMP
CL-USER> (lisp-unit:run-tests alpha-num<-test)
ALPHA-NUM<-TEST: 6 assertions passed, 0 failed.
; No value
Ok, so what do we have? Not only do we have our predicates for doing the human sort, we also have a new language for combining string and number comparisons to create any combination we want.
Case-sensitive, numbers first, all ascending order.
CL-USER> (sort (copy-list list-of-strings) (make-list-comp (make-alpha-num-comp #’string< #’< t)) :key #’split-alpha-num) (”43bf” “ABC9″ “abc1″ “abc2″ “abc10″ “abc11″ “hf34fd” “hf340fd”)
Case-insensitive, numbers last, strings ascending, numbers descending
CL-USER> (sort (copy-list list-of-strings) (make-list-comp (make-alpha-num-comp #’string-lessp #’> nil)) :key #’split-alpha-num) (”abc11″ “abc10″ “ABC9″ “abc2″ “abc1″ “hf340fd” “hf34fd” “43bf”)
CL-USER> (lisp-unit:define-test done-test
(lisp-unit:assert-equal
'(“43bf” “ABC9″ “abc1″ “abc2″ “abc10″
“abc11″ “hf34fd” “hf340fd”)
(sort (copy-list list-of-strings)
(make-list-comp
(make-alpha-num-comp #’string< #’< t))
:key #’split-alpha-num)))
DONE-TEST
CL-USER> (lisp-unit:run-tests done-test)
DONE-TEST: 1 assertions passed, 0 failed.
; No value
CL-USER> (defun human-sort (list) (sort list (make-list-comp (make-alpha-num-comp #’string-lessp #’< t)) :key #’split-alpha-num)) HUMAN-SORT
Done.
Conclusion
Although the code here is not as short as the Python version (Python: 4 non-comment lines, CL: 37), it is clear and readable. This is a case where the Python standard libraries and semantics result in a smaller function definition — Python has a default ordering function for lists. I searched around the docs for one in Common Lisp. I didn’t find one, so I developed my own routines. And with just a little more effort, I have a great little library for making an infinite number of orderings for lists.
That’s what’s great about Lisp — the language might not have exactly what you need built in, but you can build it easily and while you’re doing it, you get all sorts of cool functionality.
One thing that I find newer languages (eg. Perl, Python) having is “magic” comparisons.
For instance, in Perl, from what I remember of the language, you can do this:
1 == "1"
That returns a true value. This is great for scraping text documents because you don’t have to convert strings to integers (and I believe the operator returns false when you can’t parse it as an integer). How’s that for a default? It seems to make sense in a lot of contexts. It’s kind of a poor man’s polymorphism. Perl’s defaults are often “human” in the same way the sort order we just developed is “human”.
However, defaults aren’t the answer. You can’t call an ordering “natural” because that implies that it is THE correct ordering, and that is false. Kent Pitman has a nice paper about expecting magic from your predicates. Its conclusion: that The Right Thing is arbitrary and application specific. Just because Python has a default ordering for lists doesn’t mean you’ll always want that ordering. It could even seem like the default ordering is wrong in a given context. Therefore, what’s important is the ability to quickly and powerfully create new predicates that fit your context. And Lisp is exceptional at that.
Well, that’s the end! As you mentally digest this post, ask yourself this question:
What does your experience with Common Lisp and other languages tell you about “magic” comparisons?
Then come back and share your thoughts!
Popularity: 3% [?]
Filed under Lisp Fun | Comments (11)