Regexes in Clojure

Summary: With a few functions from the standard library, Clojure lets you do most of what you want with regular expressions with no muss.

Clojure is designed to be hosted. Instead of defining a standard Regular Expression semantics that works on all platforms, Clojure defers to the host's semantics. On the JVM, you're using Java regexes. In ClojureScript, it's Javascript regexes. That's the first thing to know.

Other than the semantics of the regexes themselves, the API is standardized across all platforms in the core library. And the syntax is convenient because you don't need to double escape your special characters.

Literal representation

Regexes can be constructed in Clojure using a literal syntax. Strings with a hash in front are interpreted as regexes.

#"regex"

On the JVM, the above line will create an instance of java.util.regex.Pattern. In ClojureScript, it will create a RegExp. Remember, the two regular expression languages are similar but different.

Matching (with groups)

There is a nice function that matches the whole string. It is called re-matches. The return is a little complex. If the whole string does not match, it returns nil, which is nice because nil is falsey.

=> (re-matches #"abc" "zzzabcxxx")
   nil

If the string does match, and there are no groups (parens) in the regex, then it returns the matched string.

=> (re-matches #"abc" "abc")
   "abc"

If it matches but there are groups, then it returns a vector. The first element in the vector is the entire match. The remaining elements are the group matches.

=> (re-matches #"abc(.*)" "abcxyz")
   ["abcxyz" "xyz"]

The three different return types can get tricky, but in general I do have groups, so it's either a vector or nil, which is easy to handle. You can even destructure it before you test it.

(let [[_ fn ln] (re-matches #"(\w+)\s(\w+)" full-name)]
  (if fn ;; successful match
    (println fn ln)
    (println "Unparsable name")))

Matching substrings

re-matches matches the whole string. But often, we want to find a match within a string. re-find returns the first match within the string. The return values are similar to re-matches.

No match returns nil

=> (re-find #"sss" "Loch Ness")
nil

Match without groups returns matched string

=> (re-find #"s+" "dress")
"ss"

Match with groups returns a vector

=> (re-find #"s+(.*)(s+)" "success")
   ["success" "ucces" "s"]

Finding all substrings that match

The last function from clojure.core I use a lot is re-seq, which returns a lazy seq of all of the matches, not just the first. The elements of the seq are whatever type re-find would have returned.

=> (re-seq #"s+" "mississippi")
   ("ss" "ss")

Replacing regex matches within a string

Well, matching strings is cool, but often you'd like to replace a substring that matches with some other string. clojure.string/replace will replace all substring matches with a new string. Let's take a look:

=> (clojure.string/replace "mississippi" #"i.." "obb")
   "mobbobbobbi"

This function is actually quite versatile. You can refer directly to the groups in the replacement string:

=> (clojure.string/replace "mississippi" #"(i)" "$1$1")
   "miissiissiippii"

You can also replace with the value of a function applied to the match:

=> (clojure.string/replace "mississippi" #"(.)i(.)"
     (fn [[_ b a]]
       (str (clojure.string/upper-case b)
            "--"
            (clojure.string/upper-case a))))
   "M--SS--SS--Ppi"

You can replace just the first occurence with clojure.string/replace-first.

Splitting a string on a regex

Let's say you want to split a string on some character pattern, like one or more whitespace. You can use clojure.string/split:

=> (clojure.string/split "This is a string    that I am splitting." #"\s+")
   ["This" "is" "a" "string" "that" "I" "am" "splitting."]

Nice!

Other functions

Those are all of the functions I use routinely. There are some more, which are useful when you need them.

re-pattern

Construct a regex from a String.

re-matcher

This one is not available in ClojureScript. On the JVM, it creates a java.util.regex.Matcher, which is used for iterating over subsequent matches. This is not so useful since re-seq exists.

If you find yourself with a Matcher, you can call re-find on it to get the next match (instead of the first). You can also call re-groups from the most recent match. Unless you need a Matcher for some Java API, just stick to re-seq.

Conclusion

Well, that's regexes as I use them. They're super useful and easy to use in Clojure once you get the hang of them.

If you're interested in learning the fundamentals of Clojure, may I suggest my own LispCast Introduction to Clojure video series. It guides you through a deep experience of the language. You'll learn REPL skills, how to set up a project, and how to develop a DSL, all in a fun, interactive way.

Learn Functional Programming using Clojure with screencasts, visual aids, and interactive exercises
Learn more

You might also like

Warty Lists in Clojure

Summary: Lists are kind of warty in Clojure. Care should be taken, especially by those coming from other Lisps.

One of the things that still trips me up in Clojure is the actual types of lists. I used to program in Common Lisp, where things are a bit easier to understand: something is either a list or an atom. Lists are built out of conses and end with nil. Everything else is an atom.

Coming from Common Lisp, one might expect this to work:

=> (listp (cons 1 nil))

In CL, it will be true. In Clojure, it is also true. (Try it out yourself!) Launch a REPL. Go to Try Clojure.

=> (list? (cons 1 nil))

What about this:

=> (list? (cons 1 (cons 1 nil)))

We're consing onto a list, should be a list, no?

In Common Lisp, yes. In Clojure?

NO!

That's not a list. Go ahead, try it at your Clojure REPL.

What gives? How is that possible? Are we living in a Kafkaesque ECMAScript World?

Well . . . probably.

What's going on? Put on your Indiana Jones fedora. We're going on an adventure deep into the heart of the Clojure JVM implementation.

First, how is list? defined?

In clojure.core:

(defn list?
  "Returns true if x implements IPersistentList"
  {:added "1.0"}
  [x] (instance? clojure.lang.IPersistentList x))

Great, list? is just an instance check. What classes implement clojure.lang.IPersistentList? According to the javadoc on the Clojure sources, there are two: PersistentQueue and PersistentList. There's also a static inner class in PersistentList called EmptyList.

Let's see those at the REPL:

=> (type ())
  clojure.lang.PersistentList$EmptyList
=> (list? ())
  true

=> (type (list 1 2 3))
  clojure.lang.PersistentList
=> (list? (list 1 2 3))
  true

=> (type clojure.lang.PersistentQueue/EMPTY)
  clojure.lang.PersistentQueue
=> (list? clojure.lang.PersistentQueue/EMPTY)
  true

These all return true when given to list?. What about (cons 1 nil)?

=> (type (cons 1 nil))
  clojure.lang.PersistentList
=> (list? (cons 1 nil))
  true

Great. Consing onto nil gives you a list. Let's cons onto a list:

=> (type (cons 0 (list 1 2 3)))
  clojure.lang.Cons
=> (list? (cons 0 (list 1 2 3)))
  false

Oh, no! Why does consing onto nil create a list, but consing onto a list create a cons? Poisonous dart averted! Back to the source code!

In clojure.core:

(def
 ^{:arglists '([x seq])
    :doc "Returns a new seq where x is the first element and seq is
    the rest."
   :added "1.0"}

 cons (fn* cons [x seq] (. clojure.lang.RT (cons x seq))))

So it calls clojure.lang.RT/cons. We can look that up:

static public ISeq cons(Object x, Object coll){
    //ISeq y = seq(coll);
    if(coll == null)
        return new PersistentList(x);
    else if(coll instanceof ISeq)
        return new Cons(x, (ISeq) coll);
    else
        return new Cons(x, seq(coll));
}

Wow! The code is clear: if the second argument is null (nil), it makes a PersistentList. Otherwise, it constructs a clojure.lang.Cons, which is not a list! We're at the root of our wart, but there's nothing we can really do about it except keep exploring.

If I have a list (according to list?) and I want to add an element to the front to make a new list, how do I do that?

Well, the answer is a little disappointing:

=> (def ls (list 1 2 3))
=> (def ls2 (conj ls 0))
=> (list? ls2)
  true

conj will maintain the type, cons will not. What happens if I conj onto a Cons?

=> (def c (cons 1 (cons 2 nil)))
=> (conj c 0)
  (0 1 2)
=> (type conj c 0)
  clojure.lang.Cons

So, there you have it. It's a bit of a wart having all of these slightly different types and predicates like list? that slice them up in odd ways. Sometimes it's like running away from a giant paper-mache ball.

In Clojure's defense, I will say that I rarely use list?, if at all. It's not a very useful function in clojure.core. I'm usually working at a much higher level than that, thinking in terms of sequences (an abstraction), not their concrete implementations. And in the end, you never get out of danger.

There you have it. If you'd like to learn more Clojure, I have a nice video series:

Learn Functional Programming using Clojure with screencasts, visual aids, and interactive exercises
Learn more

You might also like

CSS and the Lambda Calculus

Summary: Using LESS, we can almost achieve the expressive power of the Lambda Calculus as applied to styling. The expressive power is enough to create reusable styles applied to reusable HTML components.

Let's continue our exploration and analysis of CSS and LESS. This series is about Functional CSS. Our aim is to determine a good way to use HTML and CSS so that we can reuse both. We can't talk about "Functional" without talking about the Lambda Calculus.

The Lambda Calculus includes three things: variables, abstractions, and applications. Let's look at some Javascript.

(function(x) { return x + x; })(10);

In the above code, x is a variable, the function definition is a lambda abstraction, and calling the function (with the parens at the end) is called application. Luckily, Javascript gives us an additional kind of abstraction where we can name an expression to be reused:

var f = function(x) { return x + x; };
f(10);

We can name the function f, then apply it by referring to the name. What's more, we can compose them pretty well.

var g = function(x) { return x * x; };
var h = function(x, y) { return g(x) + g(y); };
h(10, 20);

We're composing function g by applying it inside of the definition of h. You're probably saying "duh!"--and rightly so. It's so common to do.

Which is why it's hard to understand why CSS does not include all of these parts. Let's at least try to decompose CSS into some parts.

.some-class { width: 10px; height: 20px; }

Here, we're applying the rule which contains two properties (width and height) to all of the elements that have the class .some-class. .some-class is the argument to the rule's application. In fake Javascript, it would look something like this1:

function(element) {
  element.width  = 10;
  element.height = 20;
}(document.getElementsByClassName('some-class'));

That is, we are immediately applying the abstraction to the argument. CSS has no way, as Javascript does, of naming the abstraction for use later. We definitely want that. What's more, there's no way to compose two rules together. In Javascript, we could refer to g within h. But in CSS, there's no way to do that.

LESS does have this ability. Since LESS is a superset of CSS, we can start with the above CSS code. Let's repeat it:

.some-class { width: 10px; height: 20px; }

Now we abstract it by naming it and then apply the name:

.small-box() { width: 10px; height: 20px; }
.some-class { .small-box(); }

That's somewhat better. We've got a reusable component (.small-box) and we've also added a bit more meaning to our code because we have a meaningful name. What's more, LESS lets you compose:

.small-box() { width: 10px; height: 20px; }
.my-button() { .small-box(); border: 1px solid black; }

.some-class { .my-button(); }

Here we've made an abstraction called .my-button by using .small-box inside. That's composition. And that's on par with our Javascript examples above.

There's one more thing that we might want that even Javascript doesn't have. Javascript has first-class functions. That's really nice. But there are many things in Javascript that are not first-class. For instance, the arithmetic operators (+, *, -, /) cannot be passed to a function or assigned to variables. They are treated differently by the language. They are a bit like properties in CSS: all you can do with them is refer to them directly in an expression.

But what if Javascript did have first-class arithmetic functions? We can certainly fake them out.

var plus  = function(a, b) { return a + b; };
var times = function(a, b) { return a * b; };
var minus = function(a, b) { return a - b; };
var over  = function(a, b) { return a / b; };

Now we have functions that act just like the operators--and they're first-class. It's also way more consistent: every operator is a function.

We can do something similar in LESS. Imagine we took every property and did what I do to these:

.width(@w)  { width:  @w; }
.height(@h) { height: @h; }
. . .

Now we have the consistency that everything is at least referrable as a mixin. This is pretty good.

We can now define our styles--and name them--in this subset of LESS. Styles may reference other styles. And rules with selectors may reference styles. You could import pure styles from an external library, then refer to them in your rules, where you apply them to selectors. The HTML does not have to change, and neither do the styles.

Rules--where styles and selectors are tied together--will change the most. As common sets of styles are used together often, you might think about factoring them out into a new style, which would be added to your organization's standard style library. Also, as HTML components solidify, you could begin to firm up the class names and their structure, reusing them in a more permanent way. Yet, even though these two assets (the standard styles and the standard class names / HTML structures) are permanent, you can still change how they are styled by changing the rules. Both assets retain their value over time.

A fly in the ointment

Even though we have tremendous power over that given by CSS, we are not at the level of first-class styles. The following will NOT work in LESS, though we would expect it to.

/* refer to variable like a mixin */
.apply(@style, @arg) { @style(@arg); }

.width(@w) { width: @w; }

/* pass in .width mixin */
#xyz { .apply(.width, 10px); }

The problem is that you cannot use a mixin named with a variable. The same limitation exists in SCSS (SASS). The following is invalid.

@mixin apply($s, $a) { @include $s($a); }

Why can't a mixin be assigned to a variable? Mixins appear to be in a different namespace from variables, and only mixin syntax can appear in the mixin position. Though I don't think apply itself would be very useful, it would indicate a recursive abstraction power that could be used well.

Not being able to write apply hints that the developers of these two languages are thinking at the syntax level. They have added some great features, but they have not truly modeled the problem semantically. Instead, it feels like a mix of special-cased syntax rules and string interpolation. You can assign a ruleset to a variable, but not a mixin?

Conclusion

LESS (and SASS, etc) have some very powerful features that are miles above CSS in terms of abstraction and composition. I hate using plain CSS after using LESS--LESS is just so much more expressive. You can do some impressive and useful things with them, and I continue to use them. I want to write about how I use a subset of LESS and a strict discipline in HTML to make styling easier and more maintainable. In addition, I'd like to explore what a better-designed style language might look like.

For more inspiration, history, interviews, and trends of interest to Clojure programmers, get the free Clojure Gazette.

Learn More

Clojure pulls in ideas from many different languages and paradigms, and also from the broader world, including music and philosophy. The Clojure Gazette shares that vision and weaves a rich tapestry of ideas from the daily flow of library releases to the deep historical roots of computer science.


  1. Please ignore the fact that CSS does operate on sets of elements, not simple elements.

Separation of Presentation and Content

Summary: One reason to separate style from content is to reuse HTML or CSS. Ultimately, we would like a solution where we can reuse both.

Reusable Content

There is an economic reason to separate presentation from content. Publishers have thousands of pages of HTML on their site, yet they want to enhance the style of their pages over time. It would cost a lot of money to change every single page to match their new style. So they invest a little more time writing each page so that the HTML markup does not refer to styles but to the semantics of the content (referred to as semantic HTML). Then they hire a designer to write CSS to make their existing content look new. The HTML is permanent and reusable, and the CSS is temporary and not-reusable. The separation is only one way: the HTML doesn't know the CSS, but the CSS does know the HTML.

Examples: CSS Zen Garden, newspaper websites, blogs

Characteristics: Semantic markup, CSS tailored to classes/structure of HTML

Reusable Styles

Yet another economic reason is a relatively newer phenomenon. It has become very easy to create a new web site/application. Writing (or generating) lots of HTML is cheap, and it changes often during iterative development. What is relatively expensive is to design each of those pages each time the pages change. CSS is not good at adapting to page structure changes. So people have built CSS frameworks where the CSS is (relatively) permanent and the HTML is temporary. In these cases, the HTML knows the CSS, but the CSS doesn't know the HTML. The separation is again one way--this time the other way.

Examples: Open Source CSS, Bootstrap, Foundation, Pure

Characteristics: HTML tailored to classes/structure of CSS, Reusable CSS

Reusable Content and Styles

What if a newspaper site, with millions of existing HTML pages, could cheaply take advantage of the reusable styles of frameworks like Bootstrap? That is the Holy Grail of separation of concerns. What would be required to do that?

What we really want is a two-way separation. We want HTML written in total isolation and CSS written in total isolation. We want permanent HTML and permanent CSS. How can the style and content, each developed separately, finally be brought together? The answer is simple: a third document to relate the two.

We have already seen that CSS is not good at abstraction. CSS cannot name a style to use it later. However, LESS does have powerful forms of abstraction. LESS has the ability to define reusable styles and apply them to HTML that did not have those styles in mind. If you put the definition of reusable styles in one document and the application of those styles in another document, you achieve true separation. And it is already happening a little bit. You can do it in your own code.

It is a bit like a software library. We put the reusable bits in the library, and their specific use in the app.

Examples: Compass, Semantic Grid System

Characteristics: Semantic markup, Reuseable Styles, Tie-in document to relate Style to Content

Conclusion

CSS preprocessors, which began as convenience tools, are actually powerful enough to solve fundamental problems with HTML and CSS. While it is still early, LESS and other CSS preprocessors, if harnessed correctly, could dramatically transform how we build and design web sites. Typography, grids and layout, and other design concerns can be used as plugable libraries. And other languages that are specifically designed to do that may emerge. What would a systematic, analytical approach to such an approach look like?

For more inspiration, history, interviews, and trends of interest to Clojure programmers, get the free Clojure Gazette.

Learn More

Clojure pulls in ideas from many different languages and paradigms, and also from the broader world, including music and philosophy. The Clojure Gazette shares that vision and weaves a rich tapestry of ideas from the daily flow of library releases to the deep historical roots of computer science.

You might also like

Clojure Web Security

Summary: Use the OWASP Top Ten Project to minimize security vulnerabilities in your Clojure web application.

Aaron Bedra gave a very damning talk about the security of Clojure web applications. He went so far as to say that Clojure web apps are some of the worst he has seen. You should watch the talk. He has some good recommendations.

One of the jobs of web frameworks is to handle security concerns inherent in the web itself. Because most Clojure programmers build their own web stack, they often fail to look at the security implications of their application. They do not protect their site from even the easiest and most common forms of vulnerabilities. These vulnerabilities are problems with the way the web works, not with the particular server technology, yet it has become the server's responsibility to mitigate the vulnerabilities. Luckily, the vulnerabilities are well-studied and there are known fixes.

The Open Web Application Security Project (OWASP) does a very good job of documenting common web vulnerabilities and providing good fixes for them. They have a project called the Top Ten Project which every web developer should refer to regularly and use to improve the security of their app. You should also run through the Application Security Verification Standard checklists to audit your code. But the Top Ten should get you to understand the basics.

Warning: I am not a security expert. You should do your own research. The code I present here is my own interpretation of the OWASP recommendations. It has not been audited by experts. Do your own research!

Also, security is an ongoing concern. If you have any comments, suggestions, or questions, please bring them up!

Here is the Top Ten 2013 with a small breakdown and a Clojure solution, if applicable.

A1. Injection

If a server accepts input from the outside and then parses and interprets that input as a scripting or query language, it is open to attack. The most common form is SQL Injection, where an input form is posted to the server, the value of that form is concatenated into a string to make a SQL statement, and then the SQL statement is sent to the database to be executed. What happens if a malicious user types in "'; DELETE FROM USERS;"?

My preferred solution to SQL Injection in Clojure is to always use parameterized SQL statements. clojure.java.jdbc, supports these directly. The parameters will be escaped, making injection impossible.

Another problem is if you want to read in some Clojure data from the client, and you call clojure.core/read-string on it. read-string will execute arbitrary Java constructors. For instance:

#java.io.FileWriter["myfile.txt"]

This will create the file myfile.txt or overwrite if it already exists. Also, there is a form (called read-eval form) to execute code at read-time:

#=(println "Hello, vulnerability!")

Read in that string, and it will print. Any code could be in there.

The solution is to never use clojure.core/read-string. Use clojure.edn/read-string, which is a well-documented format. It does not run arbitrary constructors. It has no read-eval forms.

Summary: Always use parameterized SQL and use clojure.edn/read-string instead of clojure.core/read-string on edn input.

A2 Broken Authentication and Session Management

Authentication

This is a big topic and I can't address it all here. Clojure has the Friend library, which is the closest thing we have to a de facto standard. My suggestion is simply to read the entire Friend README and evaluate whether you should use it. This is serious stuff. Read it.

Session Management

Ring provides a session system which is fairly good. It meets many of the OWASP Application Security Verification Standard V3 requirements. But it does not handle all of them automatically. You still need code audits. For instance, if you are logging requests, OWASP recommends against logging the session key. You must ensure that the session key is added after the request is logged.

The ASVS also recommends expiring your sessions after inactivity and also after a fixed period, regardless of activity. Ring sessions do not do this automatically (the builtin mechanism has no notion of expiration) and the default implementations of session stores will store and accept sessions indefinitely. A simple middleware will do the trick of expiring them in both cases:

(defn wrap-expire-sessions [hdlr & [{:keys [inactive-timeout
                                            hard-timeout]
                                     :or {:inactive-timeout (* 1000 60 15)
                                          :hard-timeout (* 1000 60 60 2)}}]]
  (fn [req]
    (let [now (System/currentTimeMillis)
          session (:session req)
          session-key (:session/key req)]
      (if session-key ;; there is a session
        (let [{:keys [last-activity session-created]} session]
          (if (and last-activity
                   (< (- now last-activity) inactive-timeout)
                   session-created
                   (< (- now session-created) hard-timeout))
            (let [resp (hdlr req)]
              (if (:session resp)
                (-> resp
                    (assoc-in [:session :last-activity] now)
                    (assoc-in [:session :session-created] session-created))
                resp))
            ;; expired session
            ;; block request and delete session
            {:body "Your session has expired."
             :status 401
             :headers {}
             :session nil}))
        ;; no session, just call the handler
        ;; assume friend or other system will handle it
        (hdlr req)))))

Set the HttpOnly attribute on the session cookie. Very important for preventing stealing of session ids from XSS attacks.

Do not set the Domain attribute, and do set the Path if you want something more restrictive than / (the Ring session default).

Do not set the Expire and Max-Age attributes. Setting them makes the browser store the session id on disk, which simply expands the number of ways an attacker can get ahold of it.

Change the session cookie name to something utterly generic, like "id". You don't want to leak more information than necessary about how your sessions work.

Use HTTPS if you can and set the Secure attribute of the cookie.

Do not use in-cookie sessions. In-memory are good but they can't scale past one machine. carmine has a redis-based session implementation.

Summary: Here's how I use Ring sessions (with carmine) based on these OWASP recommendations.

(session/wrap-session
   (wrap-expire-sessions
    handler
    {:inactive-timeout 500
     :hard-timeout 3000})
   {:cookie-name "id"
    :store (taoensso.carmine.ring/carmine-store redis-db
             {:expiration-secs (* 60 60 15)
             :key-prefix ""}) ;; leak nothing!
    :cookie-attrs {:secure true :httponly true}})

A3 Cross-Site Scripting (XSS)

Whenever text from one user is shown to another user, there is the potential for injecting code (HTML, JS, or CSS) that is run in the victim's browser. Imagine if Facebook allowed any HTML in the post submission form. A malicious user could add a <script> tag with some keystroke logging code. Anybody who viewed that post in their feed would also get the key logger installed. That would be bad.

XSS is common because of how easy it is to make an app that stores user input (from a form post) in a database, then constructs the page out of stuff from the database. If you're not extremely careful, you could create a place where people can exploit each other.

The solution is to only use scrubbed or escaped values to build HTML pages. Because HTML pages can include different languages (HTML, CSS, JS), text needs to be scrubbed differently in each context. OWASP has a set of rules to follow which will guarantee XSS prevention.

hiccup.util/escape-html (also aliased as hiccup.core/h) will escape all dangerous HTML characters into HTML entities. JS and CSS still need to be handled, and rules for HTML attributes need to be followed.

If you want to allow some HTML elements, you will need to do a complex scrub. Luckily, Google has a nice Java library that sanitizes HTML. Use it.

Summary: Validate and scrub input from the user and scrub/escape text on output.

A4 Insecure Direct Object References

This one is a biggie: each handler has to do authentication. Does the particular logged in user have access to the resources requested? There's no way to automate this with a middleware. But having some system is better than doing it ad hoc each time. Remember: an attacker can construct any URL, including URLs with a database key in it. Don't assume that just because a request contains a key, the user must have the rights to it.

Summary: Always check the authority of the requesting session before performing an action.

A5 Security Misconfiguration

This is about keeping your software up to date and making sure the settings of all software makes sense.

A6 Sensitive Data Exposure

Having data is risky. Don't let it leak out.

A7 Missing Function Level Access Control

Use an authorization system (Friend) and audit the roles used for access control.

A8 Cross-Site Request Forgery (CSRF)

Let's imagine you have a bank account at Bank of Merica. You just checked your balance and didn't log out. Then you go to some public forum, where someone has posted a cool file. There's a big download button. You click it, and the next thing you know, you're on your bank page and all of your money has been transfered out of your account.

What happened?

The download button said "Download" but it was really a form submit button. The form had hidden fields "to-account", and "amount". The action of the form was "http://www.bankofmerica.com/transfer-money". By clicking that button, the form was posted to the bank, and because you were just logged in, oops, it transfered all your money away.

The solution is that you only want to accept form posts that come directly from your site, which you control. You don't want some random person to convince people to click on other sites to be able to transfer people's money like that.

There are several possible solutions. One approach is to add a secret to the session and also insert that secret into every form. That is the approach taken by the ring-anti-forgery library.

The solution that I like is to do a double-submit. This means you submit a secret token in the cookie (sent with each web request) and in a hidden field in the form. The server confirms that the cookie and the hidden field match. But the hidden field in the form is added by a small Javascript script which reads it from the cookie. Browsers don't allow Javascript to read cookies from other sites, so you guarantee that they form was posted from your site.

There are three parts to the solution.

  1. Install a secret token as a cookie.
  2. Install a script to add the hidden field to all forms.
  3. Check that the field matches the cookie on POSTs.

Here is some code to do 1 and 3.

(defn is-form-post? [req]
  (and (= :post (:request-method req))
       (let [ct (get-in req [:headers "content-type"])]
         (or (= "application/x-www-form-urlencoded" ct)
             (= "multipart/form-data" ct)))))

(defn csrf-tokens-match? [req]
  (let [cookie-token (get-in req [:cookies "csrf"])
        post-token   (get-in req [:form-params "csrf"])]
    (= cookie-token post-token)))

(defn wrap-csrf-cookie [hdlr]
  (fn [req]
    (let [cookie (get-in req [:cookies "csrf"]
                         (str (java.util.UUID/randomUUID)))]
      (assoc-in (hdlr req) [:cookies "csrf"] cookie))))

(defn wrap-check-csrf [hdlr]
  (fn [req]
    (if (is-form-post? req)
      (if (csrf-tokens-match? req)
        ;; we're safe
        (hdlr req)
        ;; possible attack
        {:body "CSRF tokens don't match."
         :status 400
         :headers {}})
      ;; we don't check other requests
      (hdlr req))))

The Javascript should be something like this:

(def csrf-script "(function() {
  var cookies = document.cookie;
  var matches = cookies.match(/csrf=([^;]*);/);
  var token   = matches[1];
  $('form').each(function(i, form) {
    if(form.attr('method').toLowerCase() === 'post') {
      var hidden = $('<input />');
      hidden.attr('type', 'hidden');
      hidden.attr('name', 'csrf');
      hidden.attr('value', token);
      form.append(hidden);
    }
  })
}());")

You should add it to all HTML pages. Note that this example script requires jQuery. Put it right before the </body>.

[:script csrf-script]

The nice thing about this solution is that it is strict by default. If you don't include the script, form posts won't work (assuming wrap-check-csrf is in your middleware stack).

Summary: CSRF attacks take advantage of properties of the browser (instead of properties of your server), so their defense can largely be automated.

A9 Using Components with Known Vulnerabilities

Software with known vulnerabilities is easily attacked using scripts. You should ensure that all of your software is up-to-date.

A10 Unvalidated Redirects and Forwards

One common pattern for login workflow is to have a query parameter that contains the url to redirect to. Since it's a user parameter, it's open to the world and could be a doorway for attackers.

For example, let's say someone sends an email to someone asking them to log in to their bank account. In it, there's this link:

http://www.bankofmerica.com/login?redirect=http://attackersite.com

What happens when they click? They see the legitimate site of their bank, which they trust. But it redirects them to the attacker's site, which has been designed to look like the bank site. The user might miss this change of domains and unwittingly reveal private information.

What can you do?

OWASP recommends never performing redirects, which is impractical. The next best thing is to never base the redirect on a user parameter. This would work, but puts a lot of trust in the developers and security auditors to check that the policy is enforced. My preferred solution allows redirects that conform to a whitelist of patterns.

(def redirect-whitelist
  [#"https://www.bankofmerica.com/" ;; homepage
   #"https://www.bankofmerica.com/account" ;; account page
   ...
  ])

(defn wrap-authorized-redirects [hdlr]
  (fn [req]
    (let [resp (hdlr req)
          loc (get-in resp [:headers "Location"])]
      (if loc
        (if (some #(re-matches % loc) redirect-whitelist)
          ;; redirect on our whitelist, it's ok!
          resp
          ;; possible attack
          (do
            ;; log it
            (warning "Possible redirect attack: " loc)
            ;; change redirect back to home page
            (assoc-in resp [:headers "Location"] "https://www.bankofmerica.com/")))
        resp))))

Summary: Redirect attacks can largely be avoided by checking the redirect URL against a whitelist.

Conclusion

Web security is hard. It takes education and vigilance to keep our servers secure. Luckily, the main security flaws of the web are well-understood and well-documented. However, this is only half of the work. These need to be translated into Clojure either as libraries and simply as "best practices". Further, these libraries and practices need to be discussed and kept top-of-mind.

If programming the web in Clojure interests you, you might be interested in my Web Development in Clojure video series. It covers all of the basics of web development, building a foundation to understand the entire Clojure web stack.

You might also like

A Ring Spec to Hang on the Wall

Summary: The Ring SPEC is the core of the Clojure web ecosystem. The standard is small and a reference is handy.

If you program the web in Clojure, you probably use Ring. Even if you don't, your server is likely Ring compatible.

Ring has a small SPEC. It's centered around defining the keys one can expect in the request and response maps. And the exact names for keywords are easy to forget.

I don't want to forget. I use Ring often enough that I want a quick reference. A while ago, I printed out a quick summary of the keys for the request and response maps and hung it on the wall behind my monitor. I refer to it frequently.

If you program the web in Clojure, you might appreciate this printout. If you're learning, it could be an invaluable reference. Please download the PDF.

And if you like it, you might also like my Web Development in Clojure video series.

You might also like

What Web Framework Should I Use in Clojure?

Summary: There are a number of web frameworks in Clojure, but beginners should roll their own server stack themselves to tap into the Ring ecosystem.

One question that I am asked a lot by beginners in Clojure is "What web framework should I use?" This is a good question. In Python, there's Django. In PHP, Drupal. And of course in Ruby, there's the king of all web frameworks, Ruby on Rails.

What framework should you use in Clojure? The question is actually kind of hard to answer. There are a number of web frameworks out there. Some people call Compojure a framework, though it is really a library. lib-noir does a lot of work for you. Then there's your true frameworks, like Pedestal or Hoplon, which provide infrastructure and abstractions for tackling web development. All of these projects are great, but for a beginner, I have to recommend building your own web stack starting with Ring.

Compojure is really just a routing library, not a framework. You can use it for your routing needs, though there are alternatives, such as playnice, bidi, Route One, and gudu. If you don't want to decide, use Compojure. It's widely used and works great. If you want to go in depth, read the docs for the others. They are each good for different cases.

lib-noir comes from Noir, which was a web framework (now deprecated). It was easy and provided a bit of plumbing already built for you, so you could just start a project with a lot of the infrastructure built in. lib-noir is that infrastructure in library form. I haven't used it, but a lot of people like it. However, when I look at it, I see that most of what it provides I either won't use or it is trivial to add myself. That would normally be ok if there was huge adoption for it (like with Rails) so you get an ecosystem effect, but there isn't. lib-noir is used but certainly not dominant.

Pedestal has a lot of backing. It aims to tackle single-page apps by providing a sane front-end environment using ClojureScript in the form of a message queue. If you're into "real-time apps", this may be for you. Though, I would caution you that it's not for a Clojure beginner. Pedestal introduces a lot of new concepts that even experienced Clojure programmers have to learn. The tutorial is long and arduous. You will have problems learning Pedestal without knowing Clojure.

Update: Pedestal has changed dramatically since I last looked at it. It is no longer a frontend framework. It is now a high-performance backend server for high performance asynchronous processing. It's worth looking at if you need that. Otherwise, I stick with my recommendation to use basic Ring.

Hoplon is also designed for web apps. It gives you a DOM written in ClojureScript (including custom components), dataflow programming (like a spreadsheet), and client-server communication. It's a bold step, but again, requires you to buy into programming models that will take a long time to understand. If you are not already familiar with Clojure, you are asking for trouble.

There are other frameworks out there. But I recommend rolling your own stack. If you're learning Clojure, the best way to grasp how web apps work is to get a Ring Jetty adapter set up with some basic handlers. Add existing middleware as you need it. Write some middleware of your own. Use Compojure to route. Use Hiccup to generate HTML. That setup will get you a long way.

Ring is just functions. With a few basic concepts and a copy of the Ring SPEC handy, you can build a web server very quickly that does exactly what you want and you can understand every aspect of it. The experience of building one yourself can teach you a lot about how the other frameworks are put together.

What's more, Ring is dominant. Most people write functionality (in the form of middleware and handlers) assuming Ring and no more. So by staying close to the metal, you are tapping into a huge resevoir of pre-written libraries that are all compatible with each other. Ring is the locus for the Clojure web ecosystem.

Wiring up your own middleware stack is not that daunting. If you want guidance, my Web Development in Clojure video series is now for sale. It starts with a brand new Clojure project and ends with a fully functional app, backed by a database, and hosted on Heroku (for free!). In one hour, it explains all the concepts and shows you lots of examples. There's lots of exercises to get your brain whirring.

Recommended stack:

Then keep adding and customizing.

You might also like

When To Use a Macro in Clojure

Summary: Macros should be avoided to the extent possible. There are three circumstances where they are required.

There's a common theme in Lisp that you should only use macros when you need them. It is very common to see a new lisper overuse macros. I did it myself when I first learned Lisp. They are very powerful and make you the king of syntax.

Clojure macros do have their uses, but why should you avoid them if possible? The principle reason is that macros are not first-class in Clojure. You cannot access them at runtime. You cannot pass them as an argument to a function, nor do any of the other powerful stuff you've come to love from functional programming. In short, macros are not functional programming (though they can make use of it).

A function, on the other hand, is a first-class value, and so is available for awesome functional programming constructs. You should prefer functions to macros.

That said, macros are still useful because there are things macros can do that functions cannot. What are the powers of a macro that are unavailable to any other construct in Clojure? If you need any of these abilities, write a macro.

1. The code has to run at compile time

There are just some things that need to happen at compile time. I recently wrote a macro that returns the hash of the current git commit so that the hash can be embedded in the ClojureScript compilation. This needs to be done at compile time because the script will be run somewhere else, where it cannot get the commit hash. Another example is performing expensive calculations at compile time as an optimization.

Example:

(defmacro build-time []
  (str (java.util.Date.)))

The build-time macro returns a String representation of the time it is run.

Running code at compile time is not possible in anything other than macros.

2. You need access to unevaled arguments

Macros are useful for writing new, convenient syntactic constructs. And when we talk about syntax, we are typically talking about raw, unevaluated sexpressions.

Example:

(defmacro when
  "Evaluates test. If logical true, evaluates body in an implicit do."
  {:added "1.0"}
  [test & body]
  (list 'if test (cons 'do body)))

clojure.core/when is a syntactic sugar macro which transforms into an if with a do for a then and no else. The body should not be evaled before the test is checked.

Getting access to the unevaluated arguments is available by quoting (' or (quote ...)), but that is often unacceptable for syntactic constructs. Macros are the only way to do that.

3. You need to emit inline code

Sometimes calling a function is unacceptable. That call is either too expensive or is otherwise not the behavior you want.

For instance, in Javascript in the browser, you can call console.log('msg') to print out a message and the line number to the console. In ClojureScript, this becomes something like this: (.log js/console "msg"). Not convenient at all. My first thought was to create a function.1

(defn log [msg]
  (.log js/console msg))

This worked alright for printing the message, but the line numbers were all pointing to the same line: the body of the function! console.log records the line exactly where it is called, so it needs to be inline. I replaced it with a macro, which highlights its purpose as syntactic sugar.

Example:

(defmacro log [msg]
  `(.log js/console ~msg))

The body replaces the call to log, so it is located where it is needed for the proper behavior.

If you need inline code, a macro is the only way.

Other considerations

Of course, any combination of these is also acceptable. And don't forget that although you might need a macro, macros are only available at compile time. So you should consider providing a function that does the same thing and then wrap it with a macro.

Conclusion

Macros are very powerful. Their power comes with a price: they are only available at compile time. Because of that, functions should be preferred to macros. The use of macros should be reserved for those special occasions when their power is needed.

Learn Functional Programming using Clojure with screencasts, visual aids, and interactive exercises
Learn more

You might also like


  1. As it should be :)

LESS has Better Forms of Abstraction than CSS

Summary: LESS has obviously better forms of abstraction and combination than CSS. It has recursive style definitions, which is enough to consider it a "powerful language".

Ok, it's obvious that CSS has weak forms of combination and abstraction. But now we have a good framework for understanding why. "CSS Preprocessors", as they are called, are getting really popular now. We would be smart to analyze LESS in the same way that we analyzed CSS, if only to temper the glamor of trendiness that surrounds it. Comments are welcome.

Because LESS aims to be a superset of CSS, it has all of the primitive expressions, means of combination, and means of abstraction that come baked into CSS. I already went over those last time, so I will not go over them again. So what things are added by LESS?

Primitive Expressions

Besides existing CSS properties, LESS adds two new primitive expressions. Mixin application (.rounded-corners(10px);) in a rule recursively applies the primitive expressions defined in the body of the mixin to the current rule. Mixin applications can be parameterized with value expressions, or they can have no parameters. Extension (&:extend(.blue-button);) is similar, but instead of applying the primitive expressions to a rule body, it adds the selector of the rule to the rule selector of the extension. Extension is recursive as well.

Variables and mathematical expressions change the way primitive properties work. In CSS, primitive properties were comprised of a property name and a literal property value. In LESS, variables and math expressions, as well as literal values, can be in the value place (right hand side) of a property.

Variables can also be used in selectors.

Means of Combination

The principle means of combination are still the rule, but add to it the ability to nest rules, and things are more interesting. Nesting two rules is shorthand for writing out two rules (unnested) with a nested selector. While in the simple case it is simply a shorthand, when nested rules are applied as mixins, you gain a lot more than better syntax. Mixins with nested subrules allows you to name a nesting and refer to it later.

Means of Abstraction

CSS did not contain much in the way of abstraction. LESS focuses primarily in the realm of abstraction, probably to appease the will to power of front-end designers. Variables allow property values to be named, and naming is a form of abstraction. Variables are a good way to name values that all have the same meaning and would therefore change at the same time. For instance, a shade of green that is used throughout the styles is a perfect use for variables. Variables can be used in a similar way to name selectors.

A more powerful form of abstraction comes from the ability to define mixins, apply mixins, and use of :extend(). In LESS, any rule using a single class or id selector can be used as a mixin. This is essentially a way to name a rule--our principle form of combination. In addition, if you put empty parentheses after the class selector in the rule, the rule is not outputted into the generated CSS, which can save bytes. Mixins can also have parameters (scoped variables), so they can be abstracted over a variety of values. Extend allows a similar kind of abstraction which promises to be more efficient.

Mixins are very powerful. In fact, this is the kind of abstraction that is needed for LESS to be powerful, as defined by Abelson and Sussman. Because you can now name a group of styles (mixin) and then use that name in another group of styles (mixin application), LESS has full-on recursive style definitions. With extension, it also has recursive selector definitions. In LESS, we can talk of "levels of abstraction" whereas in CSS there was only one.

Conclusion

LESS has recursion. It lets you define and name groups of properties, then refer to those groups by name in other groups of properties. We can consider LESS powerful enough to express useful abstractions. Yet though it is more powerful than CSS, it still has many of the problems of CSS (especially complex rules governing the combination of multiple rules to a single element). How can LESS be leveraged to gain its power but tame its weakness? Is there a subset of LESS that can gerrymander the good parts away from the bad parts?

For more inspiration, history, interviews, and trends of interest to Clojure programmers, get the free Clojure Gazette.

Learn More

Clojure pulls in ideas from many different languages and paradigms, and also from the broader world, including music and philosophy. The Clojure Gazette shares that vision and weaves a rich tapestry of ideas from the daily flow of library releases to the deep historical roots of computer science.

You might also like

CSS has Weak Forms of Abstraction and Combination

Summary: According to the requirements proposed by Abelson and Sussman, CSS does not provide adequate means of combination and abstraction to be considered a powerful language.

I am trying to improve the maintainability and reusability of my CSS over the longterm. I've written about how to organize CSS before. I've learned a lot since I wrote that. I've tried lots of things and talked to lots of people, I finally seem to have found a conceptual framework to capture my new understanding. I'm trying to explore it here. Comments are welcome.

I'm going to take a cue from the first page of SICP and analyze CSS as a language.

Abelson and Sussman in SICP 1.1 (italics mine):

A powerful programming language is more than just a means for instructing a computer to perform tasks. The language also serves as a framework within which we organize our ideas about processes. Thus, when we describe a language, we should pay particular attention to the means that the language provides for combining simple ideas to form more complex ideas. Every powerful language has three mechanisms for accomplishing this:

primitive expressions, which represent the simplest entities the language is concerned with,

means of combination, by which compound elements are built from simpler ones, and

means of abstraction, by which compound elements can be named and manipulated as units.

Let's analyze CSS in terms of these three mechanisms.

Primitive Expressions

The simplest entities the language is concerned with are properties and primitive selectors. CSS properties, though they have a property name and property value part, are meaningless if split up. Primitive selectors include element name selectors (body, a, div), class name selectors (.main-wrapper), id selectors (#login-form), and pseudo-class selectors (:hover), among others. Properties appear inside the rule body ({}), while selectors appear before the rule body. The two are semantically and syntactically separated.

Means of Combination

Properties can be combined in two ways. First, multiple properties can be put inside the same rule body. This is the most obvious and most readable form of property combination. The second form is harder to reason about. It occurs automatically within the browser during rendering. That form of combination, involving the application of multiple rule bodies to the same HTML element, uses a complex ordering of properties from all bits of CSS and element styles on the page.

Tomes have been written about how difficult it is to reason about this automatic form of combination. Usually, the answer is limiting it (or avoiding it altogether) through programmer discipline, with varying degrees of success.

Primitive selectors can be combined in several ways. Without spaces between them, multiple selectors will intersect, meaning they target elements more specifically. div.main-container will target div elements that ALSO have the class main-container.

With spaces, multiple selectors indicate nesting. div .main-container matches any element of class main-container within any div. There are several operators which combine them in different ways (> indicates direct nesting, etc.). Nested selectors are associated with CSS that is strongly coupled with the structure of the HTML it is styling and therefore less reusable.

Selectors that are combined with commas create a group. These compound selectors will match any element that matches at least one of the component selectors. header, .header will match all header elements and all elements with class header.

There are more types of selector combintation operators, but they are more specialized and less frequently used.

The locus of combination, for both properties and selectors, is the rule. The rule has one compound selector and zero or more properties. Rules with zero properties have no effect.

Means of Abstraction

The means of abstraction in CSS are quite limited. There is no way to name anything. People lament the lack of named values (often refered to as variables) or named styles (sometimes called mixins). Naming is out in CSS.

The only means of abstraction is the class and id, which are labels that can be applied to HTML elements. With an id or class (or combinations), you can target precisely the elements you need to and achieve some reuse. For instance, I can "reuse" the #login-form id selector in two different rules. I can also add the class rounded-corner to two different HTML elements, effectively "reusing" the same rule twice. By a very disciplined use of class selectors by combining them with commas, one can apply "rule bodies" as a unit in a very limited way, though it is impracticable in practice.

The disadvantage to this technique of using id and class selectors is that the HTML must be modified when styles change, defeating the purpose of using CSS for content/style separation. There is a lot of discussion about using semantically named classes. For instance, call the button login-button instead of green-shiny-button. This is thought to be more robust in the face of style changes, but requires existing CSS to be thrown away in order for the page to be redesigned. CSS offers no good way to modify HTML and CSS independently.

Conclusion

CSS does not meet the criteria for a "powerful language" as used in SICP. This is no surprise. The reasonable means of combination are limited to the rule. The means of abstraction are almost non-existent. There is no way to name anything. And the other form of abstraction (ids and classes) provides no way of reusing both the HTML and the CSS. It is obvious why CSS is typically unmaintainable. With the current crop of compile-to-CSS languages (commonly known as "CSS Preprocessors"), there is hope that better means of abstraction are possible. How will compile-to-CSS languages fare in this same analysis?

For more inspiration, history, interviews, and trends of interest to Clojure programmers, get the free Clojure Gazette.

Learn More

Clojure pulls in ideas from many different languages and paradigms, and also from the broader world, including music and philosophy. The Clojure Gazette shares that vision and weaves a rich tapestry of ideas from the daily flow of library releases to the deep historical roots of computer science.

You might also like