A Year of Clojure

A sprawling ramble through a year of Clojure development... good luck getting through this one!

Background

One year ago I was headed towards a new job at Revelytix to build a brand new semantic web data integration platform. While I was a newbie at the semantic web, I had built a similar platform in the past using Java. I knew that the last time I built something like this, I ultimately found Java to be insufficient for what I wanted to do. Specifically, I found that performing symbolic query optimization was really painful because I could not create the level of abstraction needed to succinctly represent query optimizations and rewrites. I also had some very frustrating times in the very different area of query processing, around concurrency and race conditions.

I had the green (well yellow at least) light to explore other JVM-based languages (the existing products are written in Java) and my goal was to find something that could provide a higher level of abstraction than Java, equivalent performance, and excellent concurrency. At the time, I wasn’t sure I could get the performance I needed out of Groovy (I’d probably be more willing to push past that now) but I was seriously considering Scala and Clojure with Scala as the lead choice.

I was working hard in my free time over the holidays to learn Scala, explore the tools, sample the community, and generally kick the tires. As it turned out, I had a lot of tooling and library issues due to the 2.7/2.8 transition period and that didn’t help, although I tried hard not to hold that against it.

During the process of interviewing new people for the team, David McNeil (whom we hired) encouraged me to take another look at Clojure based on some of his recent playing. I dove into building some real stuff in Clojure as a test. In the meantime, we also hired Ryan Senior (a fan of OCaml and other FP languages) and Nate Young (who has a fair amount of Haskell and Common Lisp background). We pretty quickly found a common affinity for Clojure and dove headlong into it.

Conciseness and Abstraction

A year later, I think I can safely say that we all still really enjoy doing Clojure development. I know I’m having more fun than I’ve had programming in a long time. The only other language I’ve enjoyed as much in recent memory is Groovy, which I use for some scripting and other personal sloppy work. Clojure is so concise and malleable that it gets really close to that feeling of just picking up the code base in your hands and smooshing it into the shape you want.

I find Java to be really painful to read and write now. In general, I find that I cover about a package worth of Java with a file worth of Clojure. Maybe I just tended towards granular Java packages, but that’s a pretty substantial reduction in size. After a year of dev (and a couple more devs), we have four Clojure projects totaling 14 kloc [by LOC here I mean “any line in a source file incl whitespace and comments”] of src and 16 kloc of tests, so about 30 kloc of Clojure total. The average source file is 172 lines and the longest in the whole code base is 725 lines. I don’t know what the average Java file size is but I’m sure it’s way more. This code base contains a SPARQL algebra representation and printing facility, an in-memory RDF graph library, a database metadata importer, two custom ontology generators, a SQL parser/printer, SPARQL to SQL query optimizer, database query, SPARQL endpoint, command-line tooling, federation engine shell, SPARQL endpoint client, and a nascent RIF rules parser and engine implementation. Fitting all that in 14 kloc seems pretty impressive to me. By comparison the Java code in the ARQ SPARQL AST data structure is 14 kloc.

It may seem like I’m hyper-focusing on a really stupid aspect here of just lines in a file, but it’s important. We are limited by what we can see on a page and fit in our memory at a time and a language that lets me chunk information at much higher densities lets me not just see and understand more at a time but means when code needs to change, it’s much faster to do so. If all of the code for some unit of functionality is in a 200 line file, you are much more confident in saying “worst case – I can delete that file and rewrite it better”. In general I’d ballpark that ~50% of our functions are <5 lines and 99% are <12 lines. I continue to be amazed at the ability of Clojure to allow you to create abstractions. The more effort you are willing to apply, the better you can compact your code. That’s comforting because you know you can expend your available effort on the first pass at something but know that you can come back later and chop it in half and come back after that and do it again. I wrote earlier this year that when I perform this same kind of “abstraction pass” in Java I typically end up with more code, not less. The other important thing is that the “more stuff” I get in Java is not typically that much more reusable whereas it often is in Clojure.

Going back to my original goals, in December we made some huge and exciting headway on pulling out the structure of a query optimization at a very high level, something that was very challenging in the previous system. We still have farther to go and I’ll be talking about it more.

Other observations

Some other miscellaneous observations:

Learning the core API, clojure-contrib, and all of the other libraries is hard. Many, many common and not so common things are already implemented, but it can sometimes be challenging to know about them or find them. Sites like clojuredocs and the Clojure cheat sheet help a lot. Utilities like apropos and find-doc also help at the REPL.
What mutable state? I wrote Clojure code for several months before I used my first ref. It’s still surprising to me how much you can do without it.
What macros? I’m a bit embarrassed to say I’ve not yet written a single macro. Partly I tend to think the function version should exist regardless in most cases, and partly I work with three other Clojure gurus who can satisfy my macro whims faster than I can state the need. Let’s call it a goal for 2011.
IDEs still need work. As good as Emacs/slime/etc are, and as much progress as was made this year in the NetBeans, Eclipse, IntelliJ, Vim, etc worlds, there is still tremendous room for improvement. I love love love my paredit but I miss miss miss automatic help with my ns/require/use/import à la Organize Imports, automated refactorings, and many other little things I’ve grown accustomed to over years of Java IDE use.
REPL good. ‘Nuff said.
Classic gotchas. If you’re just learning Clojure, be aware of contains? and its possibly surprising behavior on non-associative collections (use some instead). If your code simply must be running through something but isn’t, then your code is probably lazier than you think it is.
Records and protocols good. We started this year using the daily snapshots of 1.2 and stuck with 1.2 stable when it came out. We’ve been using both records and protocols since early on and find them both to be crucial. There are still some rough edges in records and I hope they get more polish for 1.3 with things like constructor functions, print/pprint support, etc.
Build tools. We’ve been using both Maven and leiningen in different projects all year long and at the moment we are pulling most (probably all) projects out of Maven and back into lein. lein is obviously very Clojure friendly but we also find that it’s much faster and simpler.
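
For anyone who hasn’t hit the two classic gotchas yet, here is a minimal sketch of what they look like. The names langs, counter, and lazy-result are made up for illustration and are not from our code base; only core functions are used.

```clojure
;; Gotcha 1: contains? asks "is this KEY present?", not "is this VALUE present?".
;; For a vector the keys are the indices.
(def langs [:java :scala :clojure])

(contains? langs :clojure)  ; false, because :clojure is not an index
(contains? langs 2)         ; true, because index 2 exists
(some #{:clojure} langs)    ; :clojure (truthy), a set used as a predicate finds the value

;; Gotcha 2: map is lazy, so side effects inside it do not run
;; until something actually consumes the sequence.
(def counter (atom 0))

(def lazy-result
  (map (fn [x] (swap! counter inc) x) [1 2 3]))

@counter             ; 0, nothing has been realized yet
(doall lazy-result)  ; forces the whole sequence to be realized
@counter             ; 3, the side effects have now run
```

If the work is purely for side effects, doseq is usually a better fit than wrapping map in doall.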

Conferences

In October we had both Strange Loop and the first Clojure-Conj. Strange Loop contained a strong set of Clojure-focused talks on the expression problem, Conduit, Midje, Cascalog, etc as well as a great languages panel where Guy Steele suggested learning Clojure.

The conj was the place to be for Clojure in 2010 – I ran into one Clojure hero after another, not to mention talking to Rich “hammock man” Hickey himself. Can’t wait to see what the Relevance gang cooks up for 2011.

There is so much more I could say but to sum up, Clojure has been very good to us and I’m doing everything I can to show people what it can do for them. In 2011 I’ll be doing a Clojure talk on abstraction focused on Clojure newbies and probably something more advanced around zippers/trees. If you’re looking for a speaker in 2011, drop me a line and I’d love to talk about it. I can also guarantee that Strange Loop 2011 will have a healthy taste of Clojure goodness in the mix.

Pure Danger Tech
