Real world Clojure

There was a request in an HN comment for more info about “real world Clojure”, so I turned what started as a comment (and got long) into a post about our experiences at Revelytix…

We started using Clojure at the beginning of 2010 building a new set of enterprise data integration and analytics products at Revelytix. Initially we had 5 developers and we are currently up to 10.

We have 9 people doing Clojure full-time. I will be giving a talk about some of our work at Devoxx next month, and my colleague David McNeil is talking about it at the Clojure/conj conference next month as well. We have about 60,000 lines of Clojure (counted with wc -l, so including comments and blanks), roughly half of it test code. We started on a Clojure 1.2 pre-release and tracked the daily snapshots, mainly to get early access to then-new features like records and protocols. Since then we’ve been on 1.2, and we are slowly moving towards 1.3 now.

In no particular order, some observations:

Clojure code is about 1-2 orders of magnitude smaller than the equivalent in Java, depending on what you’re doing. If you focus on it, you can go further by building better abstractions (often ones that would be impossible or highly impractical in Java). This is in some ways a superficial comparison but is actually fairly important. I’m confident that it would be harder for us to understand and maintain our system if it was 600k lines of code rather than 60k loc.
The reason we chose Clojure was that I, in particular, had built a similar system previously at MetaMatrix (later sold and open-sourced as JBoss Teiid) using Java. In the area of query planning and optimization, I found that at some point I hit a wall with what I could do in Java. There was latent abstraction that I understood but could not express in the code. I was determined to walk down that path this second time in a language that had more powerful abstraction capabilities to let me get closer to talking in terms of my problem. I don’t think we’ve really gone after that as an objective very hard yet but even so, we’ve had clear success when we have.
At the time we started, the initial 5 developers were all new to Clojure, although all of us had some experience with Scheme, Lisp, OCaml, or other functional languages. I don’t think any of us found it difficult to “learn Clojure”. I feel like we’re all still learning every day how best to leverage it. As we’ve on-boarded new people, I feel like it probably took longer than it would have with Java, but not orders of magnitude different.
I find it challenging to find ways to express designs for Clojure (and maybe for FP more generally). I think UML in all its glory is pretty much a waste of time, but I think the core class / sequence / component diagrams, when done as a description rather than a spec, are incredibly useful as a way to communicate design at a higher level than the code. Some parts of Clojure code are easy to represent as record structures or function signatures. Data flow diagrams work well with other parts. However, once a design starts to leverage higher-order functions (as good abstractions inevitably do), it is difficult to describe the high-level intent while staying accurate to the low-level code. I’m open to ideas. :)
I don’t find that using Clojure makes the overall process of writing new code faster. My thinking/typing ratio is much higher, though. I think the reason is that in Java I have huge confidence in the refactoring tools and in my ability to morph the code towards where it should go. In Clojure, there is no ceremony when you start writing code, and within two minutes you realize that your first idea was dumb and your data structure should be totally different, which drives you back into the thinking phase.
Clojure tooling is acceptable. I did serious stints using TextMate+REPL, NetBeans+Enclojure, Eclipse+CCW, and finally gave up and learned Emacs so my teammates would stop making fun of me. Thanks to sitting a few feet from guru Emacs users, I was proficient within a few days. I don’t think I could live without paredit and I am pretty happy with Emacs when I am in the mode of single source file + single test file + REPL. When I am doing larger, more architectural work, I find using something with a graphical browser like Eclipse+CCW much easier. Debugging tools are hard to set up (not that I don’t greatly appreciate the work people have put into them so far) and light years behind Java tooling. Profiling works just fine with JVM profilers but the results can be very difficult to interpret. Build stuff with Leiningen is very good if you’re on the 80% path.
More than anything with Clojure I’ve come to appreciate the data-centric approach to building software. This is deserving of a much longer and more exploratory post someday but I can now see how Java locks your data away in boxes and makes you use keys (and write key factories and key adapters and key factory bridge adapters) to get it back out again. Like I said, need to write this up in more detail.
Performance has been generally good so far, although we do have “slow” parts of the system that we need to investigate more. Laziness can make it painful to understand where time is going and to interpret stack trace samples. We’ll be doing more work on performance in the coming year, I’m sure.
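
To make the laziness point a bit more concrete, here is a minimal sketch (not code from our system; slow-inc and the other names are made up for illustration). It assumes nothing beyond clojure.core. The lazy map returns almost immediately, and the cost shows up later, wherever the sequence happens to be consumed:

```clojure
;; Hypothetical stand-in for an expensive computation.
(defn slow-inc [x]
  (Thread/sleep 10)
  (inc x))

;; map is lazy: this returns right away and has not called slow-inc yet.
(def pending (time (map slow-inc (range 20))))

;; The work happens here, inside the consumer, so this is where the
;; elapsed time (and a sampling profiler's stack frames) will show up.
(def total (time (reduce + pending)))

;; Forcing with doall keeps the cost at the point where the work is
;; defined, which can make timings easier to attribute.
(def forced (time (doall (map slow-inc (range 20)))))
```

Whether forcing like this is appropriate depends on the code in question; it gives up the benefits of laziness in exchange for timings that are easier to read.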
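
As a small illustration of the data-centric point above, here is a hedged sketch (the Customer record, the sample data, and orders-by-city are all hypothetical, and it assumes a reasonably recent Clojure for ->Customer and update). Records and plain maps are both just associative data, so the same generic functions work on either without any accessor or adapter code:

```clojure
;; Hypothetical record, for illustration only.
(defrecord Customer [name city orders])

(def customers
  [(->Customer "Ann" "St. Louis" 3)
   (->Customer "Bob" "Chicago" 7)
   {:name "Cy" :city "Chicago" :orders 1}]) ; a plain map works too

;; Keywords act as functions over any associative value.
(def customer-names (map :name customers))

;; Generic reduce + destructuring: no getters, no key classes.
(defn orders-by-city [cs]
  (reduce (fn [acc {:keys [city orders]}]
            (update acc city (fnil + 0) orders))
          {}
          cs))

;; select-keys returns a plain map, even when given a record.
(def summary (select-keys (first customers) [:name :city]))
```

The point is only that the data stays open to the whole core library; it is not a claim about how our own code is structured.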

If you have questions that I didn’t think to answer, please ask…

Pure Danger Tech
