I have been thinking a lot recently about concurrency and how to best map some problems I’m working on into a concurrency model. Spurred by some discussions with Kyle Cordes after his talk on fork/join at the St. Louis JUG last night and Cedric’s blog post today, I was inspired to write down some things I’ve said to others lately.
I am currently thinking about problems (I think common problems, but at least my problem) as having these properties:
- work expressed as small compute-intensive computations
- computations have dependencies on other computations
- a set of real kernel threads that presumably map to real cores
- # of computations is much bigger than # of threads
And we want a solution that lets us a) easily express the computations, and b) keep the threads busy doing computations rather than waiting on I/O, locks, or context switches.
To me, there are several important issues to think about.
- Because the work to be done is presumably greater than the number of threads, queueing is inherent in the problem: we must collect pending work somewhere. Queues are great, but they have a notable downside – with many threads pulling from the same queue, they introduce contention on the head of the queue. As the number of cores and threads increases, this becomes an issue. And because the computations are assumed to be fine-grained and compute-intensive, each thread must frequently go back to the queue for more work.
- Dependencies – embarrassingly parallel problems are great for showing off different models, but in my experience they don't show up that often in practice. Rather, most real problems require some work to be done before other work can start, and so on. I think frameworks that use dependency information, either explicitly or implicitly, are likely to do better in the long term. I have no proof of this; it just seems intuitive to me that there is structure there and we should leverage it.
There are many ways to map the properties above onto different concurrency models. Let’s do that while considering the concerns above.
- Executors (java.util.concurrent) – put your computations in Runnables or Callables and submit them to an ExecutorService backed by a thread pool. Express dependencies using Futures or other techniques. There are two problems here. First, executors generally assume one queue and many threads, which introduces the queue contention problem mentioned above. Second, executors have no way to schedule work with knowledge of dependencies. It is possible to build that on top, of course, but it's a lot of work orthogonal to the problem at hand. The queue+threads model can't scale as is.
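A minimal sketch of the executor approach (the class and method names here are illustrative, not from any framework). Note how the combining task expresses its dependency by blocking on Futures – which also demonstrates the weakness, since the blocked task occupies a pool thread while it waits:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorExample {
    // Multiply the results of two independent computations. The combining
    // task depends on both; it expresses that by blocking on their Futures,
    // tying up a pool thread while it waits.
    static int combine(ExecutorService pool) throws Exception {
        Future<Integer> a = pool.submit(() -> 21);
        Future<Integer> b = pool.submit(() -> 2);
        Future<Integer> product = pool.submit(() -> a.get() * b.get());
        return product.get();
    }

    public static void main(String[] args) throws Exception {
        // One shared work queue, many threads: the contention point.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(combine(pool)); // prints 42
        } finally {
            pool.shutdown();
        }
    }
}
```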
- Fork/join – create your computations as RecursiveTasks or RecursiveActions and submit them to a ForkJoinPool. Express dependencies directly using the RecursiveTask APIs for fork, join, invoke, etc. Fork/join addresses both of the concerns above. Instead of a single work queue, there is one work queue per thread, so there is no contention at the head of the queue – only one thread reads from it. Fork/join also addresses the dependency concern: because the pool knows when one task is waiting for another to complete, the work-stealing algorithm inherently leverages those dependencies. I would urge you to watch Doug Lea's talk from the JVM Language Summit 2010.
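The classic divide-and-conquer shape looks like this – a sketch, with the threshold chosen arbitrarily. The fork pushes a subtask onto the current thread's own deque, and the join tells the pool exactly which task this one is waiting on:

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Sum an array by splitting it in half until pieces are small enough
// to compute directly. Illustrative only; THRESHOLD is arbitrary.
public class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000;
    private final long[] data;
    private final int lo, hi;

    SumTask(long[] data, int lo, int hi) {
        this.data = data; this.lo = lo; this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {      // small enough: compute directly
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left = new SumTask(data, lo, mid);
        SumTask right = new SumTask(data, mid, hi);
        left.fork();                     // push onto this thread's deque
        long r = right.compute();        // work on the other half ourselves
        return r + left.join();          // join expresses the dependency
    }

    public static void main(String[] args) {
        long[] data = new long[100_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        long sum = new ForkJoinPool().invoke(new SumTask(data, 0, data.length));
        System.out.println(sum); // prints 4999950000
    }
}
```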
- Actors – express your computations as the run loop of an actor. Communicate between actors with messages – this makes the dependencies asynchronous AND somewhat invisible to whoever is scheduling actor invocations, except by the arrival of messages in a mailbox. Every actor has a mailbox, which effectively means there is one queue per actor; this lets you decide in your problem how finely to cut it up. Something has to actually schedule the computations of the actors, though – note that Scala actors are backed by … a fork/join pool. It seems to me that the actor model obscures the dependency information from fork/join – nothing is captured at the underlying level when one actor has sent a message to another and is waiting for a response. That dependency is implicit in a message sitting in one mailbox while the waiting actor has nothing, but it seems impossible to convey the higher-level dependency structure to the underlying scheduler.
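To make the "mailbox = one queue per actor" point concrete, here is a toy actor built from scratch in plain Java – a private BlockingQueue drained by a run loop on a scheduler thread. This is my own sketch, not the API of any actor library:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

// A toy actor that sums the integers sent to it. The mailbox is the
// actor's private queue; only the run loop reads from it, so there is
// no contention on the head – but the scheduler underneath sees none
// of the dependency structure between actors.
public class CounterActor {
    private final BlockingQueue<Integer> mailbox = new LinkedBlockingQueue<>();
    private final CountDownLatch done;
    private int total = 0;

    CounterActor(int expectedMessages) {
        this.done = new CountDownLatch(expectedMessages);
    }

    void send(int msg) { mailbox.add(msg); }

    // The run loop: process one message at a time, in arrival order.
    void runLoop() {
        while (done.getCount() > 0) {
            try {
                total += mailbox.take();
                done.countDown();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }

    int awaitTotal() throws InterruptedException {
        done.await();   // the latch also gives us a safe read of total
        return total;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService scheduler = Executors.newSingleThreadExecutor();
        CounterActor actor = new CounterActor(3);
        scheduler.execute(actor::runLoop);
        actor.send(1); actor.send(2); actor.send(3);
        System.out.println(actor.awaitTotal()); // prints 6
        scheduler.shutdown();
    }
}
```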
- Continuations – to me, continuations are (at a high level) pretty similar to actors. They have the benefit that they can presumably be paused from outside the computation, so an external scheduler might be able to timeslice work in and out in some better way, but it seems like there is a lot of machinery there that adds overhead.
- Data flow – data flow is a very intriguing model because it lets you explicitly model the data dependencies between tasks. GPars probably has the most interesting implementation I know of. There are a few other variants written for Clojure (one that relies on the underlying agent framework) and Scala (one that relies on the underlying actor infrastructure). Because those rely on underlying frameworks, I'm pretty sure they don't optimally leverage the dependency information inherent in data flow tasks. I'd love to see a framework optimized to leverage that dependency info, though.
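The core idea can be sketched in plain Java as a single-assignment variable: get() blocks until bind() supplies a value. The names (DataflowVar, bind) are made up for this sketch, it is not race-proof, and it illustrates the problem as much as the model – the scheduler underneath sees only a blocked thread, not the dependency edge:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// A single-assignment dataflow variable: readers block until a producer
// binds a value. Hypothetical names; a simplified sketch of the idea
// behind GPars-style dataflow, not its actual API.
public class DataflowVar<T> {
    private final CountDownLatch bound = new CountDownLatch(1);
    private volatile T value;

    public void bind(T v) {
        if (bound.getCount() == 0) throw new IllegalStateException("already bound");
        value = v;
        bound.countDown();   // release all waiting readers
    }

    public T get() throws InterruptedException {
        bound.await();       // the data dependency: block until bound
        return value;
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newCachedThreadPool();
        DataflowVar<Integer> x = new DataflowVar<>();
        DataflowVar<Integer> y = new DataflowVar<>();
        DataflowVar<Integer> z = new DataflowVar<>();
        // z depends on x and y; the task simply blocks until both are bound.
        pool.execute(() -> {
            try { z.bind(x.get() + y.get()); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        x.bind(3);
        y.bind(4);
        System.out.println(z.get()); // prints 7
        pool.shutdown();
    }
}
```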
- Clojure – Clojure actually has a bunch of different things that work in concert, so it's hard for me to describe it as any one model. Most state is immutable and persistent via structural sharing. When you do want to mutate state, a variety of features (refs, atoms, agents) let you choose whether state changes should happen synchronously or asynchronously, and whether they should or should not be coordinated with others. Clojure has an STM which allows you to synchronize multiple state changes in a well-ordered way. MVCC lets you see a consistent view during the change (again leveraging the persistent data structures), and transactions are retried in the case of conflict. Reads are always available, again due to the data structures. Clojure agents are ultimately backed by an executor pool (an internal one you have no control over). There is ongoing work in Clojure to create a set of functions over sequences (filter, map, etc.) backed by parallel execution against a fork/join pool, and I think that shows great promise for easy benefits on a different kind of problem (where you are working with large chunks of data).
The app I’m working on is in Clojure, but for now I am focusing on building a prototype that uses fork/join directly, as I do not want to introduce the queue contention inherent in the underlying executor pool. However, I’d really like to express my problem in ways closer to the data flow paradigm. If someone were to marry the two, I think that would be a beautiful thing. Heck, I might even take a stab at it eventually.
If you have any thoughts on this, I’d love to hear them… these are merely a snapshot of my current thinking.