Pure Danger Tech


Map bugs

27 Oct 2009

A colleague ran into some bugs this morning in Cliff Click‘s “High Scale Lib”, specifically in NonblockingHashMap. I mentioned them on Twitter and some people asked me to blog, so here you go.

NBHM is a masterful concurrent Map implementation that allows for a highly concurrent usage, beyond what your more familiar ConcurrentHashMap can handle. Make no mistake, CHM works just fine for a lot of people and it’s an excellent implementation. NBHM achieves greater concurrency under large numbers of threads by using “non-blocking synchronization” (you’ll also often hear this called “lock-free” which has become taken as a more refined subset).

Non-blocking algorithms do not use locks where threads must block but instead typically use some kind of atomic compound operation like read-modify-write, compare-and-swap (CAS), etc. Cliff did a great talk on this concurrency style at JavaOne in 2008 which I wrote a blog about. Cliff works for Azul, which makes boxes with a large number of CPUs (768+) so they regularly need a very large number of threads. NBHM can scale up to 100s or even 1000s of concurrently reading AND writing threads. CHM does not.

This is cool stuff but has a big downside in that there are probably only a couple dozen people on the planet that understand it.

Anyhow, the bugs my colleague found are not in the hard stuff (we did manage to flush out an obscure bug that regularly crashed Hotspot 1.6 on Sparc) but in the contract for methods like remove() and replace(). These methods return a boolean to indicate whether the remove or replace occurred. They are currently checking based on identity, instead of based on equality.

For example, the ConcurrentMap contract for remove() says:

boolean remove(Object key, Object value)

Removes the entry for a key only if currently mapped to a given 
value. This is equivalent to:

       if (map.containsKey(key) && map.get(key).equals(value)) {
           return true;
       } else return false;

except that the action is performed atomically. 

but the NBHM implementation says:

return putIfMatch( key,TOMBSTONE, val ) == val;

which is based on identity with val instead of .equals(). The ConcurrentMap.replace(K key, V oldValue, V newValue) implementation has the same issue. The fix is to do something like this instead:

return val.equals(putIfMatch( key,TOMBSTONE, val ));

A simple test case here is something like:

public void testRemoveIsEqualityBased() {
    ConcurrentMap<String,String> m = new NonBlockingHashMap<String,String>();

    // load entry
    String key = new String("key");
    String value = new String("val");
    assertNull(m.put(key, value));
    // remove entry if value exists
    String equalValue = new String("val");  // not identical but equals
    assertTrue(m.remove(key, equalValue));   // NBHM returns false

This test will fail with the current implementation but pass with the modified version (or CHM or ConcurrentSkipListMap).

Abhishek (the finder), has now filed a bug in the High Scale Lib project to feed these fixes back into the library (we use a local copy with some modifications and enhancements).