My colleague Chris Dennis, sent this puzzler around today:
import java.util.HashMap; import java.util.HashSet; import java.util.Map; import java.util.Set; import java.util.concurrent.ConcurrentHashMap; public class EntrySetSize { public static void main(String[] args) { printSetSize(new ConcurrentHashMap()); printSetSize(new HashMap()); } private static void printSetSize(Map map) { map.put("hello", "world"); map.put("world", "hello"); Set set = new HashSet(map.entrySet()); System.out.print(set.size() + " "); } }
The question is what does this program print when executed?
- a) 2 2
- b) 1 2
- c) 0 2
-
d) something else</li> </ul> It seems like the answer should be a) and you will actually see that in JDK 1.6+. However, in JDK 1.5 you’ll see b). Understanding why requires a bit of a jaunt through the CHM code, where dragons lurk. Also, it’s actually possible to see several possible answers (at least a and b) in JDK 1.5 depending on what keys and values you use.
map.entrySet()
This call on a CHM does the following:
public Set<Map.Entry<K,V>> entrySet() { Set<Map.Entry<K,V>> es = entrySet; return (es != null) ? es : (entrySet = (Set<Map.Entry<K,V>>) (Set) new EntrySet()); }
</pre>
All that does is create (and cache) an EntrySet instance, which is an inner (non-static) class specific to CHM. It has access to the real CHM state and implements the interface using that state.
**new HashSet()**
The HashSet is constructed with the EntrySet instance. A new HashSet with an initialization collection will do this:
<pre class="brush: java; title: ; notranslate" title="">public HashSet(Collection<? extends E> c) {
map = new HashMap<E,Object>(Math.max((int) (c.size()/.75f) + 1, 16));
addAll(c);
}
</pre>
So, the state from the source collection is added through addAll().
**HashSet.addAll()**
<pre class="brush: java; title: ; notranslate" title="">public boolean addAll(Collection<? extends E> c) {
boolean modified = false;
Iterator<? extends E> e = c.iterator();
while (e.hasNext()) {
if (add(e.next()))
modified = true;
}
return modified;
}
</pre>
There is some extra stuff here to manage the modified flag needed for fail-fast iteration but basically, this code walks through the collection’s iterator calling next(), then add() on the set under construction. Here’s where things get interesting. We need to look at what the EntrySet uses for an iterator and what happens when you call next().
**EntrySet.iterator()**
<pre class="brush: java; title: ; notranslate" title="">public Iterator<Map.Entry<K,V>> iterator() {
return new EntryIterator();
}
</pre>
Here we return yet another inner non-static class EntryIterator. EntryIterator is defined as follows:
<pre class="brush: java; title: ; notranslate" title="">final class EntryIterator extends HashIterator implements Map.Entry<K,V>, Iterator<Entry<K,V>>
</pre>
This should give you some pause. We fully expect to see “extends HashIterator” – there are several key/value/entry iterator flavors buried in here and they share most of their implementation in HashIterator. We also fully expect to see “implements Iterator<Entry<K,V>>” since that’s what we’re doing. But seeing “implements Map.Entry<K,V>” should make you go huh? This iterator walks over Entry<K,V> instances in the CHM yet it itself is typed as an entry?
What does it do on Iterator.next()?
**EntryIterator.next()**
public Map.Entry<K,V> next() {
nextEntry();
return this;
}
Here we see the iterator advance to the next entry (it walks through segments at the high level and entries at the low level) but in the end it returns **this**?
The key here is that the EntrySet’s iterator just stores the current key/value during iteration and returns itself as the entry object for every single actual entry. If you modify that entry, it writes through appropriately to the CHM. Presumably, this saves on the creation and destruction of lots of little Map.Entry instances that will be used just during iteration and typically never looked at again.
Therefore, when the EntrySet iterator is walked through in HashSet.addAll(), AND because of how EntryIterator.hashCode() is implemented:
<pre class="brush: java; title: ; notranslate" title="">public int hashCode() {
// If not acting as entry, just use default.
if (lastReturned == null)
return super.hashCode();
Object k = getKey();
Object v = getValue();
return ((k == null) ? 0 : k.hashCode()) ^
((v == null) ? 0 : v.hashCode());
}
</pre>
Here the hashCode is created as the bitwise OR of the key hash code and the value hash code. So, regardless of what the key and value hashcodes are, a ^ b == b ^ a. In our case we used two strings “hello” and “world” swapped. Thus the second entry instance will be seen as an identical hash code (and instance) and will simply put over the first entry instance in the HashSet.
A subtly different bug will happen in the case where this property (of swapped key/value) doesn’t occur. In that case, both entries will end up in the HashSet (size == 2 as expected) but they are the _same_ instance and were put into the set with different hash codes. It’s always a bad deal to put something in a HashSet or HashMap and then modify the object in ways that modify the hashCode(). This breaks assumptions in the HashMap class that can cause lost elements in the collection and other weird behavior.
As I mentioned, this same instance behavior is probably surprising to most people. It’s doc’ed here in [Sun bug #6312706](http://bugs.sun.com/view_bug.do?bug_id=6312706); as a general problem in several maps (CHM, IdentityHashMap, and EnumMap). That bug is marked as “fix understood” which is kind of surprising because this behavior is already changed and fixed in JDK 1.6. In 1.6, the EntryIterator is implemented as:
<pre class="brush: java; title: ; notranslate" title="">final class EntryIterator
extends HashIterator
implements Iterator<Entry<K,V>>
{
public Map.Entry<K,V> next() {
HashEntry<K,V> e = super.nextEntry();
return new WriteThroughEntry(e.key, e.value);
}
}
</pre>
Here EntryIterator does not implement Map.Entry and next() returns a new instance of WriteThroughEntry, which causes the HashSet to be much happier on creation.
Interestingly, another smart colleague noted that the existing Terracotta ConcurrentHashMap instrumentation fixes the 1.5 implementation by creating a wrapper object in the iterator as well.