Patterns I Love/Hate #3: Visitor

Apparently my previous pattern posts on Singleton and Template Method hit a nerve based on comments and traffic – thanks for reading! I’m now moving the series from patterns that I hate to patterns that I simultaneous love and hate, because hey, I’m a complicated man (and no one understands me but my woman).

The Visitor pattern is one that I’ve spent lots and lots of time using and it has unquestionably been a key part of many systems I’ve designed and used. At the same time, there are aspects of the Visitor pattern that drive me nuts. I’ll try to explore both sides in this post. Note: I’m primarily a Java guy and this discussion of visitors is definitely Java-centric in a number of respects.

Dropping by for a visit

The Visitor pattern is typically used when you have a multi-object data structure (often something tree-like) and you want to traverse the data structure and implement various algorithms over it. It will help to have an example, so I’ll spare you the canonical Car and Shape examples. In honor of the upcoming Simpsons movie, let’s imagine a Simpsons game that has characters, props, etc. Here’s a bunch of data classes we might want to work with:

Visitor tends to be used in Composition hierarchies and is especially useful when these hierarchies are heterogeneous. When Composition hierarchies are homogeneous (a tree full of the same node class) the visitor pattern is still useful but you can get many of the same benefits without it.

In our game you see a lot of different kinds of Entity objects, including some that use composition to aggregate objects at various levels in the hierarchy and a Game object that is not even an Entity but holds all the Places. It is a key benefit of the Visitor pattern that all of the concrete data classes you want to visit do not have to share a common class hierarchy.

Now, let’s say we want to find all characters in the game that have dialog to process. We’ll need to walk through all the Places, then through all the Characters in each Place and collect all the dialog. One way is to just add the methods in all the classes involved: [source:java]

public class Game {

// … code managing places

public void collectDialog(Map<Character,String> dialog) {

for(Place place : places) {

place.collectDialog(dialog);

}

}</p>

public abstract class Place {

// … code managing Characters and Props

public void collectDialog(Map<Character,String> dialog) {

for(Character character : characters) {

dialog.put(character, character.getDialog());

}

public abstract class Character extends Entity {

public abstract String getDialog();

}

public class Homer extends Character {

public String getDialog() {

return “Donuts. Is there anything they can’t do?”;

}

// etc for each character

// To execute:

Map<Character,String> dialog = new HashMap<Character,String>();

game.collectDialog(dialog);

[/source]

So, that works fine. But you might find that these sorts of methods tend to multiply. I might need to update the movement of all Characters and movable Props in the game in a time step. Or enter a Krazy Krusty mode where every Character turns into Krusty and some Props turn into squirting flowers. Or you might just need to find all of a particular kind of a Prop across the game.

The problem is that each time you find a new algorithm you want to implement over this data structure, you have to change some or all of the concrete types in the data structure.

Another way to look at the commonalities between these algorithms is that they (1) navigate a data structure and (2) perform some action at (potentially) all instances in the data structure.

Visitor is a pattern that pulls these two items out of the data classes and replaces it with visitor-agnostic code in each data class. The algorithm-specific code is put in a class (the visitor) dedicated to the algorithm instead. Let’s look at how the visitor pattern could be used in the example above. I’ll show just the visitor-related code here. First, the concrete data classes each need to have an acceptVisitor method: [source:java]

public class Game {

public void acceptVisitor(GameVisitor visitor) {

visitor.visit(this);

for(Place place : places) {

place.acceptVisitor(visitor);

}

public abstract class Entity {

public abstract void acceptVisitor(GameVisitor visitor);

}

public class Refrigerator extends Prop {

public void acceptVisitor(GameVisitor visitor) {

visitor.visit(this);

for(Prop prop : getFood()) {

prop.acceptVisitor(visitor);

}

public class Homer extends Character {

public void acceptVisitor(GameVisitor visitor) {

visitor.visit(this);

}

// other Character and Prop subclasses are similar to Homer

public abstract class Place extends Entity {

public void acceptVisitor(GameVisitor visitor) {

for(Character character : getCharacters()) {

character.acceptVisitor(visitor);

}

for(Prop prop : getProps()) {

prop.acceptVisitor(visitor);

}

public class Moes extends Place {

public void acceptVisitor(GameVisitor visitor) {

visitor.visit(this);

super.acceptVisitor(visitor);

}

// other Place subclasses look the same as Moes

[/source]

Each of these acceptVisitor methods has two responsibilities. First, it must call back to the visitor. This is the key to the whole pattern as this callback leverages the static type of the this object to correctly call the right method on the visitor without needing some sort of horrible if/else or switch block. (You may know this callback by the term “double dispatch”.) Second, the acceptVisitor method must traverse “down” the data structure to any contained elements (we’ll come back to this later).

Then we need to look at the actual visitor code itself. [source:java]

public interface GameVisitor {

public void visit(Game game);

public void visit(Apu apu);

public void visit(Bart bart);

public void visit(Homer homer);

public void visit(Moes moes);

public void visit(NuclearPlant plant);

public void visit(SimpsonsHouse house);

public void visit(Donut donut);

public void visit(DuffBeer beer);

public void visit(Refrigerator fridge);

}

public class CollectDialog implements GameVisitor {

private Map<Character,String> dialog = new HashMap<Character,String>();

public Map<Character,String> getDialog() {

return dialog;

}

public void visit(Game game) { }

public void visit(Apu apu) {

dialog.put(apu, apu.getDialog());

}

public void visit(Bart bart) {

dialog.put(bart, bart.getDialog());

}

public void visit(Homer homer) {

dialog.put(homer, homer.getDialog());

}

public void visit(Moes moes) { }

public void visit(NuclearPlant plant) { }

public void visit(SimpsonsHouse house) { }

public void visit(Donut donut) { }

public void visit(DuffBeer beer) { }

public void visit(Refrigerator fridge) { }

}

// To execute:

CollectDialog collectDialog = new CollectDialog();

game.acceptVisitor(collectDialog);

Map<Character,String> dialog = collectDialog.getDialog();

[/source]

Here the GameVisitor defines the basic Visitor interface that each algorithm must implement. The interface consists simply of a visit method for each concrete type. In the CollectDialog visitor, we simply ask the characters for their dialog on those visit methods and do nothing on all the other visit methods.

So, there is clearly more code here than in the original example. The difference is that we are now set up to add new algorithms at will without modifying any of the data classes. So, adding Krazy Krusty mode is as simple as adding one visitor class, instead of modifying many or all of the data classes. Another nice benefit is that you don’t have to start from the Game class; you can start from any class in the data structure and traverse down through the structure.

Pretty cool stuff. Visitor lets you “bake in” the ability to support any number of algorithms over a heterogeneous data structure without modifying the data structure as algorithms are added.

So what’s to hate?

As you might expect, some of the very things that make Visitor useful also make it frustrating. Here’s a list of issues we’ll explore:

Expanding data structure breaks visitors
Navigation strategies
Return values
Exception handling
Execution and use of visitors
Performance

Expanding the data structure

The first aspect of the Visitor pattern to note is that the GameVisitor interface contains a visit method for every concrete class in the data structure. So, with the flexibility we’ve gained in adding new algorithms, we’ve given up the flexibility to expand the data structure without breaking our existing visitors. In the current implementation, The addition of a new character such as Lisa would break the visitor framework because the Lisa’s acceptVisitor method needs to call back to a visit(Lisa) method on GameVisitor that does not exist.

If you completely control all visitors all your data structure, this may not be an issue. It is simple to add a visit method to GameVisitor and update all existing visitors appropriately.

However, if you want a bit more insulation and protection from this kind of change, I would recommend introducing an abstract base class for your visitors that implements all visit methods with a no-op implementation. Once you have this structure in place, the addition of a new class requires:

Adding a new data structure class
Adding a new visit method to the GameVisitor interface
Adding a new no-op visit method to the BaseVisitor interface

No other visitors are affected from a compilation point of view. They still may need to be reviewed for whether they should override the default dummy visit method in BaseVisitor.

Another nice side effect of this change is that we can remove the empty methods in our CollectDialog (or any other visitor) as they are already implemented as a no op in the base class.

One gross part of the visitor pattern (as implemented above) is that each class with “children” must traverse its children in the visit method. This means that navigation logic is distributed throughout the data structure and is likely to get overlooked as the class evolves. Perhaps more importantly, it means that there is exactly one fixed navigation strategy hard-coded into the structure of the class. For simple data structures with obvious traversal strategies, I think this is fine. But it’s certainly not the only choice.

One common option for moving this logic somewhere is to build an Iterator or Strategy that knows how to traverse the composition hierarchy. Each data class then calls the Strategy then knows which object is the next to traverse.

You can actually even use double dispatch again to implement an Iterator as a co-operative class that works with Visitor. This allows you to separate the navigation class hierarchy from the visitor class hierarchy. Two big downsides to this approach is that the call sequence is much more complex and that the visit() and iterate() methods of every class visited so far are left on call stack. This once culminated for me in blowing the thread stack limit on some particularly large (but shallow) data structures. I don’t really recommend this option.

Another interesting way to think about the navigation logic is that it is itself like a visitor – it is a bit of data class specific logic that must be executed to determine what children each data class contains. Using this intuition, you can implement your navigation logic itself as a visitor that does nothing but say how to navigate when asked to visit. You can then give the navigation visitor a more traditional “logic” visitor that rides along on top of the navigator.

Let’s look at an example of this: [source:java]

// Will be needed inside the navigation visitor to have a common generic type

public interface Visitable {

void acceptVisitor(GameVisitor visitor);

}

public class NavigationVisitor extends BaseVisitor {

private GameVisitor logicVisitor;

private LinkedList<? super Visitable> itemQueue =

new LinkedList();

public NavigationVisitor(GameVisitor logicVisitor) {

this.logicVisitor = logicVisitor;

}

private void visitNext() {

if(! itemQueue.isEmpty()) {

Visitable first = (Visitable) itemQueue.removeFirst();

first.acceptVisitor(this);

}

public void visit(Apu apu) {

logicVisitor.visit(apu);

visitNext();

}

public void visit(SimpsonsHouse house) {

logicVisitor.visit(house);

itemQueue.addAll(house.getCharacters());

itemQueue.addAll(house.getProps());

visitNext();

}

[/source]

Here we see a NavigationVisitor that holds a queue of items to be processed by the visitor. Each visit method visits the logic visitor that is riding this navigation visitor. It then adds any children for the current data item to the queue (if any). And finally, all visit methods must call the visitNext() method to pick off the next item in the queue and visit it next. This implements a breadth-first traversal of the data structure. It’s also trivial to implement depth-first and you can also change whether you visit the nodes in pre-, in-, or post-order.

You might also have some custom traversals that always need to walk Characters first or Props first or whatever. Doing that is as easy as adding a navigation visitor.

One classic bug that occurs frequently when using a navigation visitor is the case of calling acceptVisitor with the logic visitor without using a navigation visitor at all. Since the navigation logic no longer exists in the concrete classes, if you don’t use a navigation visitor then you will not traverse any objects except the first one.

Return values

Return values are an area where visitors fall down a bit. The traditional way to return or collect values while visiting is to store the state in the visitor, then provide getters to retrieve the state from the visitor at the end. This works and is the usual way to do this but also feels repetitive and clunky after you write a few visitors.

Another cleaner way to do this was suggested recently by Ricky Clarkson by using generic visit methods that bind the return type of the visit method to the type of a visitor. I have not tried this in my code yet but it seems like a very interesting idea. The one place where it doesn’t look so great is when the visit method does not need to return a value. In that case, the visit method must right now return null. Ricky suggests that allowing us to specify would be a nice way to solve this problem in Java 7.

Exception handling

Exception handling always sucks with visitors. You can’t put a visitor-specific exception on the visit() methods as this would also affect the base interface and all acceptVisitor() methods. So you are left with several options:

Store as state in the visitor class
Declare throws Exception (or other more constrained type) on all methods
Throw exceptions from your visitor, but use only RuntimeExceptions

I’ve tried all of these at one point or another and can’t say I like any of them. The first option is the weirdest in some ways since it subverts the expected use of exceptions, but probably the one I’ve liked the best. It allows you to store exactly the right kind of exception in your visitor and still not need to declare it on the visit() methods.

The second option forces you into catching Exception on callers of all the visitors. There are ways to trap this and rethrow the real exception (see next item) but this option just feels wrong.

The third option is workable but it probably depends upon how friendly you are with the idea of doing away with checked exceptions and moving instead to using mostly unchecked exceptions. If you’ve already done that and are mostly using unchecked exceptions in your code, then this will be easy for you. If not, this will most likely seem weird and scary as you will start catching specific runtime exceptions when you call visitors (none of which are declared in the signatures).

Execution and use of visitors

Another item of visitor weirdness is the way in which they are invoked, particularly when doing navigation via a visitor: [source:java]

CollectDialog collectDialog = new CollectDialog();

NavigationVisitor nav = new NavigationVisitor(collectDialog);

game.acceptVisitor(nav);

Map<Character,String> dialog = collectDialog.getDialog();

[/source]

These four lines will be repeated in every visitor usage scenario. Generally it useful to build a couple helper methods to make this cleaner. First, it’s useful to wrap up the middle two lines in a static method in NavigationVisitor: [source:java]

public static

T execVisitor(T visitor, Visitable visitable) {

NavigationVisitor nav = new NavigationVisitor(visitor);

visitable.acceptVisitor(nav);

return visitor;

}

[/source]

You can then execute a visitor with navigation with just: [source:java]

CollectDialog collectDialog =

NavigationVisitor.executeVisitor(new CollectDialog(), game);

Map<Character,String> dialog = collectDialog.getDialog();

[/source]

And it can be helpful to push this code into a static method in CollectDialog itself: [source:java]

public static Map<Character,String> collect(Game game) {

CollectDialog collectDialog =

NavigationVisitor.executeVisitor(new CollectDialog(), game);

return collectDialog.getDialog();

}

[/source]

One other nice thing you can do if you are storing exceptions within your visitor is to check for and actually throw them from the static helper method. This makes the exception handling more normal for callers of the static method.

Another item I want to mention is that one side effect of using the Visitor pattern is that you will have lots of public visit() methods in each visitor class and these methods should NOT be called by anyone outside the visitor. The methods typically must be public to be called by data classes in a different package. There’s not really any workaround for this issue – it is something that should be documented in the javadoc though (which of course, no one will read).

Performance

I’ve already mentioned some memory considerations when using an Iterator in conjunction with Visitor. You may have noticed in the visitor examples above that we are now traversing the entire data structure looking for dialog when the original implementation only looked at Characters in Places. In my experience, this is a common occurrence with visitors. It’s so easy to implement a new visitor that we don’t worry about how much extra and unnecessary work is occurring. This is also something that’s a little difficult to spot with a profiler because the extra overhead is spread over a bunch of methods that traverse the data structure unnecessarily so no one method shows up as a hot spot.

There are a handful of fairly easy things you can do to address visitor performance as you are building out your solution:

Add a reset() method to allow reuse of visitor instances. The visitor instance can typically be reused, thus saving the cost of constructing the visitor instance. They’re almost certainly not worth pooling (like most Java objects) but when used multiple times in the same thread, there is no reason not to reuse them. I’ve found that having a standard reset() method on the visitor interface forces implementors to implement this behavior and makes it a common part of the visitor usage in your system. If you are wrapping your visitors with static helper methods, those methods may be able to store and reuse visitor instances, perhaps in a ThreadLocal.
Add a mechanism to abort/terminate a visitor. It is useful to have some way to abort the visitor execution for visitors where you are finding something and do not need to visit the entire data structure. Depending on where navigation logic lives, there are different ways to do this. In some cases you’ll want some way to store an abort flag on the visitor itself and in some cases you’ll want to build this into your navigation logic. You will definitely want an abort mechanism if you are storing exceptions in the visitor state, as you don’t want to continue traversing the data structure once an error has occurred.
Create custom navigators that only visit part of the data structure. In the case of our CollectDialog visitor above, we know that we only want to visit a portion of the data structure, so we could write a navigation visitor that only visits Places and Characters and skips Props entirely. In general, I recommend doing this sparingly for navigation cases that are frequently used or for cases that show up as performance hot spots. The reason is that if you create a custom navigator and visitor for every algorithm, you are writing a lot of extra code and not getting the reuse benefit of the navigation logic. One alternative might be to just have a single, very configurable navigator. I haven’t done that before, but I imagine it would work.
Create a custom navigator with “stop” nodes. One variant of the prior idea that I’ve found useful is the notion of “stop” nodes. For example, in the case of an AST (abstract syntax tree) generated by a parser, you may want to traverse some portion of the syntax tree but “stop” once you hit some kind of node. For instance, you might traverse an AST representing a SQL query SELECT clause and “stop” at nodes representing aggregate functions. Maybe this technique is a bit specialized but I found that it was perfect for some classes of visitor.

What are the alternatives

One obvious alternative is to simply hardcode the navigation and collection logic, either in the concrete objects or completely outside them in helper classes. For simpler data structures, this makes good sense. As we’ve seen above, there are all sorts of consequences of using the Visitor pattern.

Another interesting future possibility in Java is closures. The notion of passing execution logic into some sort of navigator could also be solved quite naturally using closures. Some of the same issues with return values and exception handling still exist and are the cause for a lot of the complexity in the BGGA closure proposal for example.

Examples

I’ve built full examples of some of the earlier code samples and you can grab them in this convenient zip to play with them yourself. Each package contains a complete and independent version of the same code:

v1 – original non-visitor example
v2 – rudimentary visitor example
v3 – use of a visitor base class
v4 – use of navigation visitors

Thanks again to everyone that has read, commented, or emailed on prior posts. I’ve enjoyed the conversations and learned a lot.

Pure Danger Tech

Patterns I Love/Hate #3: Visitor

Dropping by for a visit

So what’s to hate?

Expanding the data structure

Navigation strategies

Return values

Exception handling

Execution and use of visitors

Performance

What are the alternatives

Examples