Display bugs suck

I was just reading Joel’s blog post on the Excel bug that displays the wrong value (100000) for certain floating point numbers very close (but not equal) to 65535. Display bugs suck. Apparently there are exactly 12 possible bad values (out of the very large number of possible floating point values).

I’ve tracked down quite a few display bugs over the years, almost all related to localized timestamps in Java, which were pretty much the bane of my existence. The rest were probably due to floating point issues, specifically the display of Java floats and doubles, which of course are IEEE 754 types, just like the values in the Excel bug.
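As a small illustration of the float and double display problem (a sketch, not one of the bugs described above): the class and method names here are made up for the example. The same `float` prints differently once it has been widened to `double`, even though the widening is exact and the value has not changed.

```java
class FloatDisplayDemo {
    // Text for the float as Java prints it directly.
    static String asFloatText(float value) {
        return Float.toString(value);
    }

    // Text after the same float is widened to double, as happens when it is
    // handed to an API that takes a double.
    static String asWidenedText(float value) {
        double widened = value; // exact conversion, no rounding
        return Double.toString(widened);
    }

    public static void main(String[] args) {
        float price = 0.1f;
        System.out.println(asFloatText(price));      // 0.1
        System.out.println(asWidenedText(price));    // 0.10000000149011612
        System.out.println((double) price == price); // true: same value, different text
    }
}
```

An equality check on the numbers passes here, which is why a test that never looks at the rendered string would not notice anything.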

Display bugs are particularly insidious because the values are arithmetically correct and will pass every unit test you throw at them. Only when you actually convert those objects to a string for display to a user (often something that involves the user’s locale) do you get a string that is incorrect.
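A minimal sketch of how the locale changes the string while the value stays the same. `LocaleDisplayDemo` and `render` are hypothetical names, and the two locales are just convenient examples.

```java
import java.util.Locale;

class LocaleDisplayDemo {
    // Format the same double with grouping and two decimals for a given locale.
    static String render(double value, Locale locale) {
        return String.format(locale, "%,.2f", value);
    }

    public static void main(String[] args) {
        double value = 1234.5;
        System.out.println(render(value, Locale.US));      // 1,234.50
        System.out.println(render(value, Locale.GERMANY)); // 1.234,50
    }
}
```

Any code that later parses or compares these strings, or a user who reads one while expecting the other convention, sees a different number even though the double never changed.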

In Java, formatting a timestamp to a string renders the time in the current user’s timezone (strictly, the JVM’s default timezone). That sounds fantastically useful unless, say, the timestamp value was pulled from a database (that had no timezone information) and used to create a timestamp value in a server process that is in a different timezone than the client. In that case, you are going to take a timestamp value that has an implied timezone (often GMT), construct it in the timezone of the server (say EST), then display it in some other timezone (say PST). This is guaranteed to be wrong, and not just a little wrong but almost inexplicably wrong to a user.
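The GMT, EST, PST scenario can be sketched as follows. The stored value, the class name, and the helper names are all assumptions for illustration; the zones are passed explicitly so the result does not depend on the machine running it.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

class TimestampZoneDemo {
    static final String PATTERN = "yyyy-MM-dd HH:mm";

    // Interpret a zone-less database string as wall-clock time in parseZone.
    static Date parse(String text, String parseZone) throws ParseException {
        SimpleDateFormat in = new SimpleDateFormat(PATTERN, Locale.ROOT);
        in.setTimeZone(TimeZone.getTimeZone(parseZone));
        return in.parse(text);
    }

    // Render an instant as wall-clock time in displayZone.
    static String render(Date instant, String displayZone) {
        SimpleDateFormat out = new SimpleDateFormat(PATTERN, Locale.ROOT);
        out.setTimeZone(TimeZone.getTimeZone(displayZone));
        return out.format(instant);
    }

    public static void main(String[] args) throws ParseException {
        String stored = "2007-01-15 12:00"; // hypothetical value, meant as GMT

        // Bug: the server (EST) reads the value as if it were local time.
        Date wrong = parse(stored, "America/New_York");
        // Intended: read the value in the zone it was implicitly stored in.
        Date right = parse(stored, "GMT");

        System.out.println(render(wrong, "America/Los_Angeles")); // 2007-01-15 09:00
        System.out.println(render(right, "America/Los_Angeles")); // 2007-01-15 04:00
    }
}
```

The client in PST sees 09:00 where 04:00 was intended, a five-hour error that matches neither the server offset nor the client offset on its face, which is what makes it so hard to explain to a user.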

As it turns out, this happens, oh, all the time in enterprise environments where your database, server, and client may be on different continents. Because these bugs depend on the locale settings on all of these machines (well, not usually on the db machine), they are a pain to test for.

Another related class of bugs is error message display bugs. Hopefully, your error conditions are triggered very infrequently. Of course, this means your error handling code is rarely tested. And when it does get exercised, it is most likely in the field, by a user who does not know what they are seeing or how to report it. I’ve found static analysis tools like FindBugs to be invaluable in tracking down provably bad code in error handling routines. Here’s a classic gem:

[source:java]

public void someMethod(Foo foo) {
    if (foo == null) {
        throw new BadFooException("Got bad foo: " + foo.getName());
    }
    …
}

[/source]

Clearly an NPE waiting to happen (foo is known to be null at the point where foo.getName() is called), but this kind of thing happens all the time, sometimes in much more obscure ways than this.
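One way the method above could be repaired, as a sketch only. `Foo` and `BadFooException` are not shown in the post, so the minimal definitions below are assumptions added just to make the example compile, and `FooService` is a hypothetical holder class.

```java
// Assumed shape of Foo: only getName() appears in the original snippet.
class Foo {
    private final String name;

    Foo(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

// Assumed to be an unchecked exception; the original does not say.
class BadFooException extends RuntimeException {
    BadFooException(String message) {
        super(message);
    }
}

class FooService {
    public void someMethod(Foo foo) {
        if (foo == null) {
            // Do not dereference foo while building the message.
            throw new BadFooException("Got bad foo: null");
        }
        // ... rest of the method, elided as in the original
    }
}
```

The point is only that the error path must not touch the object it has just found to be null; what the message should say is a separate decision.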

Also prevalent are error messages that just display the wrong thing. Variables may be switched or just plain wrong. Objects can be concatenated into a string when their type has no useful toString() defined. And so on. Even more subtle are errors like the Excel bug where the user’s locale makes a difference. Error messages are often internationalized, which means you can’t check for particular text. One technique I’ve found for testing this is to inject a fault, then check for the presence of the bad value (but not the exact text) in the error message. Cheesy, but better than nothing.
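The fault injection check might look something like this sketch. Everything named here is hypothetical: `lookupUser` stands in for the code under test, and `template` stands in for a real message bundle lookup. The check asserts only that the bad value shows up in the message, never the surrounding text.

```java
import java.text.MessageFormat;
import java.util.Locale;

class ErrorMessageCheck {
    // Hypothetical stand-in for an internationalized message lookup.
    static String template(Locale locale) {
        return "fr".equals(locale.getLanguage())
                ? "Nom d''utilisateur inconnu : {0}"
                : "Unknown user name: {0}";
    }

    // Hypothetical code under test: in this sketch every lookup fails.
    static void lookupUser(String name, Locale locale) {
        throw new IllegalArgumentException(
                MessageFormat.format(template(locale), name));
    }

    // Inject a bad value, then check that it appears somewhere in the message.
    static boolean messageMentions(String badValue, Locale locale) {
        try {
            lookupUser(badValue, locale);
            return false; // no error was raised at all
        } catch (IllegalArgumentException e) {
            String message = e.getMessage();
            return message != null && message.contains(badValue);
        }
    }

    public static void main(String[] args) {
        String bad = "no-such-user-42"; // distinctive, unlikely to occur by accident
        System.out.println(messageMentions(bad, Locale.ENGLISH)); // true
        System.out.println(messageMentions(bad, Locale.FRENCH));  // true
    }
}
```

Choosing a distinctive string as the injected value keeps the check independent of the message language. A numeric value would be riskier, since the formatter may add locale-specific grouping and the exact digits would no longer appear in the message.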

A lot of these error bugs can be tested for, but most people don’t think of it. And honestly, maybe the effort isn’t worth the reward. If the error never occurs or occurs once in a million executions, maybe it’s not worth the 10 or 15 minutes to write the test. I have done a lot of this in maintenance work where an error in a message has been reported, but almost none proactively.

How do you test for bugs like this? Do you even bother to test for bugs like this?

Pure Danger Tech
