JSR 326 – Post mortem JVM Diagnostics API

There was an interesting new JSR proposed last week to define an API for initiating in-flight diagnostic dumps and for analyzing the artifacts a JVM creates. There are already many formats for heap dumps, profiling data, stack traces, native memory dumps, and so on, and the JSR's point is that this situation is getting worse, not better:

“Existing Java diagnostic tools are focused primarily on what can be termed “live monitoring” – this means source level debuggers, trace tools, performance analysers etc. These tools are very useful when the problem is readily reproducible and the customer is willing to accept the costs of such reproduction. However, in many cases problems do not fall into either of these categories as the problem is either intermittent or the impact of reproduction with live monitoring tools is too expensive. In these cases the pressure is then on those supporting the customer to solve the issue in other ways. Here we enter the realm of post mortem analysis as the primary means for uncovering the cause of the issue. Unfortunately this space is fragmented and incomplete.

The lack of pervasive and credible post mortem and snapshot diagnostic tools has steadily driven the problem solving act down the software stack. The result is that JVM and middleware providers have become increasingly involved in helping customers determine root cause for a wide variety of unexpected application behaviour. This trend is increasing in line with the exploitation of capabilities introduced in Java SE. For instance, Java 5.0 NIO brought managing native memory back to the table. Something most Java programmers had not had to learn before. Helping customers diagnose native out of memory situations is a common occupation for JVM and middleware providers.”

So this JSR proposes a set of readers for common artifact types that can be extended as needed, along with an API that standardizes how in-flight artifacts are generated.
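
To make the "readers plus extension" idea concrete, here is a minimal sketch of how such a design might look. Everything in it is hypothetical: `ArtifactReader`, `ArtifactReaderRegistry` and `HprofSniffer` are names I made up for illustration, not the JSR 326 API. The idea is that each reader recognizes one artifact format from the first bytes of a file, and tools can register additional readers for formats the standard set doesn't cover.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical: one reader per artifact format (HPROF, PHD, core files, ...).
interface ArtifactReader {
    String formatName();

    // Returns true if the first bytes of an artifact look like this reader's format.
    boolean canRead(byte[] prefix);
}

// Hypothetical registry: holds the built-in readers and lets tools plug in their own.
final class ArtifactReaderRegistry {
    private final List<ArtifactReader> readers = new ArrayList<>();

    void register(ArtifactReader reader) {
        readers.add(reader);
    }

    // Picks the first registered reader that recognizes the artifact, if any.
    Optional<ArtifactReader> find(byte[] prefix) {
        for (ArtifactReader reader : readers) {
            if (reader.canRead(prefix)) {
                return Optional.of(reader);
            }
        }
        return Optional.empty();
    }
}

// Example reader keyed on the ASCII text that opens a binary HPROF file.
final class HprofSniffer implements ArtifactReader {
    private static final byte[] MAGIC = "JAVA PROFILE ".getBytes(StandardCharsets.US_ASCII);

    public String formatName() {
        return "HPROF";
    }

    public boolean canRead(byte[] prefix) {
        if (prefix.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (prefix[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```

A vendor could then call `register` with its own reader (say, one for PHD files) without the registry needing to know about that format in advance.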

I asked Steve Poole of IBM (the submitting member) which formats would be targeted initially. As expected, he said the most common ones would come first, such as HPROF and IBM’s PHD (Portable Heap Dump) format.
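
As an example of what a reader for one of these formats has to deal with, here is a minimal sketch that parses only the fixed header of a binary HPROF file. It assumes the layout described in the JDK's HPROF format documentation: a null-terminated ASCII string such as "JAVA PROFILE 1.0.2", then a 4-byte identifier size, then a millisecond timestamp stored as two big-endian 4-byte halves. `HprofHeader` is a hypothetical helper, not part of any JSR 326 API, and PHD's layout is not covered here.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reads just the header that precedes the HPROF records.
final class HprofHeader {
    final String format;        // e.g. "JAVA PROFILE 1.0.2"
    final int identifierSize;   // size in bytes of object IDs in the dump (typically 4 or 8)
    final long timestampMillis; // time the dump was written, in ms since the epoch

    private HprofHeader(String format, int identifierSize, long timestampMillis) {
        this.format = format;
        this.identifierSize = identifierSize;
        this.timestampMillis = timestampMillis;
    }

    static HprofHeader read(InputStream in) throws IOException {
        // DataInputStream reads big-endian values, which is the byte order HPROF uses.
        DataInputStream data = new DataInputStream(in);

        // Collect the null-terminated format string, refusing absurdly long input.
        ByteArrayOutputStream text = new ByteArrayOutputStream();
        int b;
        while ((b = data.read()) != 0) {
            if (b == -1) {
                throw new IOException("unexpected end of file in HPROF header");
            }
            text.write(b);
            if (text.size() > 64) {
                throw new IOException("not an HPROF file");
            }
        }
        String format = text.toString(StandardCharsets.US_ASCII.name());
        if (!format.startsWith("JAVA PROFILE ")) {
            throw new IOException("not an HPROF file: " + format);
        }

        int idSize = data.readInt();
        // The high and low 4-byte halves together form one big-endian 8-byte value.
        long timestamp = data.readLong();
        return new HprofHeader(format, idSize, timestamp);
    }
}
```

Everything after the header is a stream of tagged records, which is where a real reader would do most of its work.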

I think this is a good opportunity for standardization. We recently added extensive cluster visualization to Terracotta, and being able to read external dump formats and integrate them with the stats we already collect would be great. It would also be wonderful to have a standard way to export the data we collect in-flight to an external file.

I also asked Steve whether they expected JSR 326 to be completed in time for Java 7. Since the contents and schedule for Java 7 are still unknown, he wasn’t willing to hazard a guess. For now, I’ll be adding a section for JSR 326 to my Java 7 page and tracking information about it just in case.

Pure Danger Tech