This article discusses
PerfAnal, a GUI-based tool for analyzing the performance of applications on the Java 2 Platform. You can use
PerfAnal to identify performance problems in your code and locate code that needs tuning.
The tool described in this article is a free, GPL-licensed Java tool originally published with the book Java Programming on Linux. It is included with the book's CD-ROM, and also available from the book's web site (relevant pointers are provided at the end of the article). There is nothing Linux-specific about this tool--it's usable with any implementation of the Java 2 Platform that supports the Java Virtual Machine 1 Profiling Interface.
Until the release of the Java 2 Platform, profiling of Java application performance was a difficult task. The JVM offered a very limited profiling feature, and profiling tools were difficult to write and hard to find.
The Java 2 Platform greatly improved the story with the introduction of the JVM Profiling Interface (JVMPI)--a set of native-language hooks that allow developers to capture profiling information from a running Java application.
Beginning with JDK1.2, the SDK included a sample profiling application,
hprof, that illustrates the sort of data that can be captured through JVMPI. Although
hprof is not, in itself, a capable analyzer, it provides the foundation on which you can build a performance analyzer.
An excellent introduction to
hprof was published in the JDC Tech Tips, in an article by Stuart Halloway:
The article describes how to turn on JVM profiling and interpret the file it creates. We will not repeat that information here, other than to mention that you can get a useful
hprof help message by running:
The Halloway article also includes a small test program,
TestHprof that we will use here to illustrate
PerfAnal. Here is a listing of the source, annotated with line numbers:
1 package com.develop.demos;
Here is a sample invocation of the program, with profiling (for purposes of this discussion, we'll run with green threads and no JIT; we discuss this in more detail later in the article):
java -green -Djava.compiler= -Xrunhprof:cpu=samples,depth=6,thread=y com.develop.demos.TestHprof
The resulting profiling output is 432 lines long, and can be obtained here. (Note that this is not the same output used for the Halloway article.)
Briefly examining the
hprof output shows that understanding the performance of this application is a difficult problem. The first few lines of the ranking at the end of the output look like
this. We see that most of the time is being spent in
StringBuffer initialization code. To figure out which
makeString methods are most efficient requires some detective work. For example,
look at trace #31 in the file (which corresponds to the top entry in the above ranking).
We see that the
StringBuffer initialization was performed on behalf of the
makeStringInline method, so
makeStringInline is responsible for 260 ticks of the profiling counter. But that's not all;
makeStringInline appears in 17 different traces. Tracking down its total cost from the
hprof output is tedious and difficult work.
PerfAnal tool simplifies this analysis in two ways:
You can obtain the
PerfAnal .jar file by
here. To run it on the data we just captured:
java -jar PerfAnal.jar java.hprof.txt
The four quadrants contain four different analyses of the data, presented with the familiar Swing
JTree interface. Let's take a closer look at each quadrant.
Figure 2 shows the upper-left quadrant, which ranks and analyzes CPU usage by caller.
This quadrant ranks the time spent in various methods,
inclusive of subroutine calls. Not surprisingly,
main takes the most time. We can open up branches of the tree to explore where
main is spending its time (see Figure 3).
We see that
main spent 35.20% of its time in
makeStringInline, 30.03% of its time in
makeStringWithLocal, and so on. Examining the first method in more detail show us that
makeStringInline spent most of its time in the
The lower left quadrant shows the same ranking we saw in the upper left quadrant, but this tree organizes the analysis by callee. It answers the question: on whose behalf is a method spending time?
Click to view Figure 4 full-scale
Figure 4 shows such an analysis in the lower-left quadrant. Here we ask the question: why is the program spending 93.73% of its time in the
StringBuffer constructor--on whose behalf is it so busy? Exploring that branch of the tree shows us how that time is being spent on behalf of the various
The upper right quadrant shows, yet again, the same ranking of CPU usage by method. Here, however, the usage is broken down by line number. Figure 5 shows the analysis.
The analysis shows, line by line, where
main (or any other method) is spending its time, inclusive of called subroutines.
Finally, the lower right quadrant answers the question: where is the CPU really spending its time? This tree ranks method usage exclusive of subroutine calls. Figure 6 shows the analysis.
We see that, when we do not include subroutine calls, line 120 of the
StringBuffer constructor is where our program spends fully 79.54% of its time.
Each of these analyses, individually, provides important information about where execution time is being spent.
PerfAnal provides an additional feature that allows you to correlate the information shown in the various windows.
For example, Figure 6 showed us that 12.54% of execution time was spent in
System.arraycopy. What is taking so much time? By right-clicking on that entry, we pop up a menu that allows us to correlate the information (see Figure 7).
Goto this Method, we position all four quadrants to examine this method. Returning to the lower-left quadrant, we can explore in detail (see Figure 8) who is calling
The right-click popup menu shown in Figure 7 offers two additional features.
If you choose
Select a Method to Analyze, you are presented with a dialog (see Figure 9) that allows you to select a method by name. After you select a method, all four quadrants will be positioned to analyze that method.
If you choose
Select Thread(s) to Analyze, you can choose one or more threads from a dialog (see Figure 10). The analysis will be recomputed to report only on the selected threads.
PerfAnal allows you to save the analysis data. The
File menu (in the menu bar) gives you a
Save... item that saves all the processed information into a textfile. The information from all four quadrants is saved, with the tree fully unfurled--
here are the first few lines of the saved file from this analysis:
Experience with the threading model on Linux suggests that the profiling results captured with native threads cannot necessarily be trusted. I've noticed, at least for GUI applications, that much of the profiling data is charged to the wrong method running in the wrong thread: much time spent in Swing rendering is charged instead to the Motif event loop (which in fact spends most of its time waiting). This problem does not appear when running with green threads, and I do not know if it appears on other platforms.
The use of JIT will have an obvious effect on performance, but it can also affect the usefulness of the profiling output. In general, JIT-compiled code loses line number information; any methods that have been compiled will not report line numbers in the profile, so the analyses in the upper- and lower-right quadrants will not be as informative.
hprof options have important effects on the
cpu=samples: This option is necessary to capture the data
depth=: This option controls the depth of the stack traces saved during profiling. You may need to experiment with this value: large values produce more accurate information, but increase the memory and CPU requirements of
If the value you choose for stack depth is too small,
PerfAnal will not be able to fully ascertain CPU usage by subroutines. Notice, for example, in Figure 4 that the times reported for the various callers add up to the time reported for the callee. If they do not add up, you have not captured enough stack depth. There is no
right amount of stack depth--it just depends on how far down the call stack you need to analyze. In my own profiling, I usually start with a depth of 12 and refine it as needed.
thread=y: You must specify this option if you want to be able to analyze the data by thread.
PerfAnal is included on the CD-ROM with
Java Programming on Linux, and is also available from the book's
Java Programming on Linux, by Nathan Meyers, is devoted to discussing the use of Java in the Linux environment. It provides detailed information on JDKs and other development tools available on the Linux platform, discusses the platform-specific issues faced on Linux, and describes the many ways in which Java can be deployed on Linux.
In addition to
PerfAnal, the book provides and discusses tools for analyzing the time Java applications spend in native code and in the kernel, and discusses Java's performance demands on CPU, memory, and the X Window System.
To learn more about the book, visit its web site.
Nathan Meyers spent 20 years as a developer and architect for Hewlett-Packard Company, working in handheld calculators, UNIX workstations, and inkjet printers. His experience includes developing embedded systems, device driver implementation, creating development tools, definition and implementation work on the X Window System and the Common Desktop Environment, developing 2D and 3D graphics applications, UNIX application performance tuning, designing evolutionary algorithms, and implementing financial algorithms. He left HP in 1999, wrote Java Programming on Linux, and joined a dot-com startup in Portland, Oregon.