PerfAnal: A Performance Analysis Tool

   
     

Articles Index

PerfAnal

This article discusses PerfAnal, a GUI-based tool for analyzing the performance of applications on the Java 2 Platform. You can use PerfAnal to identify performance problems in your code and locate code that needs tuning.

The tool described in this article is a free, GPL-licensed Java tool originally published with the book Java Programming on Linux. It is included with the book's CD-ROM, and also available from the book's web site (relevant pointers are provided at the end of the article). There is nothing Linux-specific about this tool--it's usable with any implementation of the Java 2 Platform that supports the Java Virtual Machine 1 Profiling Interface.

Profiling Java Applications

Until the release of the Java 2 Platform, profiling of Java application performance was a difficult task. The JVM offered a very limited profiling feature, and profiling tools were difficult to write and hard to find.

The Java 2 Platform greatly improved the story with the introduction of the JVM Profiling Interface (JVMPI)--a set of native-language hooks that allow developers to capture profiling information from a running Java application.

Beginning with JDK1.2, the SDK included a sample profiling application, hprof, that illustrates the sort of data that can be captured through JVMPI. Although hprof is not, in itself, a capable analyzer, it provides the foundation on which you can build a performance analyzer.

Using hprof

An excellent introduction to hprof was published in the JDC Tech Tips, in an article by Stuart Halloway:

Using HPROF to Tune Performance

The article describes how to turn on JVM profiling and interpret the file it creates. We will not repeat that information here, other than to mention that you can get a useful hprof help message by running:

java -Xrunhprof:help

The Halloway article also includes a small test program, TestHprof that we will use here to illustrate PerfAnal. Here is a listing of the source, annotated with line numbers:

 1 package com.develop.demos;
                    

 2         
                    

 3 import java.io.IOException;
                    

 4       
                    

 5   public class TestHprof {
                    

 6   public static String cat = null;
                    

 7   public final static int loop=5000;
                    

 8         
                    

 9   public static void makeString() {
                    

10      cat = new String();
                    

11        for (int n=0; n<loop; n++) {
                    

12            addToCat("more");
                    

13          }    
                    

14      }
                    

15       
                    

16   public static void addToCat(String more) {
                    

17       cat = cat + more;
                    

18    }
                    

19      
                    

20   public static void makeStringInline() {
                    

21      cat = new String();
                    

22        for (int n=0; n<loop; n++) {
                    

23            cat = cat + "more";
                    

24        }
                    

25    }
                    

26   
                    

27    public static void makeStringWithLocal() {
                    

28        String tmp = new String();
                    

29        for (int n=0; n<loop; n++) {
                    

30           tmp = tmp + "more";
                    

31         }
                    

32        cat = tmp;
                    

33    }
                    

34      
                    

35    public static void makeStringWithBuffer() {
                    

36        StringBuffer sb = new StringBuffer();
                    

37        for (int n=0; n<loop; n++) {
                    

38            sb.append("more");
                    

39        }
                    

40        cat = sb.toString();
                    

41    }
                    

42    public static void main(String[] args) {
                    

43       long begin = System.currentTimeMillis();
                    

44          
                    

45          if (null !=
                    

46          System.getProperty("WaitForProfiler")) {
                    

47              System.out.println(
                    

48                   "Start your profiler,
                    

                      then press any key to begin...");
                    

49             try {
                    

50                     System.in.read();
                    

51                 }
                    

52                 catch (IOException ioe) {
                    

53         }
                    

54         }
                    

55             
                    

56        makeString();
                    

57        makeStringInline();
                    

58        makeStringWithLocal();
                    

59        makeStringWithBuffer();
                    

60         
                    

61        long end =
                    

62        System.currentTimeMillis();
                    

63        System.out.println("Total run time of "
                    

64         + (end - begin) + " milliseconds");
                    

65      }
                    

66    }
                    

                  



Here is a sample invocation of the program, with profiling (for purposes of this discussion, we'll run with green threads and no JIT; we discuss this in more detail later in the article):

java -green -Djava.compiler= -Xrunhprof:cpu=samples,depth=6,thread=y com.develop.demos.TestHprof

The resulting profiling output is 432 lines long, and can be obtained here. (Note that this is not the same output used for the Halloway article.)

Interpreting hprof output

Briefly examining the hprof output shows that understanding the performance of this application is a difficult problem. The first few lines of the ranking at the end of the output look like this. We see that most of the time is being spent in StringBuffer initialization code. To figure out which makeString methods are most efficient requires some detective work. For example, look at trace #31 in the file (which corresponds to the top entry in the above ranking).

We see that the StringBuffer initialization was performed on behalf of the makeStringInline method, so makeStringInline is responsible for 260 ticks of the profiling counter. But that's not all; makeStringInline appears in 17 different traces. Tracking down its total cost from the hprof output is tedious and difficult work.

Introducing PerfAnal

The PerfAnal tool simplifies this analysis in two ways:

 

  • It interprets the profile data to report the total cost of methods.
  • It presents its results in a graphical format to assist in your analysis.

You can obtain the PerfAnal .jar file by right-clicking here. To run it on the data we just captured:

java -jar PerfAnal.jar java.hprof.txt

link
Click to view Figure 1 full-scale

The four quadrants contain four different analyses of the data, presented with the familiar Swing JTree interface. Let's take a closer look at each quadrant.

Analyzing CPU Usage by Caller

Figure 2 shows the upper-left quadrant, which ranks and analyzes CPU usage by caller.

link
Click to view Figure 2 full-scale

This quadrant ranks the time spent in various methods, inclusive of subroutine calls. Not surprisingly, main takes the most time. We can open up branches of the tree to explore where main is spending its time (see Figure 3).

link
Click to view Figure 3 full-scale

We see that main spent 35.20% of its time in makeStringInline, 30.03% of its time in makeStringWithLocal, and so on. Examining the first method in more detail show us that makeStringInline spent most of its time in the StringBuffer constructor.

Analyzing CPU Usage by Callee

The lower left quadrant shows the same ranking we saw in the upper left quadrant, but this tree organizes the analysis by callee. It answers the question: on whose behalf is a method spending time?

link
Click to view Figure 4 full-scale

Figure 4 shows such an analysis in the lower-left quadrant. Here we ask the question: why is the program spending 93.73% of its time in the StringBuffer constructor--on whose behalf is it so busy? Exploring that branch of the tree shows us how that time is being spent on behalf of the various makeString methods.

Analyzing CPU Usage by Line Number

The upper right quadrant shows, yet again, the same ranking of CPU usage by method. Here, however, the usage is broken down by line number. Figure 5 shows the analysis.

link
Click to view Figure 5 full-scale

The analysis shows, line by line, where main (or any other method) is spending its time, inclusive of called subroutines.

Analyzing CPU Usage, Exclusive of Subroutines

Finally, the lower right quadrant answers the question: where is the CPU really spending its time? This tree ranks method usage exclusive of subroutine calls. Figure 6 shows the analysis.

We see that, when we do not include subroutine calls, line 120 of the StringBuffer constructor is where our program spends fully 79.54% of its time.

Pulling it all Together

Each of these analyses, individually, provides important information about where execution time is being spent. PerfAnal provides an additional feature that allows you to correlate the information shown in the various windows.

For example, Figure 6 showed us that 12.54% of execution time was spent in System.arraycopy. What is taking so much time? By right-clicking on that entry, we pop up a menu that allows us to correlate the information (see Figure 7).

link
Click to view Figure 7 full-scale

By choosing Goto this Method, we position all four quadrants to examine this method. Returning to the lower-left quadrant, we can explore in detail (see Figure 8) who is calling System.arraycopy.

link
Click to view Figure 8 full-scale

Other Capabilities

The right-click popup menu shown in Figure 7 offers two additional features.

If you choose Select a Method to Analyze, you are presented with a dialog (see Figure 9) that allows you to select a method by name. After you select a method, all four quadrants will be positioned to analyze that method.


Figure 9

If you choose Select Thread(s) to Analyze, you can choose one or more threads from a dialog (see Figure 10). The analysis will be recomputed to report only on the selected threads.

Saving Output

Finally, PerfAnal allows you to save the analysis data. The File menu (in the menu bar) gives you a Save... item that saves all the processed information into a textfile. The information from all four quadrants is saved, with the tree fully unfurled-- here are the first few lines of the saved file from this analysis:

JVM Dependencies

Experience with the threading model on Linux suggests that the profiling results captured with native threads cannot necessarily be trusted. I've noticed, at least for GUI applications, that much of the profiling data is charged to the wrong method running in the wrong thread: much time spent in Swing rendering is charged instead to the Motif event loop (which in fact spends most of its time waiting). This problem does not appear when running with green threads, and I do not know if it appears on other platforms.

The use of JIT will have an obvious effect on performance, but it can also affect the usefulness of the profiling output. In general, JIT-compiled code loses line number information; any methods that have been compiled will not report line numbers in the profile, so the analyses in the upper- and lower-right quadrants will not be as informative.

hprof Option Dependencies

Certain hprof options have important effects on the PerfAnal analysis:

  • cpu=samples: This option is necessary to capture the data PerfAnal needs.
  • depth=: This option controls the depth of the stack traces saved during profiling. You may need to experiment with this value: large values produce more accurate information, but increase the memory and CPU requirements of PerfAnal.

If the value you choose for stack depth is too small, PerfAnal will not be able to fully ascertain CPU usage by subroutines. Notice, for example, in Figure 4 that the times reported for the various callers add up to the time reported for the callee. If they do not add up, you have not captured enough stack depth. There is no right amount of stack depth--it just depends on how far down the call stack you need to analyze. In my own profiling, I usually start with a depth of 12 and refine it as needed.

  • thread=y: You must specify this option if you want to be able to analyze the data by thread.

Obtaining Source

Source for PerfAnal is included on the CD-ROM with Java Programming on Linux, and is also available from the book's web site.

About Java Programming on Linux

Java Programming on Linux, by Nathan Meyers, is devoted to discussing the use of Java in the Linux environment. It provides detailed information on JDKs and other development tools available on the Linux platform, discusses the platform-specific issues faced on Linux, and describes the many ways in which Java can be deployed on Linux.

In addition to PerfAnal, the book provides and discusses tools for analyzing the time Java applications spend in native code and in the kernel, and discusses Java's performance demands on CPU, memory, and the X Window System.

To learn more about the book, visit its web site.

About the Author

Nathan Meyers spent 20 years as a developer and architect for Hewlett-Packard Company, working in handheld calculators, UNIX workstations, and inkjet printers. His experience includes developing embedded systems, device driver implementation, creating development tools, definition and implementation work on the X Window System and the Common Desktop Environment, developing 2D and 3D graphics applications, UNIX application performance tuning, designing evolutionary algorithms, and implementing financial algorithms. He left HP in 1999, wrote Java Programming on Linux, and joined a dot-com startup in Portland, Oregon.

_______
1 As used on this web site, the terms "Java virtual machine" or "JVM" mean a virtual machine for the Java platform.