This FAQ answers common questions about Java HotSpot Technology and about performance in general. Unless otherwise noted, all information on this page applies to both the HotSpot Client VM and the HotSpot Server VM as of Java SE version 1.4.
See the document Java HotSpot VM Options
For more in-depth troubleshooting discussion beyond the scope of this FAQ, please see the Java Trouble-Shooting and Diagnostic Guide [PDF]
First, make sure you are running with -agentlib:hprof and try -agentlib:hprof=help to see the different kinds of profiling available. If you are still having problems, please see the Java Trouble-Shooting and Diagnostic Guide [PDF]
Certain applications will use a lot of file descriptors. The only thing that you can do is to set the number of file descriptors allowed on the system higher. The hard limit default is 1024 and the soft limit default is 64. To set this higher you need to modify /etc/system by adding the following 2 definitions:
set rlim_fd_max = 4096
set rlim_fd_cur = 4096
There are several things to try in this arena. First, give -Xincgc a try. This uses the incremental garbage collection algorithm, which attempts to collect a fraction of the heap instead of the entire thing at once. For most programs this results in shorter pauses, although throughput is usually worse.
Next, you might try decreasing the amount of heap used. A larger heap will cause garbage collection pauses to increase because there is more heap to scan. Try -Xmx32m.
If your application requires more memory than that, you can adjust the size of the eden (young generation space) with -XX:NewSize=... and -XX:MaxNewSize=... (for 1.3/1.4), or with -Xmn in 1.4 and later. For some applications a very large eden helps; for others it will increase the times of minor collections. For most programs, collecting eden is much faster than collecting other generations because most objects die young. If you currently invoke with something like:
-Xms260m -Xmx260m
you might try:
-Xms384m -Xmx384m -XX:NewSize=128m -XX:MaxNewSize=128m
which will dedicate 1/3rd of the memory to eden. For 1.3, MaxNewSize is set to 32MB on Sparc and 2.5MB on Intel-based machines. NewRatio (the ratio between the young and old generations) has a value of 2 on Sparc Server, 12 on client Intel, and 8 everywhere else; as you can quickly determine, these NewRatio values are superseded by MaxNewSize's defaults, rendering NewRatio ineffective for even moderately sized heaps. In 1.4 and later, MaxNewSize has been effectively set to infinity, and NewRatio can be used instead to set the value of the new generation. Using the above as an example, you can do the following in 1.4 and later:
-Xms384m -Xmx384m -XX:NewRatio=2
If you are worried about the number of garbage collections, but less worried about pause times, then increasing the heap should cause the number of full garbage collections to decrease; this is especially true if you increase the size of the eden space as well.
Many systems have less efficient memory management than HotSpot does. To work around this, some programs keep an "object pool", saving previously allocated objects in some freelist-like data structure and reusing them instead of allocating new ones. But... don't use object pools! Using object pools will fool the collector into thinking objects are live when they really aren't. Pools may have helped before exact garbage collection became popular, but they are just not a good idea for any modern Java Virtual Machine.
See also Tuning Garbage Collection with the 5.0 Java Virtual Machine.
Try using -Xaprof to get a profile of the allocations (objects and sizes) of your application.
You can also try -agentlib:hprof=heap=all (or other options; run -agentlib:hprof=help for a list)
Since 1.5, you can use jmap. For more information, see jmap - Memory Map and Java Trouble-Shooting and Diagnostic Guide [PDF]
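For example, to print the memory map of a running VM (where <pid> stands for the process id of your Java process, left as a placeholder here):

java -agentlib:hprof=heap=all MyApp
jmap <pid>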
The Java HotSpot VM cannot expand its heap size if memory is completely allocated and no swap space is available. This can occur, for example, when several applications are running simultaneously. When this happens, the VM will exit after printing a message similar to the following.
Exception java.lang.OutOfMemoryError: requested <size> bytes
If this happens, consider lowering the heap sizes specified with the -Xms and -Xmx options, or increasing the available swap space.
For more information, see the evaluation section of bug 4697804.
The maximum theoretical heap limit for the 32-bit JVM is 4G. Due to various additional constraints such as available swap, kernel address space usage, memory fragmentation, and VM overhead, in practice the limit can be much lower. On most modern 32-bit Windows systems the maximum heap size will range from 1.4G to 1.6G. On 32-bit Solaris kernels the address space is limited to 2G. On 64-bit operating systems running the 32-bit VM, the max heap size can be higher, approaching 4G on many Solaris systems.
As of Java SE 6, the Windows /3GB boot.ini feature is not supported.
If your application requires a very large heap you should use a 64-bit VM on a version of the operating system that supports 64-bit applications. See Java SE Supported System Configurations for details.
The answer is No!
Pooling objects will cause them to live longer than necessary. The garbage collection machinery will be much more efficient if you let it do the memory management. We strongly advise taking out object pools.
Don't call System.gc(). HotSpot will determine when it's appropriate and will generally do a much better job. If you are having problems with garbage collection pause times, or with collections taking too long, see the pause time question above.
See also Tuning Garbage Collection with the 5.0 Java Virtual Machine.
Starting with 1.3.1, softly reachable objects will remain alive for some amount of time after the last time they were referenced. The default value is one second of lifetime per free megabyte in the heap. This value can be adjusted using the -XX:SoftRefLRUPolicyMSPerMB flag, which accepts integer values representing milliseconds. For example, to change the value from one second to 2.5 seconds, use this flag:
-XX:SoftRefLRUPolicyMSPerMB=2500
The Java HotSpot Server VM uses the maximum possible heap size (as set with the -Xmx option) to calculate free space remaining. The Java HotSpot Client VM uses the current heap size to calculate the free space. This means that the general tendency is for the Server VM to grow the heap rather than flush soft references, and -Xmx therefore has a significant effect on when soft references are garbage collected. On the other hand, the Client VM will have a greater tendency to flush soft references rather than grow the heap.
The behavior described above is true for 1.3.1 through Java SE 6 versions of the Java HotSpot VMs. This behavior is not part of the VM specification, however, and is subject to change in future releases. Likewise, the -XX:SoftRefLRUPolicyMSPerMB flag is not guaranteed to be present in any given release.
Prior to version 1.3.1, the Java HotSpot VMs cleared soft references whenever they were found.
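For background, here is a minimal sketch of the kind of soft reference cache this policy governs. The SoftCache class and its method names are our own illustrative inventions, not platform APIs:

import java.lang.ref.SoftReference;

public class SoftCache {
    private SoftReference cacheRef;  // the collector may clear this under memory pressure

    public byte[] getData() {
        byte[] data = (cacheRef == null) ? null : (byte[]) cacheRef.get();
        if (data == null) {              // cleared, or never loaded: recompute and re-cache
            data = loadData();
            cacheRef = new SoftReference(data);
        }
        return data;
    }

    private byte[] loadData() {
        return new byte[1024 * 1024];    // stands in for an expensive computation or load
    }
}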
If you're using RMI, then you could be running into distributed GC. Also, some applications add explicit GC calls thinking that this will make their application faster. Luckily, you can disable this with a command line option in 1.3 and later. Try -XX:+DisableExplicitGC along with -verbose:gc and see if this helps.
These two systems are different binaries. They are essentially two different compilers (JITs) interfacing to the same runtime system. The client system is optimal for applications which need fast startup times or small footprints; the server system is optimal for applications where the overall performance is most important. In general the client system is better suited for interactive applications such as GUIs. Some of the other differences include the compilation policy, heap defaults, and inlining policy.
The client and server systems are both included in the 32-bit Solaris and Linux downloads. For 32-bit Windows, if you download the JRE, you get only the client system; you'll need to download the SDK to get both systems.
For 64-bit, only the server system is included. On Solaris, the 64-bit JRE is an overlay on top of the 32-bit distribution. However, on Linux and Windows, it's a completely separate distribution.
Since Java SE 5.0, with the exception of 32-bit Windows, the server VM will automatically be selected on server-class machines. The definition of a server-class machine may change from release to release, so please check the appropriate ergonomics document for the definition for your release. For 5.0, it's Ergonomics in the 5.0 Java[tm] Virtual Machine.
Warming up loops for HotSpot is not necessary. HotSpot contains On Stack Replacement technology which will compile a running (interpreted) method and replace it while it is still running in a loop. There is no need to waste your application's time warming up seemingly infinite (or very long running) loops in order to get better application performance.
What it is: A 64-bit version of Java has been available to Solaris SPARC users since the 1.4.0 release of J2SE. A 64-bit capable J2SE is an implementation of the Java SDK (and the JRE along with it) that runs in the 64-bit environment of a 64-bit OS on a 64-bit processor. You can think of this environment as being just another platform to which we've ported the SDK. The primary advantage of running Java in a 64-bit environment is the larger address space. This allows for a much larger Java heap size and an increased maximum number of Java Threads, which is needed for certain kinds of large or long-running applications. The primary complication in doing such a port is that the sizes of some native data types are changed. Not surprisingly the size of pointers is increased to 64 bits. On Solaris and most Unix platforms, the size of the C language long is also increased to 64 bits. Any native code in the 32-bit SDK implementation that relied on the old sizes of these data types is likely to require updating.
Within the parts of the SDK written in Java things are simpler, since Java specifies the sizes of its primitive data types precisely. However even some Java code needs updating, such as when a Java int is used to store a value passed to it from a part of the implementation written in C.
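For instance, a common pattern is holding a native pointer in a Java field. Here is a sketch, with hypothetical names (NativePeer, create, destroy; the native implementations are omitted), of the 64-bit-safe version of that pattern:

public class NativePeer {
    private long nativeHandle;   // a 'long' (64 bits) can hold a pointer on both 32- and
                                 // 64-bit VMs; an 'int' field here would truncate 64-bit addresses

    private static native long create();             // returns a native pointer
    private static native void destroy(long handle);

    public NativePeer()   { nativeHandle = create(); }
    public void dispose() { destroy(nativeHandle); nativeHandle = 0; }
}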
What it is NOT: Many Java users and developers assume that a 64-bit implementation means that many of the built-in Java types are doubled in size from 32 to 64. This is not true. We did not increase the size of Java integers from 32 to 64, and since Java longs were already 64 bits wide, they didn't need updating. Array indexes, which are defined in the Java Virtual Machine Specification, are not widened from 32 to 64. We were extremely careful during the creation of the first 64-bit Java port to ensure Java binary and API compatibility, so all existing 100% pure Java programs would continue running just as they do under a 32-bit VM.
In order to run a 64-bit version of Java you must have a processor and operating system that can support the execution of 64-bit applications. The tables below list the supported 64-bit operating systems and CPUs for J2SE 1.4.2 and Java SE 5.0.
J2SE 1.4.2 Releases
| PROCESSORS | SOLARIS | WINDOWS Server 2003 / WINDOWS XP 64-bit Edition | LINUX |
|---|---|---|---|
| SPARC | YES | NO | NO |
| IA64 - Itanium | NO | YES | YES |
| AMD64 & EM64T | YES (1.4.2_11) | YES (1.4.2_11) | NO |
Java SE 5.0 Releases
| PROCESSORS | SOLARIS | WINDOWS Server 2003 / WINDOWS XP 64-bit Edition | LINUX |
|---|---|---|---|
| SPARC | YES | NO | NO |
| IA64 - Itanium | NO | NO | NO |
| AMD64 & EM64T | YES (5.0_02) | YES | YES |
For the Solaris 64-bit packages, you must first install the 32-bit SDK or JRE and then select and install the 64-bit package on top of the 32-bit version. For all other platforms, you need only select and install the 64-bit package in which you are interested.
How do I select between 32 and 64-bit operation? What's the default? The options -d32 and -d64 have been added to the Java launcher to specify whether the program is to be run in a 32 or 64-bit environment. On Solaris these correspond to the ILP32 and LP64 data models, respectively. Since Solaris has both a 32 and 64-bit J2SE implementation contained within the same installation of Java, you can specify either version. If neither -d32 nor -d64 is specified, the default is to run in a 32-bit environment.
Other Java commands (javac, javadoc, etc.) will rarely need to be executed in a 64-bit environment. However, the -d32/-d64 options may be passed to these commands and then on to the Java launcher using the established -J prefix option (e.g., -J-d64).
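For example, assuming a class named MyApp (a placeholder used only for illustration):

java -d64 MyApp
javac -J-d64 MyApp.java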
All other platforms (Windows and Linux) contain separate 32 and 64-bit installation packages. If both packages are installed on a system, you select one or the other by adding the appropriate "bin" directory to your path. For consistency, the Java implementations on Linux accept the -d64 option.
Currently only the Java HotSpot Server VM supports 64-bit operation, and the -server option is implicit with the use of -d64. This is subject to change in a future release.
The Java Plug-in, AWT Robot and Java Web Start currently do not support 64-bit operation. Use 32-bit Java if you require these features.
There are no changes to public native interfaces (JNI, the AWT Native Interface, JPDA) for 64-bit development. 64-bit versions of all public native libraries are provided, with the 'sparc' component of the pathnames replaced by 'sparcv9'. Any 64-bit native code that links against a native library must use the 64-bit version of that library, using the LD_LIBRARY_PATH[_64] environment variables or some other mechanism.
When porting 32-bit native code to 64-bit Java platforms, you will need to modify your code to be 64-bit clean. This involves examining your C/C++ code and looking for code that assumes the size of a pointer to be 4 bytes, or that a pointer can be cast and stored in an integer. Long data types are also troublesome when porting 32-bit code. You should avoid the use of longs if at all possible, since longs have different sizes on different operating systems even in 64-bit environments. Windows 64-bit platforms define longs to be 4 bytes, but most Unix operating systems specify that longs are 8 bytes in size. For more details, refer to the links below under learning more about 64-bit programming.
There's no public API that allows you to distinguish between 32 and 64-bit operation. Think of 64-bit as just another platform in the write once, run anywhere tradition. However, if you'd like to write code which is platform specific (shame on you), the system property sun.arch.data.model has the value "32", "64", or "unknown".
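For example, a minimal check using that property (DataModelCheck is our own class name):

public class DataModelCheck {
    public static void main(String[] arg) {
        // Returns "32", "64", or "unknown"; note this is a Sun-specific,
        // unsupported property, not a public API
        String model = System.getProperty("sun.arch.data.model", "unknown");
        System.out.println("Data model: " + model);
    }
}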
No. All native binary code that was written for a 32-bit VM must be recompiled for use in a 64-bit VM. None of the currently supported operating systems allow the mixing of 32 and 64-bit binaries or libraries within a single process. You can run a 32-bit Java process on the same system as a 64-bit Java process, but you cannot mix 32 and 64-bit native libraries.
See the Solaris 64-bit Developer's Guide. The section on converting applications to 64-bit is especially useful.
Generally, the benefits of being able to address larger amounts of memory come with a small performance loss in 64-bit VMs versus running the same application on a 32-bit VM. This is due to the fact that every native pointer in the system takes up 8 bytes instead of 4. The loading of this extra data has an impact on memory usage which translates to slightly slower execution depending on how many pointers get loaded during the execution of your Java program. The good news is that with AMD64 and EM64T platforms running in 64-bit mode, the Java VM gets some additional registers which it can use to generate more efficient native instruction sequences. These extra registers increase performance to the point where there is often no performance loss at all when comparing 32 to 64-bit execution speed.
On SPARC, the performance difference when moving an application from a 32-bit to a 64-bit VM is on the order of a 10-20% degradation. On AMD64 and EM64T platforms this difference ranges from 0-15%, depending on the amount of pointer accessing your application performs.
The default heap size for all 32-bit J2SE implementations is 64MB. We have adjusted the defaults for 64-bit implementations to be 30% larger in order to make up for the increased size of Java objects due to larger native pointers. Remember that Java objects contain class and lock pointers so even if you create Java objects which contain only integers, each object takes up additional memory.
The major advantage of a 64-bit Java implementation is to be able to create and use more Java objects. It is great to be able to break these 2GB limits. Remember, however, that this additional heap must be garbage collected at various points in your application's life span. This additional garbage collection can cause large pauses in your Java application if you do not take this into consideration. The HotSpot VM has a number of garbage collection implementations which are targeted at Java applications with large heaps. We recommend enabling one of the Parallel or Concurrent garbage collectors when running with very large heaps. These collectors attempt to minimize the overhead of collection time by either collecting garbage concurrently with the execution of your Java application or by utilizing multiple CPUs during collections to get the job done faster. For more information on these garbage collection modes and how to select them, please refer to the HotSpot GC tuning guide, which can be found here: Tuning Garbage Collection with the 5.0 Java Virtual Machine
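For example, either of the following would enable one of these collectors (MyLargeApp is a placeholder; consult the tuning guide before choosing a collector and heap size):

java -Xmx3g -XX:+UseParallelGC MyLargeApp
java -Xmx3g -XX:+UseConcMarkSweepGC MyLargeApp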
There is an undocumented option, -Xconcurrentio, which generally helps programs with many threads, particularly on Solaris. The main feature turned on with -Xconcurrentio is to use LWP based synchronization instead of thread based synchronization. We have found certain applications to speed up by over 40%. Since 1.4, LWP based synchronization is the default, but -Xconcurrentio can still help since it turns on some other internal options. Finally, there is an alternate thread library which is the default on Solaris 9 and can also be used on Solaris 8 by changing your LD_LIBRARY_PATH to include /usr/lib/lwp before /usr/lib.
You may be running into a problem with the default stack size for threads. In Java SE 6, the default on Sparc is 512k in the 32-bit VM, and 1024k in the 64-bit VM. On x86 Solaris/Linux it is 320k in the 32-bit VM and 1024k in the 64-bit VM.
On Windows, the default thread stack size is read from the binary (java.exe). As of Java SE 6, this value is 320k in the 32-bit VM and 1024k in the 64-bit VM.
You can reduce your stack size by running with the -Xss option. For example:
java -server -Xss64k
Note that on some versions of Windows, the OS may round up thread stack sizes using very coarse granularity. If the requested size is less than the default size by 1K or more, the stack size is rounded up to the default; otherwise, the stack size is rounded up to a multiple of 1 MB.
64k is the least amount of stack space allowed per thread.
Solaris 8 provides two versions of the threading library libthread.so. The default is the T1 library, which uses an M:N threading model. The other is the T2 library, which uses a 1:1 model.
For stability and performance reasons, usage of the T2 thread library is strongly recommended with HotSpot.
The T2 library is the default in Solaris 9 and later.
On Solaris 8 you need to add /usr/lib/lwp to your LD_LIBRARY_PATH. On Solaris 9 (or higher) the one-to-one model is the default and you do not need to do anything.
Please see the document Thread Priority on the Solaris Platform.
First, it is important to understand the percentage of time that the application is running bytecodes. If the program is I/O bound or running in native methods, then the VM is not involved in the consumption of CPU time. The VM technology will only speed up the time spent running in bytecode. Typical examples of time spent not running bytecode are graphical operations that make heavy use of native methods, and I/O operations such as reading and writing data to network sockets or database files.
Assuming that the VM is mostly executing bytecode, ensure that the VM is in the correct mode. For applications where small footprint and fast startup are important, use -client. For applications where overall performance is the most important, use -server.
If the above does not address the performance issue, read on for more tuning parameters you can try, and also see Java HotSpot VM Options. There are also tools available such as jstat (Java Virtual Machine Statistics Monitoring Tool) and hprof (A Heap/CPU Profiling Tool) that can assist in diagnosing application performance issues.
Scaling problems could be due to a multitude of things. First, your application may not be written in a scalable manner (if you use a lot of synchronization, for one example, or if you have only one thread, as another). It may also be that you are utilizing OS system resources which are not scalable. Finally, if you have many threads, it may be that garbage collection is getting in the way.
There are practical limitations to scalability, and often garbage collection will be a bottleneck when large numbers of processors are employed. Scalability is a top priority for our development team. Currently we run applications on very large systems and we see throughput improvements for those applications which are written in a scalable way.
If you're blocked doing I/O, then no matter which version of Java you use, you will not be able to speed this up. If your application is using many threads, you may be encountering scalability issues.
Oracle provides two types of database drivers: a type-2 driver, called the OCI (Oracle Call Interface) driver that utilizes native code, and a type-4 pure Java driver called the thin driver. In single processor environments, the thin driver works somewhat better than the OCI driver because of the JNI overhead associated with the OCI driver. On multi-processor configurations, synchronization points within Solaris used by the OCI driver become big bottlenecks and prevent scaling. One way to resolve the sync issue is to use libumem on Solaris. Otherwise, we recommend using the thin driver.
Here's my program:
public class Benchmark {
public static void main(String[] arg) {
long before = System.currentTimeMillis();
int sum = 0;
for (int index = 0; index < 10*1000*1000; index += 1) {
sum += index;
}
long after = System.currentTimeMillis();
System.out.println("Elapsed time: " +
Long.toString(after - before) +
" milliseconds");
}
}
You are writing a microbenchmark.
Remember how HotSpot works. It starts by running your program with an interpreter. When it discovers that some method is "hot" -- that is, executed a lot, either because it is called a lot or because it contains loops that loop a lot -- it sends that method off to be compiled. After that, one of two things will happen: either the next time the method is called the compiled version will be invoked (instead of the interpreted version), or the currently long-running loop will be replaced, while still running, with the compiled method. The latter is known as "on stack replacement", or OSR.
In the meantime, if you insist on using/writing microbenchmarks like this, you can work around the problem by moving the body of main to a new method and calling it once from main to give the compiler a chance to compile the code, then calling it again in the timing bracket to see how fast HotSpot is.
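Here is a sketch of that restructuring applied to the program above (Benchmark2 is our own name for it; the sum is printed so the loop is not eliminated as dead code):

public class Benchmark2 {
    static int run() {
        int sum = 0;
        for (int index = 0; index < 10*1000*1000; index += 1) {
            sum += index;
        }
        return sum;      // returning the result keeps the loop from being dead code
    }
    public static void main(String[] arg) {
        run();           // warm-up call gives HotSpot a chance to compile run()
        long before = System.currentTimeMillis();
        int sum = run(); // this call can use the compiled version
        long after = System.currentTimeMillis();
        System.out.println("Elapsed time: " +
                           Long.toString(after - before) +
                           " milliseconds (sum = " + sum + ")");
    }
}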
See also the JavaOne 2002 presentation S-1816 How NOT To Write A Microbenchmark
I'm trying to time method invocation time. I don't want there to be any extra work done, so I'm using an empty method. But when I run with HotSpot I get times that are unbelievably fast. Here's my code:
public class EmptyMethod {
public static void method() {
}
public static void runTest() {
long before;
long after;
// First, figure out the time for an empty loop
before = System.currentTimeMillis();
for (int index = 0; index < 1*1000*1000; index += 1) {
}
after = System.currentTimeMillis();
long loopTime = after - before;
System.out.println("Loop time: " +
Long.toString(loopTime) +
" milliseconds");
// Then time the method call in the loop
before = System.currentTimeMillis();
for (int index = 0; index < 1*1000*1000; index += 1) {
method();
}
after = System.currentTimeMillis();
long methodTime = after - before;
System.out.println("Method time: " +
Long.toString(methodTime) +
" milliseconds");
System.out.println("Method time - Loop time: " +
Long.toString(methodTime - loopTime) +
" milliseconds");
}
public static void main(String[] arg) {
// Warm up the virtual machine, and time it
runTest();
runTest();
runTest();
}
}
Empty methods don't count. And you are also seeing that generated code is sensitive to alignment.
The call to the empty method is being inlined away, so there really is no call there to time. Small methods will be inlined by the compiler at their call sites. This reduces the overhead of calls to small methods. This is particularly helpful for the accessor methods used to provide data abstraction. If the method is actually empty, the inlining completely removes the call.
Code is generated into memory and executed from there. The way the code is laid out in memory makes a big difference in the way it executes. In this example on my machine, the loop that claims to call the method is better aligned and so runs faster than the loop that's trying to figure out how long it takes to run an empty loop, so I get negative numbers for methodTime-loopTime.
Okay, so I'll put some random code in the body of the method so it's not empty and the inlining can't just remove it. Here's my new method (and the call site is changed to call method(17)):
public static void method(int arg) {
int value = arg + 25;
}
The HotSpot compiler is smart enough not to generate code for dead variables.
In the method above, the local variable is never used, so there's no reason to compute its value. So then the method body is empty again and when the code gets compiled (and inlined, because we removed enough code to make it small enough for inlining) it turns into an empty method again.
This can be surprising to people not used to dealing with optimizing compilers, because they can be fairly clever about discovering and eliminating dead code. They can occasionally be fairly stupid about it, so don't count on the compiler to do arbitrary optimizations of your code.
Dead code elimination also extends to control flow. If the compiler can see that a particular "variable" is in fact a constant at a test, it may choose not to compile code for the branch that will never be executed. This makes it tricky to make microbenchmarks "tricky enough" to actually time what you think you are timing.
Dead code elimination is quite useful in real code. Not that people intentionally write dead code; but often the compiler discovers dead code due to inlining where constants (e.g., actual parameters to methods) replace variables, making certain control flows dead.
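As an illustration (our own example, not from the discussion above), a constant actual parameter can make a branch dead after inlining:

public class DeadCodeExample {
    // After inlining compute(i, false), the compiler sees that 'verbose'
    // is the constant false, so the branch below is dead and need not be compiled.
    static int compute(int arg, boolean verbose) {
        if (verbose) {
            System.out.println("arg = " + arg);
        }
        return arg + 25;
    }

    public static void main(String[] arg) {
        int total = 0;
        for (int i = 0; i < 10*1000*1000; i += 1) {
            total += compute(i, false);
        }
        System.out.println(total);   // use the result so the loop itself is not dead
    }
}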
I'm trying to benchmark object allocation and garbage collection. So I have a harness like the one above, but the body of the method is:
public static void method() {
Object o = new Object();
}
That's the optimal case for the HotSpot storage manager. You will get numbers that are unrealistically good.
You are allocating objects that need no initialization and dropping them on the floor instantly. (No, the compiler is not smart enough to optimize away the allocation.) Real programs do allocate a fair number of short-lived temporary objects, but they also hold on to some objects for longer than this simple test program. The HotSpot storage manager does more work for the objects that are retained for longer, so beware of trying to scale up numbers from tests like this to real systems.
Graphics programs spend a lot of their time in native libraries.
The overall performance of a Java application depends on four factors:
- the design of the application
- the speed at which the virtual machine executes the Java bytecodes
- the speed at which the libraries that perform basic functional tasks execute (in native code)
- the speed of the underlying hardware and operating system
The virtual machine is responsible for bytecode execution, storage allocation, thread synchronization, etc. Running with the virtual machine are native code libraries that handle input and output through the operating system, especially graphics operations through the window system. Programs that spend significant portions of their time in those native code libraries will not see their performance on HotSpot improved as much as programs that spend most of their time executing bytecodes.
This observation about native code applies to other native libraries or any native code libraries that you happen to use with your application.
The best answer here is to use real applications for benchmarking, as they are the only thing that makes a real difference. If that's not possible, use standard SPEC benchmarks followed by other well respected industry benchmarks. Microbenchmarks should be avoided, or at least used with much caution. It's very common for microbenchmarks to give misleading answers due to optimization effects.
We like to use the SPECjbb2005 benchmark. We use it for tracking our own progress over time, and we use it for comparing ourselves to other virtual machines.
SPECjbb2005 (Java Server Benchmark) is SPEC's benchmark for evaluating the performance of server side Java. Like its predecessor, SPECjbb2000, SPECjbb2005 evaluates the performance of server side Java by emulating a three-tier client/server system (with emphasis on the middle tier). The benchmark exercises the implementations of the JVM (Java Virtual Machine), JIT (Just-In-Time) compiler, garbage collection, threads and some aspects of the operating system. It also measures the performance of CPUs, caches, memory hierarchy and the scalability of shared memory processors (SMPs). SPECjbb2005 provides a new enhanced workload, implemented in a more object-oriented manner to reflect how real-world applications are designed and introduces new features such as XML processing and BigDecimal computations to make the benchmark a more realistic reflection of today's applications.
The SPECjbb2005 benchmark is available from http://www.spec.org/jbb2005/.