Chapter 4

Troubleshooting System Crashes

This chapter provides information and guidance on some specific procedures for troubleshooting system crashes.

A crash, or fatal error, causes a process to terminate abnormally. There are various possible reasons for a crash. For example, a crash can occur due to a bug in the HotSpot VM, in a system library, in a Java SE library or API, in application native code, or even in the operating system. External factors, such as resource exhaustion in the operating system, can also cause a crash.

Crashes caused by bugs in the HotSpot VM or in the Java SE library code are rare. This chapter provides suggestions on how to examine a crash. In some cases it is possible to work around a crash until the cause of the bug is diagnosed and fixed.

In general, the first step with any crash is to locate the fatal error log. This is a text file that the HotSpot VM generates in the event of a crash. See Appendix C, Fatal Error Log for an explanation of how to locate this file, as well as a detailed description of the file.

4.1 Sample Crashes

This section presents a number of examples which demonstrate how the error log can be used to suggest the cause of a crash.

4.1.1 Determining Where the Crash Occurred

The error log header indicates the problematic frame. See C.3 Header Format.

If the top frame type is a native frame and not one of the operating system native frames, then this indicates that the problem is likely in that native library and not in the Java virtual machine. The first step to solving this crash is to investigate the source of the native library where the crash occurred. There are three options, depending on the source of the native library.

  1. If the native library is provided by your application, then investigate the source code of your native library. The option -Xcheck:jni can help find many native bugs. See B.2.1 -Xcheck:jni Option.

  2. If the native library has been provided by another vendor and is used by your application, then file a bug report against this third-party application and provide the fatal error log information.

  3. Determine if the native library is part of the Java runtime environment (JRE) by looking in the jre/lib or jre/bin directories in the JRE distribution. If so, file a bug report, and ensure that this library name is prominently indicated so that the bug report can be routed to the appropriate developers.

If the top frame indicated in the error log is another type of frame, file a bug report and include the fatal error log as well as any information on how to reproduce the problem.
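The frame type check described above can be sketched in code. The following is a minimal illustration, not part of the JDK: the class name ProblematicFrame and its methods are hypothetical, and the pattern assumes that the frame line has the same shape as the samples in this chapter (a '#', a one-letter frame type, then the frame description).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of the JDK): classifies the line that
// follows "# Problematic frame:" in the header of a fatal error log.
class ProblematicFrame {
    // Assumed shape, based on the samples in this chapter:
    // "# C  [libApplication.so+0x9d7]" or "# J  org.foobar.Scanner.body()V"
    private static final Pattern FRAME =
            Pattern.compile("^#\\s+([CVvJj])\\s+(.*\\S)\\s*$");

    /** Returns the frame type letter, or '?' if the line is not a frame line. */
    static char frameType(String line) {
        Matcher m = FRAME.matcher(line);
        return m.matches() ? m.group(1).charAt(0) : '?';
    }

    /** Maps the frame type letter to the legend that the log itself prints. */
    static String describe(String line) {
        switch (frameType(line)) {
            case 'C': return "native code";
            case 'V':
            case 'v': return "VM code";
            case 'J': return "compiled Java code";
            case 'j': return "interpreted Java code";
            default:  return "unrecognized";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("# C  [libApplication.so+0x9d7]"));
        System.out.println(describe("# V  [jvm.dll+0x83d77]"));
        System.out.println(describe("# J  org.foobar.Scanner.body()V"));
    }
}
```

A result of "native code" only narrows the search. Deciding whether the library belongs to the application, to another vendor, or to the JRE still requires the manual checks listed above.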

See also the remaining sections in this chapter.

4.1.2 Crash in Native Code

If the fatal error log indicates that the crash was in a native library, there might be a bug in native code or JNI library code. The crash could of course be caused by something else, but analysis of the library and any core file or crash dump is a good starting place. For example, consider the following extract from the header of a fatal error log:

# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  SIGSEGV (0xb) at pc=0x417789d7, pid=21139, tid=1024
#
# Java VM: Java HotSpot(TM) Server VM (6-beta2-b63 mixed mode)
# Problematic frame:
# C  [libApplication.so+0x9d7]

In this case a SIGSEGV occurred with a thread executing in the library libApplication.so.

In some cases a bug in a native library manifests itself as a crash in Java VM code. Consider the following crash where a JavaThread fails while in the _thread_in_vm state (meaning that it is executing in Java VM code):

# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x08083d77, pid=3700, tid=2896
#
# Java VM: Java HotSpot(TM) Client VM (1.5-internal mixed mode)
# Problematic frame:
# V  [jvm.dll+0x83d77]

---------------  T H R E A D  ---------------

Current thread (0x00036960):  JavaThread "main" [_thread_in_vm, id=2896]
 :
Stack: [0x00040000,0x00080000),  sp=0x0007f9f8,  free space=254k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [jvm.dll+0x83d77]
C  [App.dll+0x1047]          <========= C/native frame
j  Test.foo()V+0
j  Test.main([Ljava/lang/String;)V+0
v  ~StubRoutines::call_stub
V  [jvm.dll+0x80f13]
V  [jvm.dll+0xd3842]
V  [jvm.dll+0x80de4]
V  [jvm.dll+0x87cd2]
C  [java.exe+0x14c0]
C  [java.exe+0x64cd]
C  [kernel32.dll+0x214c7]
 :

In this case the stack trace shows that a native routine in App.dll has called into the VM (probably with JNI).

If you get a crash in a native application library (as in the above examples), then you might be able to attach the native debugger to the core file or crash dump, if it is available. Depending on the operating system, the native debugger is dbx, gdb, or windbg.

Another approach is to run with the -Xcheck:jni option added to the command line (see B.2.1 -Xcheck:jni Option). This option is not guaranteed to find all issues with JNI code, but it can help identify a significant number of issues.

If the native library where the crash occurred is part of the Java runtime environment (for example awt.dll, net.dll, and so forth), then it is possible that you have encountered a library or API bug. If after further analysis you conclude this is a library or API bug, then gather as much data as possible and submit a bug or support call. See Chapter 7, Submitting Bug Reports.

4.1.3 Crash due to Stack Overflow

A stack overflow in Java language code will normally result in the offending thread throwing java.lang.StackOverflowError. Native C and C++ code, on the other hand, can write past the end of the stack and provoke a stack overflow. This is a fatal error which causes the process to terminate.
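The Java half of this contrast can be demonstrated with a short program. This is an illustrative sketch (the class name JavaStackOverflowDemo is made up for the example): unbounded recursion in Java code ends with a StackOverflowError that the thread can catch, so the process keeps running. The depth that is reached depends on the platform and on the thread stack size, so no particular value should be expected.

```java
// Illustrative sketch: a stack overflow in Java code surfaces as a
// catchable java.lang.StackOverflowError rather than as a process crash.
class JavaStackOverflowDemo {
    private static int depth;

    // Recurses without a termination condition until the stack is exhausted.
    private static void recurse() {
        depth++;
        recurse();
    }

    /** Returns the recursion depth reached before StackOverflowError was thrown. */
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The offending thread survives and can continue after unwinding.
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("Depth reached before StackOverflowError: " + measureDepth());
        System.out.println("The process is still running.");
    }
}
```

The equivalent unbounded recursion in a native function has no such safety net, which is why it shows up as a fatal error instead.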

In the HotSpot implementation, Java methods share stack frames with C/C++ native code, namely user native code and the virtual machine itself. Java methods generate code that checks that stack space is available a fixed distance towards the end of the stack so that the native code can be called without exceeding the stack space. This distance towards the end of the stack is called “Shadow Pages.” The size of the shadow pages is between 3 and 20 pages, depending on the platform. This distance is tunable, so that applications with native code needing more than the default distance can increase the shadow page size. The option to increase shadow pages is -XX:StackShadowPages=n, where n is greater than the default stack shadow pages for the platform.

If your application gets a segmentation fault without a core file or fatal error log file (see Appendix C, Fatal Error Log), a STACK_OVERFLOW_ERROR on Windows, or the message “An irrecoverable stack overflow has occurred,” this indicates that the value of StackShadowPages was exceeded and more space is needed.

If you increase the value of StackShadowPages, you might also need to increase the default thread stack size using the -Xss parameter. Increasing the default thread stack size might decrease the number of threads that can be created, so be careful in choosing a value for the thread stack size. The default thread stack size varies by platform from 256k to 1024k.
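Because -Xss changes the default for every thread, one possible way to limit the cost is to request a larger stack only for the thread that calls the deep native code. The sketch below uses the standard java.lang.Thread constructor that takes a stack size. The class name PerThreadStackSize, the thread name, and the 2 MB value are illustrative assumptions, and the requested size is only a hint that the VM is free to round or ignore.

```java
// Sketch: request a larger stack for one thread rather than raising the
// default for all threads with -Xss. The stack size argument is a hint.
class PerThreadStackSize {
    /** Runs the task on a new thread created with the requested stack size. */
    static boolean runWithStackSize(Runnable task, long stackBytes) {
        // Thread(ThreadGroup, Runnable, String, long) is a standard constructor;
        // the thread name used here is illustrative only.
        Thread t = new Thread(null, task, "deep-native-caller", stackBytes);
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        boolean done = runWithStackSize(new Runnable() {
            public void run() {
                System.out.println("Running on " + Thread.currentThread().getName());
            }
        }, 2L * 1024 * 1024); // 2 MB, an arbitrary value for illustration
        System.out.println("Completed: " + done);
    }
}
```

This approach does not help with threads that the application does not create itself, such as the main thread. For those, -Xss or the launcher's own settings remain the relevant controls.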

The following is a fragment from a fatal error log, on a Windows system, where a thread has provoked a stack overflow in native code.

# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  EXCEPTION_STACK_OVERFLOW (0xc00000fd) at pc=0x10001011, pid=296, tid=2940
#
# Java VM: Java HotSpot(TM) Client VM (1.6-internal mixed mode, sharing)
# Problematic frame:
# C  [App.dll+0x1011]
#

---------------  T H R E A D  ---------------

Current thread (0x000367c0):  JavaThread "main" [_thread_in_native, id=2940]
:
Stack: [0x00040000,0x00080000),  sp=0x00041000,  free space=4k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [App.dll+0x1011]
C  [App.dll+0x1020]
C  [App.dll+0x1020]
:
C  [App.dll+0x1020]
C  [App.dll+0x1020]
...<more frames>...

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  Test.foo()V+0
j  Test.main([Ljava/lang/String;)V+0
v  ~StubRoutines::call_stub

Note the following information in the above output:

  • The exception is EXCEPTION_STACK_OVERFLOW.

  • The thread state is _thread_in_native, which means that the thread is executing native or JNI code.

  • In the stack information the free space is only 4k (a single page on a Windows system). In addition, the stack pointer (sp) is at 0x00041000, which is close to the end of the stack (0x00040000).

  • The printout of the native frames shows that a recursive native function is the issue in this case.

  • The output notation ...<more frames>... indicates that additional frames exist but were not printed. The output is limited to 100 frames.
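The free space figure can be cross-checked from the other values on the Stack line: it matches the distance from the stack pointer down to the low end of the stack range. The following sketch does that arithmetic. The class name StackFreeSpace is hypothetical, and the pattern assumes the line layout shown in the samples in this chapter.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: recomputes "free space" from a Stack line of the form
// "Stack: [low,high),  sp=...,  free space=Nk" as (sp - low) / 1024.
class StackFreeSpace {
    private static final Pattern STACK = Pattern.compile(
            "Stack: \\[0x([0-9a-fA-F]+),0x([0-9a-fA-F]+)\\),\\s+sp=0x([0-9a-fA-F]+)");

    /** Returns the free stack space in KB, or -1 if the line does not match. */
    static long freeKb(String stackLine) {
        Matcher m = STACK.matcher(stackLine);
        if (!m.find()) {
            return -1;
        }
        long low = Long.parseLong(m.group(1), 16); // end of the stack
        long sp = Long.parseLong(m.group(3), 16);  // current stack pointer
        return (sp - low) / 1024;
    }

    public static void main(String[] args) {
        // Stack overflow sample: 0x00041000 - 0x00040000 = 0x1000 bytes
        System.out.println(freeKb(
                "Stack: [0x00040000,0x00080000),  sp=0x00041000,  free space=4k")); // 4
        // Earlier native code sample, for comparison
        System.out.println(freeKb(
                "Stack: [0x00040000,0x00080000),  sp=0x0007f9f8,  free space=254k")); // 254
    }
}
```

A value of only a few kilobytes, combined with a long run of identical native frames, points to a stack overflow caused by native recursion.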

4.1.4 Crash in the HotSpot Compiler Thread

If the fatal error log output shows that the Current thread is a JavaThread named CompilerThread0, CompilerThread1, or AdapterCompiler, then it is possible that you have encountered a compiler bug. In this case it might be necessary to temporarily work around the issue by switching the compiler (for example, by using the HotSpot Client VM instead of the HotSpot Server VM, or vice versa), or by excluding from compilation the method that provoked the crash. This is discussed in 4.2.1 Crash in HotSpot Compiler Thread or Compiled Code.
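The check on the Current thread line can be sketched as follows. This is an illustration only: the class name CurrentThreadLine and its methods are hypothetical, the pattern assumes the line layout seen in the samples in this chapter, and the list of thread names is taken from this chapter, which mentions both AdapterCompiler and AdapterThread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the thread kind, name, and state from the
// "Current thread" line of the THREAD section of a fatal error log.
class CurrentThreadLine {
    // Assumed shapes, based on the samples in this chapter:
    //   Current thread (0x...):  JavaThread "main" [_thread_in_vm, id=2896]
    //   Current thread (0x...): JavaThread "CompilerThread1" daemon [_thread_in_vm, id=20]
    //   Current thread (0x...):  VMThread [id=3252]
    private static final Pattern CURRENT = Pattern.compile(
            "^Current thread \\(0x[0-9a-fA-F]+\\):\\s+(\\w+)"
            + "(?:\\s+\"([^\"]*)\")?(?:\\s+daemon)?"
            + "\\s+\\[(?:(\\w+),\\s*)?id=\\d+\\]");

    private static String group(String line, int index) {
        Matcher m = CURRENT.matcher(line);
        return m.find() ? m.group(index) : null;
    }

    /** For example "JavaThread" or "VMThread"; null if the line does not match. */
    static String kind(String line) { return group(line, 1); }

    /** The quoted thread name, or null if the line has none. */
    static String name(String line) { return group(line, 2); }

    /** For example "_thread_in_vm"; null if the line has no state. */
    static String state(String line) { return group(line, 3); }

    /** True if the thread name matches one of the compiler thread names in this chapter. */
    static boolean looksLikeCompilerThread(String line) {
        String n = name(line);
        return "JavaThread".equals(kind(line)) && n != null
                && (n.startsWith("CompilerThread")
                    || n.equals("AdapterCompiler")
                    || n.equals("AdapterThread"));
    }

    public static void main(String[] args) {
        String line = "Current thread (0x001e9350): JavaThread \"CompilerThread1\" "
                + "daemon [_thread_in_vm, id=20]";
        System.out.println(name(line) + " compiler thread? " + looksLikeCompilerThread(line));
    }
}
```

The same parsing distinguishes the other cases in this chapter, such as a JavaThread in the _thread_in_native state or a crash in the VMThread.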

4.1.5 Crash in Compiled Code

If the crash occurred in compiled code, then it is possible that you have encountered a compiler bug that has resulted in incorrect code generation. You can recognize a crash in compiled code if the problematic frame is marked with the code J (meaning a compiled Java frame). Below is an example of such a crash:

# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  SIGSEGV (0xb) at pc=0x0000002a99eb0c10, pid=6106, tid=278546
#
# Java VM: Java HotSpot(TM) 64-Bit Server VM (1.6.0-beta-b51 mixed mode)
# Problematic frame:
# J  org.foobar.Scanner.body()V
#
:
Stack: [0x0000002aea560000,0x0000002aea660000),  sp=0x0000002aea65ddf0,
  free space=1015k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
J  org.foobar.Scanner.body()V

[error occurred during error reporting, step 120, id 0xb]

Note that a complete thread stack is not available. The output line “error occurred during error reporting” means that a problem arose trying to obtain the stack trace (perhaps stack corruption in this example).

It might be possible to temporarily work around the issue by switching the compiler (for example, by using the HotSpot Client VM instead of the HotSpot Server VM, or vice versa) or by excluding from compilation the method that provoked the crash. In this specific example it might not be possible to switch the compiler, because the log was taken from the 64-bit Server VM and it might not be feasible to switch to the 32-bit Client VM.

4.1.6 Crash in VMThread

If the fatal error log output shows that the Current thread is the VMThread, then look for the line containing VM_Operation in the THREAD section. The VMThread is a special thread in the HotSpot VM. It performs special tasks in the VM such as garbage collection (GC). If the VM_Operation suggests that the operation is a garbage collection, then it is possible that you have encountered an issue such as heap corruption.

The crash might also be a GC issue, but it could equally be something else (such as a compiler or runtime bug) that leaves object references in the heap in an inconsistent or incorrect state. In this case, collect as much information as possible about the environment and try possible workarounds. If the issue is GC-related you might be able to temporarily work around the issue by changing the GC configuration. This is discussed in 4.2.2 Crash During Garbage Collection.

4.2 Finding a Workaround

If a crash occurs with a critical application, and the crash appears to be caused by a bug in the HotSpot VM, then it might be desirable to quickly find a temporary workaround. The purpose of this section is to suggest some possible workarounds. If the crash occurs with an application that is deployed with the most recent release of Java SE, then the crash should always be reported to Sun Microsystems. You can report it by logging a support call (for customers with support contracts), by reporting a one-time incident (see Commercial Support for links to support options), or by submitting a bug to the bug database (see Other Resources for the link to the bug database).


Note - Even if a workaround in this section successfully eliminates a crash, the workaround is not a fix for the problem, but merely a temporary solution. Submit a support call or bug report with the original configuration that demonstrated the issue.


4.2.1 Crash in HotSpot Compiler Thread or Compiled Code

If the fatal error log indicates that the crash occurred in a compiler thread, then it is possible (but not always the case) that you have encountered a compiler bug. Similarly, if the crash is in compiled code then it is possible that the compiler has generated incorrect code.

In the case of the HotSpot Client VM (-client option), the compiler thread appears in the error log as CompilerThread0. With the HotSpot Server VM there are multiple compiler threads and these appear in the error log file as CompilerThread0, CompilerThread1, and AdapterThread.

Below is a fragment of an error log for a compiler bug that was encountered and fixed during the development of J2SE 5.0. The log file shows that the HotSpot Server VM is used and the crash occurred in CompilerThread1. In addition, the log file shows that the Current CompileTask was the compilation of the java.lang.Thread.setPriority method.

# An unexpected error has been detected by HotSpot Virtual Machine:
#
:
# Java VM: Java HotSpot(TM) Server VM (1.5-internal-debug mixed mode)
:
---------------  T H R E A D  ---------------

Current thread (0x001e9350): JavaThread "CompilerThread1" daemon [_thread_in_vm, id=20]

Stack: [0xb2500000,0xb2580000),  sp=0xb257e500,  free space=505k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [libjvm.so+0xc3b13c]
:

Current CompileTask:
opto: 11      java.lang.Thread.setPriority(I)V (53 bytes)

---------------  P R O C E S S  ---------------

Java Threads: ( => current thread )
  0x00229930 JavaThread "Low Memory Detector" daemon [_thread_blocked, id=21]
=>0x001e9350 JavaThread "CompilerThread1" daemon [_thread_in_vm, id=20]
 :

In this case there are two potential workarounds:

  • The brute force approach: change the configuration so that the application is run with the -client option to specify the HotSpot Client VM.

  • Assume that the bug only occurs during the compilation of the setPriority method and exclude this method from compilation.

The first approach (to use the -client option) might be trivial to configure in some environments. In others, it might be more difficult if the configuration is complex or if the command line to configure the VM is not readily accessible. In general, switching from the HotSpot Server VM to the HotSpot Client VM also reduces the peak performance of an application. Depending on the environment, this might be acceptable until the actual issue is diagnosed and fixed.

The second approach (exclude the method from compilation) requires creating the file .hotspot_compiler in the working directory of the application. Below is an example of this file:

exclude    java/lang/Thread    setPriority

In general the format of this file is exclude CLASS METHOD, where CLASS is the class (fully qualified with the package name) and METHOD is the name of the method. Constructor methods are specified as <init> and static initializers are specified as <clinit>.
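As an illustration of this format, the sketch below builds an exclude line from a class name written with dots and a method name. The class name CompilerExcludeLine is hypothetical. The slash-separated class name and the four-space column separator simply mirror the sample line above; whether other whitespace is accepted is not stated here, so the sketch does not assume it.

```java
// Hypothetical helper: builds one line for the .hotspot_compiler file in the
// "exclude CLASS METHOD" format, mirroring the sample line in this section.
class CompilerExcludeLine {
    /**
     * @param className  fully qualified class name, for example "java.lang.Thread"
     * @param methodName method name, "<init>" for constructors,
     *                   or "<clinit>" for static initializers
     */
    static String excludeLine(String className, String methodName) {
        // The sample file writes the package separator as '/'.
        return "exclude    " + className.replace('.', '/') + "    " + methodName;
    }

    public static void main(String[] args) {
        System.out.println(excludeLine("java.lang.Thread", "setPriority"));
        // Illustrative only: excluding a constructor of a made-up class
        System.out.println(excludeLine("org.example.Widget", "<init>"));
    }
}
```

Each generated line would be placed in the .hotspot_compiler file in the working directory of the application.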


Note - The .hotspot_compiler file is an unsupported interface. It is documented here solely for the purposes of troubleshooting and finding a temporary workaround.


Once the application is restarted, the compiler will not attempt to compile any of the methods listed as excluded in the .hotspot_compiler file. In some cases this can provide temporary relief until the root cause of the crash is diagnosed and the bug is fixed.

In order to verify that the HotSpot VM correctly located and processed the .hotspot_compiler file that is shown in the example above, look for the following log information at runtime. Note that the file name separator is a dot, not a slash.

### Excluding compile:    java.lang.Thread::setPriority

4.2.2 Crash During Garbage Collection

If a crash occurs during garbage collection (GC), then the fatal error log reports that a VM_Operation is in progress. For the purposes of this discussion, assume that the mostly concurrent GC (-XX:+UseConcMarkSweepGC) is not in use. The VM_Operation is shown in the THREAD section of the log and indicates one of the following situations:

  • Generation collection for allocation

  • Full generation collection

  • Parallel gc failed allocation

  • Parallel gc failed permanent allocation

  • Parallel gc system gc

Most likely the current thread reported in the log is the VMThread. This is the special thread used to execute special tasks in the HotSpot VM. The following fragment of the fatal error log shows an example of a crash in the serial garbage collector:

---------------  T H R E A D  ---------------

Current thread (0x002cb720):  VMThread [id=3252]

siginfo: ExceptionCode=0xc0000005, reading address 0x00000000

Registers:
EAX=0x0000000a, EBX=0x00000001, ECX=0x00289530, EDX=0x00000000
ESP=0x02aefc2c, EBP=0x02aefc44, ESI=0x00289530, EDI=0x00289530
EIP=0x0806d17a, EFLAGS=0x00010246

Top of Stack: (sp=0x02aefc2c)
0x02aefc2c:   00289530 081641e8 00000001 0806e4b8
0x02aefc3c:   00000001 00000000 02aefc9c 0806e4c5
0x02aefc4c:   081641e8 081641c8 00000001 00289530
0x02aefc5c:   00000000 00000000 00000001 00000001
0x02aefc6c:   00000000 00000000 00000000 08072a9e
0x02aefc7c:   00000000 00000000 00000000 00035378
0x02aefc8c:   00035378 00280d88 00280d88 147fee00
0x02aefc9c:   02aefce8 0806e0f5 00000001 00289530
Instructions: (pc=0x0806d17a)
0x0806d16a:   15 08 83 3d c0 be 15 08 05 53 56 57 8b f1 75 0f
0x0806d17a:   0f be 05 00 00 00 00 83 c0 05 a3 c0 be 15 08 8b 

Stack: [0x02ab0000,0x02af0000),  sp=0x02aefc2c,  free space=255k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
V  [jvm.dll+0x6d17a]
V  [jvm.dll+0x6e4c5]
V  [jvm.dll+0x6e0f5]
V  [jvm.dll+0x71771]
V  [jvm.dll+0xfd1d3]
V  [jvm.dll+0x6cd99]
V  [jvm.dll+0x504bf]
V  [jvm.dll+0x6cf4b]
V  [jvm.dll+0x1175d5]
V  [jvm.dll+0x1170a0]
V  [jvm.dll+0x11728f]
V  [jvm.dll+0x116fd5]
C  [MSVCRT.dll+0x27fb8]
C  [kernel32.dll+0x1d33b]

VM_Operation (0x0373f71c): generation collection for allocation, mode:
 safepoint, requested by thread 0x02db7108

Note - A crash during garbage collection does not imply a bug in the garbage collection implementation. It could also indicate a compiler or runtime bug or some other issue.
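The check for a GC-related VM_Operation can be sketched as follows. The class name VmOperationLine is hypothetical, the pattern assumes the line layout of the sample log above, and the list of operations is limited to the five situations named in this section, compared without regard to case.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts the operation name from a VM_Operation line
// and checks it against the GC situations listed in this section.
class VmOperationLine {
    private static final List<String> GC_OPERATIONS = Arrays.asList(
            "generation collection for allocation",
            "full generation collection",
            "parallel gc failed allocation",
            "parallel gc failed permanent allocation",
            "parallel gc system gc");

    // Assumed shape: "VM_Operation (0x0373f71c): <operation name>, mode: ..."
    private static final Pattern OPERATION =
            Pattern.compile("VM_Operation \\(0x[0-9a-fA-F]+\\):\\s*([^,]+),");

    /** Returns the operation name, or null if no VM_Operation is found. */
    static String operationName(String text) {
        Matcher m = OPERATION.matcher(text);
        return m.find() ? m.group(1).trim() : null;
    }

    /** True if the operation is one of the GC situations listed in this section. */
    static boolean isListedGcOperation(String text) {
        String name = operationName(text);
        return name != null && GC_OPERATIONS.contains(name.toLowerCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        String line = "VM_Operation (0x0373f71c): generation collection for allocation, mode:";
        System.out.println(operationName(line));
        System.out.println("Listed GC operation: " + isListedGcOperation(line));
    }
}
```

A positive result only says that a collection was in progress when the crash occurred. As the note above points out, the underlying cause can still lie elsewhere.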


You can try the following workarounds if you get a repeated crash during garbage collection:

  • Switch GC configuration. For example, if you are using the serial collector, try the throughput collector, or vice versa.

  • If you are using the HotSpot Server VM, try the HotSpot Client VM.

If you are not sure which garbage collector is in use, you can use the jmap utility on Solaris OS and Linux (see 2.7 jmap Utility) to obtain the heap information from the core file, if the core file is available.

In general if the GC configuration is not specified on the command line, then the serial collector will be used on Windows. On Solaris OS and Linux it depends on the machine configuration. If the machine has at least 2GB of memory and has at least 2 processors, then the throughput collector (Parallel GC) will be used. For smaller machines the serial collector is the default.

The option to select the serial collector is -XX:+UseSerialGC and the option to select the throughput collector is -XX:+UseParallelGC. If, as a workaround, you switch from the throughput collector to the serial collector, then you might experience some performance degradation on multi-processor systems. This might be acceptable until the root issue is diagnosed and resolved.
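The default selection rule described above can be written down as a small function, shown here next to a way to list the collectors of a running VM through the standard java.lang.management API. This is a sketch under stated assumptions: the class name CollectorInfo and its methods are hypothetical, the rule is encoded exactly as this section states it and nothing more, and the listing applies only to a live process, not to a core file. The names reported by the collector beans depend on the VM and the collector in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: encodes the default collector rule from this section
// and lists the collectors that the current VM reports.
class CollectorInfo {
    private static final long TWO_GB = 2L * 1024 * 1024 * 1024;

    /** Returns "throughput" or "serial", following the rule stated in this section. */
    static String describedDefault(boolean windows, int processors, long memoryBytes) {
        if (!windows && processors >= 2 && memoryBytes >= TWO_GB) {
            return "throughput"; // selected explicitly with -XX:+UseParallelGC
        }
        return "serial";         // selected explicitly with -XX:+UseSerialGC
    }

    /** Names of the garbage collectors reported by the running VM. */
    static List<String> collectorsInThisVm() {
        List<String> names = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Described default, Linux, 4 CPUs, 8 GB: "
                + describedDefault(false, 4, 8L * 1024 * 1024 * 1024));
        System.out.println("Collectors in this VM: " + collectorsInThisVm());
    }
}
```

When only a core file is available, the jmap approach described above remains the way to obtain the heap information.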

4.2.3 Class Data Sharing

Class data sharing was a new feature in J2SE 5.0. When the JRE is installed on 32-bit platforms using the Sun-provided installer, the installer loads a set of classes from the system JAR file into a private internal representation and dumps that representation to a file called a shared archive. When the VM is started, the shared archive is memory-mapped in. This saves on class loading and allows much of the metadata associated with the classes to be shared across multiple VM instances. In J2SE 5.0, class data sharing is enabled only when the HotSpot Client VM is used. In addition, sharing is supported only with the serial garbage collector.

The fatal error log prints the version string in the header of the log. If sharing is enabled, it is indicated by the text sharing, as shown in the following example:

# An unexpected error has been detected by HotSpot Virtual Machine:
#
#  EXCEPTION_ACCESS_VIOLATION (0xc0000005) at pc=0x08083d77, pid=3572, tid=784
#
# Java VM: Java HotSpot(TM) Client VM (1.5-internal mixed mode, sharing)
# Problematic frame:
# V  [jvm.dll+0x83d77]

Sharing can be disabled by providing the -Xshare:off option on the command line. If the crash cannot be duplicated with sharing disabled but can be duplicated with sharing enabled, then it is possible that you have encountered a bug in this feature. In that case gather as much information as possible and submit a bug report.
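Checking the version string for the sharing indicator is easy to automate when many logs have to be compared. The following sketch looks for the word sharing in the comma-separated list inside the final pair of parentheses on the "Java VM:" header line. The class name SharingFlag is hypothetical, and the layout of the line is assumed to match the samples in this chapter.

```java
// Hypothetical helper: reports whether the "Java VM:" header line of a fatal
// error log lists "sharing" among the VM modes.
class SharingFlag {
    static boolean sharingEnabled(String vmLine) {
        // Use the last pair of parentheses; "HotSpot(TM)" also contains a pair.
        int open = vmLine.lastIndexOf('(');
        int close = vmLine.lastIndexOf(')');
        if (open < 0 || close < open) {
            return false;
        }
        for (String part : vmLine.substring(open + 1, close).split(",")) {
            if (part.trim().equals("sharing")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(sharingEnabled(
                "# Java VM: Java HotSpot(TM) Client VM (1.5-internal mixed mode, sharing)"));
        System.out.println(sharingEnabled(
                "# Java VM: Java HotSpot(TM) Server VM (6-beta2-b63 mixed mode)"));
    }
}
```

If the crash appears only in logs where this check is positive, that supports rerunning with -Xshare:off as described above.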

4.3 Microsoft Visual C++ Version Considerations

The JDK 6 software is compiled on Windows using Microsoft Visual Studio .NET 2003 (Professional) for 32-bit platforms and Windows Server 2003 SP1 Platform SDK - April 2005 Edition for 64-bit platforms. If you experience a crash with a Java SE application and if you have native or JNI libraries that are compiled with a different release of the compiler, then you must consider compatibility issues between the runtimes. Specifically, your environment is supported only if you follow the Microsoft guidelines when dealing with multiple runtimes. For example, if you allocate memory using one runtime, then you must release it using the same runtime. Unpredictable behavior or crashes can arise if you release a resource using a different library than the one that allocated the resource.