Enabling Java-based VoIP backend platforms through JVM performance tuning

Optimization of Garbage Collector Settings

Tuning garbage collection is a nontrivial task, because the effect of changing a given parameter cannot always be predicted. White papers on the subject are available (see References), but detailed experimental results showing how tuning actually affects garbage collection are lacking. We therefore developed a test suite to evaluate the possible optimizations and to expose problems that can occur when different tuning parameters are set. This section explores that tuning and also looks at optimizations targeted at the characteristics of SIP signaling.

Evaluation details

The test suite consists of an application that generates a specified number of objects, each with a configurable lifetime and a configurable size in memory. This application was run repeatedly with 72 different combinations of virtual machine tuning options, covering heap sizing, generation sizing, the choice of garbage collector, and combinations of these. We chose the tuning parameters based on experience, eliminating irrelevant parameter combinations. We observed and analyzed garbage collection behavior using a custom tool based on the open source GCViewer application, which allowed us to determine the exact influence of each option on garbage collection.
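The test application itself is not reproduced in this article, so the following is only a minimal sketch, under our own assumptions, of the kind of driver described above: it allocates a given number of objects of a configurable size, keeps each one reachable for a configurable lifetime, and reports the elapsed time. The class name AllocationDriver and its parameters are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical allocation driver; a sketch, not the authors' test suite.
// Each object has a configurable size (bytes) and lifetime (milliseconds).
final class AllocationDriver {

    // A payload plus the time after which it may be released.
    private static final class Held {
        final byte[] payload;
        final long expiresAtMillis;

        Held(int sizeBytes, long expiresAtMillis) {
            this.payload = new byte[sizeBytes];
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    /**
     * Allocates count objects of sizeBytes each, one every intervalMillis,
     * keeping each reachable for lifetimeMillis. Returns elapsed wall time in ms.
     */
    static long run(int count, int sizeBytes, long lifetimeMillis, long intervalMillis)
            throws InterruptedException {
        Deque<Held> live = new ArrayDeque<>(); // oldest object at the head
        long start = System.currentTimeMillis();
        for (int i = 0; i < count; i++) {
            long now = System.currentTimeMillis();
            // Drop references to expired objects so the collector can reclaim them.
            while (!live.isEmpty() && live.peekFirst().expiresAtMillis <= now) {
                live.pollFirst();
            }
            live.addLast(new Held(sizeBytes, now + lifetimeMillis));
            if (intervalMillis > 0) {
                Thread.sleep(intervalMillis);
            }
        }
        return System.currentTimeMillis() - start;
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed example parameters: 10,000 objects of 1 KB, each living 50 ms.
        long elapsed = run(10_000, 1024, 50, 0);
        System.out.println("elapsed ms: " + elapsed);
    }
}
```

Because every object in a run has the same lifetime, objects expire in allocation order, which is why a simple deque suffices. Running the driver repeatedly under different option sets with -verbose:gc enabled would approximate the kind of experiment described here.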

Virtual machine tuning can pursue two main goals: minimizing the total execution time or minimizing the length of the pauses caused by garbage collection. A third option is to minimize the total time spent in garbage collection. In most cases this is roughly equivalent to minimizing the total execution time. However, our test results show that this is not always the case: the total execution time can be longer even though less time is spent in garbage collection. This happens when a fixed heap size is compared with a variable one, as illustrated in Figures 4 and 5. In the single-CPU case, the fixed heap reduced the total garbage collection time (10 seconds versus 17 seconds for the variable heap) but increased the total execution time (405 seconds versus 403 seconds).

Figure 4. Total garbage collection time

Figure 5. Total execution time
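As a worked check of the single-CPU numbers quoted above, subtracting the garbage collection time from the total execution time gives the time spent outside garbage collection (the symbol t_non-GC is introduced here only for this illustration):

```latex
\begin{aligned}
t_{\text{non-GC}}^{\text{fixed}}    &= 405\,\text{s} - 10\,\text{s} = 395\,\text{s} \\
t_{\text{non-GC}}^{\text{variable}} &= 403\,\text{s} - 17\,\text{s} = 386\,\text{s}
\end{aligned}
```

The fixed heap saves 7 seconds in garbage collection but spends 9 seconds more outside it, which accounts for its 2-second longer total.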

Note that the test suite is not CPU-bound; it allocates objects at specified intervals. As a result, the dual-CPU configuration benefits during garbage collection, but no large differences in total execution time are to be expected.

Evaluated options

Here is a list of the most important options that were evaluated, together with a short description of each:

-Xmx -Xms : These parameters specify the maximum and the initial (in practice, minimum) heap size of the virtual machine, respectively. Setting both to the same value gives a fixed heap size. The heap must be large enough that the virtual machine does not run out of memory, but if it is set too large, extra time is lost during garbage collection.
-XX:+UseParNewGC : This option enables the parallel young-generation collector, which can be used together with the Concurrent collector.
-XX:+UseConcMarkSweepGC : This option enables the Concurrent Mark Sweep (CMS) collector.
-XX:MaxNewSize -XX:NewSize : These options define the maximum and minimum (initial) size of the young generation. Making the young generation larger than half the total heap is inefficient; making it too small turns it into a bottleneck, because the young-generation collector then has to run frequently.
-XX:+UseTLAB : This option enables thread-local allocation buffers (TLABs). Each thread allocates objects from its own region of eden, so multiple threads can allocate concurrently with less locking on the shared eden space.
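Before measuring, it helps to confirm which of these options a JVM was actually started with and which collectors it ended up using. The sketch below does this with the standard java.lang.management API; the class name GcOptionReport and the prefix filter are our own illustrative choices. On HotSpot with ParNew and CMS enabled, the collector names are typically reported as "ParNew" and "ConcurrentMarkSweep".

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Reports the heap-sizing and -XX flags the running JVM was started with,
// plus the names of the active garbage collectors.
final class GcOptionReport {

    /** JVM input arguments that start with -Xm (e.g. -Xmx, -Xms, -Xmn) or -XX:. */
    static List<String> tuningFlags() {
        List<String> flags = new ArrayList<>();
        for (String arg : ManagementFactory.getRuntimeMXBean().getInputArguments()) {
            if (arg.startsWith("-Xm") || arg.startsWith("-XX:")) {
                flags.add(arg);
            }
        }
        return flags;
    }

    /** Names of the garbage collectors the JVM exposes through its MXBeans. */
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Tuning flags: " + tuningFlags());
        System.out.println("Collectors:   " + collectorNames());
    }
}
```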

Optimizing for low latency

When optimizing for low latency, the obvious choice is to combine the Parallel and Concurrent collectors, since both try to minimize the length of the blocking pauses caused by garbage collection. Fixing the sizes of the total heap and the young generation could give the best results, as no additional time is lost resizing them. However, the ideal sizes prove very difficult to estimate, and no single value is optimal for all scenarios. It is therefore advisable to set only the maximum size of the total heap, making sure enough memory is available when needed.

Table 1. Java Virtual Machine tuning options for low latency behavior

  • -Xmx512m
  • -XX:+UseParNewGC
  • -XX:+UseConcMarkSweepGC
  • -XX:+UseTLAB
  • -XX:+CMSIncrementalMode
  • -XX:+CMSIncrementalPacing
  • -XX:CMSIncrementalDutyCycleMin=0
  • -XX:CMSIncrementalDutyCycle=10

The options shown in Table 1 also fine-tune the Concurrent collector by allowing it to do its work incrementally. There is no need to set the maximum heap size very large: letting the heap grow when needed costs little in terms of average garbage collection pause length. Setting a fixed heap size, on the other hand, results in much longer garbage collection pauses, as shown in Figure 6.

Figure 6. The low latency optimized garbage collection results in much shorter garbage collection pauses, especially when the heap size is not fixed.

The last four options in Table 1 should be used only on systems with one or two processors. They enable the incremental mode of the concurrent mark and sweep (CMS) collector, which breaks the concurrent phases of the collector into bursts of activity instead of running them continuously on one thread. This matters on smaller systems because the concurrent phases occupy one CPU entirely, and dedicating a CPU to potentially lengthy concurrent phases can significantly hurt performance (for example, response time or throughput). Between bursts, the CPU is returned to the application. In addition, the bursts are scheduled between young collections to further reduce the impact of garbage collection.
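The processor-count rule above can be encoded when building the launch options. The following is a minimal sketch: the class name LowLatencyOptions and the main class "SipProxy" are hypothetical, and the flags only apply to older HotSpot releases, since incremental CMS and -XX:+UseParNewGC were removed in later JDKs and CMS itself was removed in JDK 14.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Builds the Table 1 low-latency option set, adding the CMS incremental-mode
// flags only on machines with one or two processors.
final class LowLatencyOptions {

    static List<String> forProcessors(int processors) {
        List<String> opts = new ArrayList<>(Arrays.asList(
                "-Xmx512m",                  // cap the heap only; let it size itself
                "-XX:+UseParNewGC",          // parallel young-generation collector
                "-XX:+UseConcMarkSweepGC",   // concurrent tenured collector
                "-XX:+UseTLAB"));            // thread-local allocation buffers
        if (processors <= 2) {
            // Split concurrent CMS phases into bursts so the CPU can
            // return to the application between them.
            opts.addAll(Arrays.asList(
                    "-XX:+CMSIncrementalMode",
                    "-XX:+CMSIncrementalPacing",
                    "-XX:CMSIncrementalDutyCycleMin=0",
                    "-XX:CMSIncrementalDutyCycle=10"));
        }
        return opts;
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        // Prints a command line for a hypothetical main class "SipProxy".
        System.out.println("java " + String.join(" ", forProcessors(cpus)) + " SipProxy");
    }
}
```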

Minimization of the execution time

When optimizing for execution time, eliminating garbage collection pauses matters less. A parallel garbage collector is still preferred, however, because it generally proves faster than the default collector. To minimize execution time, it is generally better to minimize the number of garbage collections, which can be done by setting a larger heap size. This results in fewer but longer garbage collections, and the pauses will be longer than they would be with the low latency options.

Table 2. Java Virtual Machine tuning options for minimal execution time

  • -Xmx1024m
  • -Xms1024m
  • -XX:+UseParNewGC
  • -XX:+UseTLAB

Table 2 shows the option set for short execution times. A larger fixed heap size carries a significant performance penalty with the default garbage collector, but yields a performance advantage with the Parallel garbage collector, as shown in Figure 7.

Figure 7. The low execution time optimized garbage collection results in shorter execution times when the heap size is fixed.
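Comparing option sets on execution time versus time spent in garbage collection requires measuring both for the same workload. The sketch below is one way to do this with the standard GarbageCollectorMXBean API; the class name GcTimeMeter and the placeholder workload are assumptions. Note that getCollectionTime() reports accumulated collection time as each collector defines it, which for a concurrent collector is not the same as pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Measures the wall-clock time of a workload and the GC time accumulated during it.
final class GcTimeMeter {

    /** Accumulated collection time (ms) summed over all collectors; undefined (-1) values are skipped. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector does not report it
            if (t > 0) {
                total += t;
            }
        }
        return total;
    }

    /** Runs the workload and returns {elapsedMillis, gcMillisDuringRun}. */
    static long[] measure(Runnable workload) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        workload.run();
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        return new long[] { elapsed, totalGcMillis() - gcBefore };
    }

    public static void main(String[] args) {
        // Placeholder workload (assumed): many short-lived 256-byte arrays.
        long[] r = measure(() -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                byte[] b = new byte[256];
                sum += b.length;
            }
            System.out.println("allocated bytes: " + sum);
        });
        System.out.println("execution ms: " + r[0] + ", GC ms: " + r[1]);
    }
}
```

Running the same program once with the Table 1 options and once with the Table 2 options gives a direct, if rough, comparison of the two goals.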

Specific tuning options: SIP signaling

The proposed option sets offer a good base for tailoring garbage collection behavior to the specific needs of an application. However, other options are available that can be considered in specific cases. Some applications, such as the SIP proxy example, have a specific behavior that allows further optimization: a large number of objects have a very short lifetime (for example, objects that represent SIP messages), while others live for a rather long time (for example, objects representing a SIP session). Ideally, the longer-lived objects would be placed directly in the tenured space, reducing the number of times they are evaluated for garbage collection and therefore speeding up the garbage collection process. This is called pretenuring (see References for more details). However, it requires support from the virtual machine, which is currently unavailable.

A similar effect can be achieved by setting to zero the number of times objects are copied between the survivor spaces before tenuring, using the -XX:MaxTenuringThreshold=0 option. Objects are still created in the young generation, but any object that survives the first young-generation collection is moved immediately to the tenured generation. As a consequence, the survivor spaces are no longer needed, so their size should be reduced as much as possible with the -XX:SurvivorRatio=N option. This option only sets the survivor size relative to the young generation size, so choosing a sufficiently large value for N (for example, 128) takes care of this. For optimal results, it is also recommended that you limit the size of the young generation, which causes more frequent but shorter garbage collections.
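To see how small the survivor spaces become, recall that -XX:SurvivorRatio=N sets the eden-to-survivor ratio to N:1 and that there are two survivor spaces, so each survivor space is 1/(N+2) of the young generation. The sketch below encodes that arithmetic together with the SIP option set; the class name SipTenuringOptions and the 32 MB young-generation cap are illustrative assumptions, not values from the tests.

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the SIP-oriented settings: tenure objects that survive their first
// young collection and shrink the now-unused survivor spaces.
final class SipTenuringOptions {

    /**
     * Size of ONE survivor space for a young generation of youngBytes with
     * -XX:SurvivorRatio=ratio: young = eden + 2 * survivor = (ratio + 2) * survivor.
     */
    static long survivorBytes(long youngBytes, int ratio) {
        return youngBytes / (ratio + 2);
    }

    /** Option set; maxNewSize is an illustrative young-generation cap such as "32m". */
    static List<String> options(String maxNewSize) {
        return Arrays.asList(
                "-XX:MaxTenuringThreshold=0",     // promote survivors at the first young GC
                "-XX:SurvivorRatio=128",          // keep survivor spaces very small
                "-XX:MaxNewSize=" + maxNewSize);  // limit the young generation
    }

    public static void main(String[] args) {
        long young = 32L * 1024 * 1024; // assumed 32 MB young generation
        System.out.println(options("32m"));
        System.out.println("bytes per survivor space: " + survivorBytes(young, 128));
    }
}
```

With N = 128, each survivor space takes well under 1% of the young generation, which is why a large value makes the survivor spaces effectively negligible.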
