By Alka Gupta and Michael Doyle
This paper describes the requirements of a Telecommunication (Telco) application server written in the Java programming language, and how those requirements might influence Java virtual machine (JVM) design. Most of the discussion is not limited to Telco alone, but applies to the enterprise as well.
The paper includes a case study of a Session Initiation Protocol (SIP) server that demonstrates significant performance gains from using the new garbage collectors and other tuning techniques described.
The Java programming language is widely accepted as the language of choice for many data-oriented applications. The reasons for the overwhelming success of the language are based around a key set of features that were designed into the language from its inception:
But crucially, the design of the underlying JVM, the engine that executes the Java bytecode produced during compilation of source code, has been geared towards general data applications. This has led to some anomalies in run-time performance and responsiveness when using the JVM in a time-critical environment. Traditionally, GC suspends the user's application while the system recycles memory. Such GC pauses can be of the order of seconds, or even minutes in severe cases.
Telecom carrier equipment suppliers are keenly aware that carriers operate under stringent Service Level Agreements with corporate customers, and that the residential market expects a certain level of reliability and response time. The idea of waiting 10 or 20 seconds to hear the dial-tone when you pick up the phone is clearly unacceptable.
Telecom Carriers are interested in using Java within their environment for a variety of reasons, not least of which is the vast wealth of application creation talent that exists -- literally millions of programmers and companies have standardized on Java as the base technology, which means that new services and skills are attainable in quantities previously unheard of. Add to that the comprehensive and standard environment of Java 2 Platform, Enterprise Edition (J2EE), and the possibilities expand exponentially. However, the specter of "pauses" and the resulting latency in application response would cause any carrier to question the validity of Java as the core technology within their networks.
Carriers depend on completed calls and service sessions in order to generate revenue. Incomplete or abandoned sessions (for example, calls and instant messages) represent wasted network resources, and a cost of provisioning bandwidth that is ultimately unused.
SIP as a protocol is an ideal choice for this environment, as it is an accepted standard within the carrier market, including the 3rd generation (3G) companies spearheading the next generation communication infrastructure. SIP is a UDP- and TCP-based signaling protocol with packet retransmission (for UDP) built into the protocol definition. If a UDP request packet does not receive a response within a certain time frame (typically about 500 milliseconds (ms)), retransmission occurs, to account for network outages, packet loss and other errors that must be addressed. This means that the JVM cannot suspend an application for any period even approaching 500 ms, because once network packet latency (50-100 ms each way) and processing time on the CPU are taken into account, there is very little time left before a retransmission would occur. Cascading failure is often the result of poorly designed application servers that do not accommodate high load conditions, or that run on JVMs (such as those prior to J2SE1.4) using traditional single-threaded, stop-the-world garbage collection policies, which pause or suspend all application threads during garbage collection. These JVMs also perform garbage collection on only a single CPU, even within a multi-CPU host.
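The timing constraint above can be expressed as a simple budget. The sketch below is a back-of-the-envelope illustration: the 500 ms retransmission timer and the 100 ms worst-case one-way latency come from the text, but the 50 ms of request processing time is an assumption chosen only to make the point.

```java
public class SipPauseBudget {
    /** Milliseconds of GC pause the server can tolerate before a SIP UDP retransmission fires. */
    static int pauseBudgetMs(int retransmitTimerMs, int oneWayLatencyMs, int processingMs) {
        return retransmitTimerMs - 2 * oneWayLatencyMs - processingMs;
    }

    public static void main(String[] args) {
        // ~500 ms retransmission timer, 100 ms worst-case one-way latency,
        // and an assumed 50 ms of request processing on the CPU.
        System.out.println("GC pause budget: " + pauseBudgetMs(500, 100, 50) + " ms");
    }
}
```

With the worst-case latency from the text, a GC pause of only a few hundred milliseconds already consumes the entire budget and triggers a retransmission.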
Sun HotSpot JVMs (JDK 1.3+) are generational. The generational algorithm provides for efficient memory recycling and object aging. The JVM heap is split into a "young generation" and an "old generation" according to object "age". The young generation is further split into an "Eden" and two "Survivor spaces". For most applications, two-thirds of allocated objects die very young, are considered "short term objects", and can be collected in the young generation. Typically, the young generation is much smaller in size relative to the total heap size. This leads to frequent but short pauses in the young generation, but more memory is recovered per unit of collection work. However, objects that survive a sufficiently large number of young generation collections are considered "old" or "long term objects" and are "promoted" or "tenured" to the old generation. Even though the old generation is typically larger, it eventually gets filled up and requires collection. This leads to less frequent but larger pauses in the old generation. For further information on Sun HotSpot JVM garbage collection technology, refer to Tuning Garbage Collection with the 1.3.1 Java Virtual Machine.
Ubiquity's SIP application server, Application Services Broker (ASB), is 100% pure Java technology, and provides a perfect testing environment for the JVM and garbage collection. As a SIP Application server, the ASB is responsible for receiving, modifying and ultimately responding to service requests from the signaling network. It achieves this by directing the SIP traffic from the signaling network to service elements (SIP Servlets) that are running within the ASB. These service elements register with the ASB for interest in certain signaling messages. The ASB generates large call volumes with a variety of load distribution characteristics. It is also designed as a cluster-based implementation, scaling on multiple processors, on a single machine or multiple hosts. This is where the JVM design becomes stressed to the limits, and observation of this environment has provided considerable insights for improving the multithreaded nature of the JVMs.
The Ubiquity ASB, for example, generates approximately 220 KB of garbage per call processed (in a general sense, a call is a session). Almost 10% of each call's data must survive for about 40 seconds; the other 90% dies almost immediately. This 10%, even though short lived, lives long enough to consume enormous amounts of memory at an average rate of 100 calls per second: in 40 seconds it would account for at least 88 MB of active data. If garbage collection were to happen, say, once every five minutes in the old generation, the heap would need to be sized to at least 660 MB. That is a huge amount for the traditional garbage collection engine to scan and collect within, say, 100-150 ms.
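The memory figures above can be reproduced with a few lines of arithmetic. The class and method names below are illustrative; the input figures are the ASB measurements quoted in the text, with 1 MB taken as 1000 KB to match the text's rounding.

```java
public class HeapSizing {
    /** KB of live data given garbage per call, surviving fraction, call rate and lifetime. */
    static long survivingDataKB(long garbagePerCallKB, double survivingFraction,
                                long callsPerSec, long lifetimeSec) {
        return Math.round(garbagePerCallKB * survivingFraction) * callsPerSec * lifetimeSec;
    }

    public static void main(String[] args) {
        // 10% of 220 KB = 22 KB per call survives ~40 s; at 100 calls/s that is
        // 22 * 100 * 40 = 88000 KB of active data.
        System.out.println("Active data: " + survivingDataKB(220, 0.10, 100, 40) / 1000 + " MB");

        // If the old generation is collected only once every 5 minutes, it must
        // absorb 300 s of promotions: 22 * 100 * 300 = 660000 KB.
        System.out.println("Old gen floor: " + survivingDataKB(220, 0.10, 100, 300) / 1000 + " MB");
    }
}
```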
Telco Application servers require a more deterministic GC model that is restricted in the time it suspends applications, and also that scales well on multiple processors. In J2SE 1.2 and 1.3, the GC is single-threaded and stop-the-world in nature. The pauses resulting from garbage collection in the JVM add latency to the application and adversely affect the application's performance in terms of throughput and scalability. This architecture also precludes any significant scaling within a multi-CPU single box machine. Since the garbage collection is not distributed across the many available processors, with a highly multi-threaded application running on the JVM, the JVM becomes stressed in a multi-CPU environment, in that the application will be constantly waiting for the JVM to return control, even though there are sufficient processors available to perform application-level work. The impact of a single-threaded GC on a multiprocessor system grows relative to an otherwise parallel application. If we assume an ideal application is perfectly scalable with the exception of GC, it may spend only 1% of the time in GC on a uniprocessor. This translates into roughly 24% loss in throughput at 32 processors. If the GC time for a uniprocessor is as much as 10%, this is still not considered an outrageous amount of time in GC. However, when scaling up, this translates to roughly 78% of lost throughput which is extremely significant.
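The scaling numbers quoted above follow from a simple Amdahl's-law-style model, sketched below. This model gives roughly 24% lost throughput for the 1% case and roughly 76% for the 10% case (close to the text's "roughly 78%", which presumably rounds differently).

```java
public class GcScalingLoss {
    /** Fraction of ideal throughput lost when a serial GC fraction f runs on n CPUs. */
    static double throughputLoss(double f, int n) {
        double idealTime = 1.0 / n;            // run time if everything parallelized
        double actualTime = (1.0 - f) / n + f; // the GC fraction stays single-threaded
        return 1.0 - idealTime / actualTime;
    }

    public static void main(String[] args) {
        System.out.printf("1%%  GC on 32 CPUs: %.0f%% throughput loss%n",
                          100 * throughputLoss(0.01, 32));
        System.out.printf("10%% GC on 32 CPUs: %.0f%% throughput loss%n",
                          100 * throughputLoss(0.10, 32));
    }
}
```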
Turbo-charging Java in this context would essentially mean the following:
J2SE1.4 implements the non-blocking new I/O APIs which provide new features and improved performance in the areas of buffer management, scalable network and file I/O.
The 64-bit JVM in J2SE1.4 allows heaps larger than 4GB, even up to 300GB or more. However, a bigger heap results in bigger, though less frequent, sequential GC pauses, if garbage generation and allocation rates are held constant.
An important parameter that determines the application efficiency and scalability is the "GC sequential overhead". It is the percentage of the application run time spent doing garbage collection in the JVM, while the application is paused. It can be calculated as:
Avg. GC pause * Avg. GC frequency * 100 %.
"GC frequency" is the periodicity of GCs or number of GCs per unit time. GC sequential overhead may be calculated separately for the "young" and the "old generation" since the two generations have quite different average GC pause and frequency characteristics, and may be added to calculate the total GC sequential overhead for the application.
GC sequential overhead can also be calculated as
Total GC time/Total wall clock run time.
"Total GC time" can be calculated as
Avg. GC pause * total no. of GCs
Clearly, smaller GC sequential overhead implies higher application throughput and scalability.
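The formulas above are easy to put into code. Below is a minimal sketch; the pause and frequency figures are hypothetical, chosen only to illustrate the calculation.

```java
public class GcOverhead {
    /** GC sequential overhead as a percentage: avg pause (s) * GCs per second * 100. */
    static double sequentialOverheadPct(double avgPauseSec, double gcPerSec) {
        return avgPauseSec * gcPerSec * 100.0;
    }

    public static void main(String[] args) {
        // Hypothetical: 120 ms young GC once per second, 3 s old GC every 5 minutes.
        double young = sequentialOverheadPct(0.120, 1.0);
        double old = sequentialOverheadPct(3.0, 1.0 / 300.0);
        // The two generations' overheads add up to the application's total.
        System.out.printf("young %.1f%% + old %.1f%% = %.1f%% total%n", young, old, young + old);
    }
}
```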
J2SE1.4.1 introduces two new garbage collectors designed to help boost application performance and scalability by allowing the JVM to scale to larger number of CPUs and memory. These collectors help meet some of the challenges imposed by Telco requirements such as:
These two new collectors introduced in J2SE1.4.1 are:
Table 1 highlights the features of the different collectors available in J2SE1.4.1.
Table 1. Summary of the features of different garbage collectors
Figures 1, 2 and 3 are graphical illustrations of the different garbage collectors. The green arrows represent a multi-threaded application running on a multi-CPU box. The red arrows represent the GC threads. The length of the GC thread roughly represents the length of the GC pause.
Figure 1. Young generation collectors.
Figure 2. Old generation collectors. Garbage collection with the Parallel Collector and Concurrent mark-sweep collector enabled together. Best Case scenario for high throughput and low latency.
Figure 3. Garbage Collection with the two new collectors enabled together.
As illustrated by the diagrams above, using the parallel collector in the young generation together with the concurrent mark-sweep collector in the old generation can help reduce pause times and GC sequential overhead. These two collectors thus help applications scale to a larger number of processors, and to larger memory.
Some of the well known switches which have been available in Sun HotSpot JVMs from J2SE1.3 for the purpose of sizing the heap and tuning garbage collection include:
For more details on these switches, refer to JVM HotSpot VM options.
The switches available from J2SE1.4.1 for enabling the two new garbage collectors are:
-XX:ParallelGCThreads=n sets the number of parallel GC threads. By default, n is equal to the number of CPUs on the system. However, it has been observed that tweaking this number may improve performance in some cases. An example might be that of multiple instances of the JVM running on a single multi-CPU system; in this case, the number of parallel GC threads for each JVM may need to be set to less than the number of CPUs by using this switch. Available from J2SE1.4.1.
Some of the other switches which can be used for performance tuning are as follows:
-XX:MaxTenuringThreshold=0 promotes all objects allocated by the application in the young generation which survive a single GC cycle directly to the old generation, without copying them around between the survivor spaces in the young generation. This setting, when used with the CMS collector in the old generation, helps in two ways.
Set -XX:SurvivorRatio to a very high value, say 128. This is because, in this case, the survivor spaces are not used, and objects are promoted directly from Eden to the old generation during every GC cycle. By setting a high survivor ratio, most of the young generation heap is allocated to "Eden". Available from J2SE1.3.
Setting -XX:TargetSurvivorRatio to 90 would mean that 90% of the survivor space must be used before the young generation is considered full and objects are promoted to the old generation. This would allow objects to age more in the young generation before being tenured. The default value is 50. Available from J2SE1.3.
There is a lot of useful GC-related information which the JVM can log into a file. This information can be used to tune and size the application and the JVM from a GC perspective. The switches available for logging this information are:
Tuning of -XX:NewSize, -XX:MaxNewSize, -XX:SurvivorRatio and -XX:MaxTenuringThreshold=0, as described earlier, should be directed by analysis of the output from this switch, to determine that objects are not prematurely promoted to the old generation. Available from J2SE1.3.
java -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails -XX:+PrintTenuringDistribution -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Xmx512m -Xms512m -XX:NewSize=24m -XX:MaxNewSize=24m -XX:SurvivorRatio=2 <app-name>
311.649: [GC 311.65: [ParNew Desired survivor size 4194304 bytes, new threshold 3 (max 31)
- age 1: 1848472 bytes, 1848472 total
- age 2: 1796200 bytes, 3644672 total
- age 3: 1795664 bytes, 5440336 total
: 21647K->5312K(24576K), 0.1333032 secs] 377334K->362736K(516096K), 0.1334940 secs]
The GC snapshot above describes a young generation GC and gives the following information:
- The total JVM heap at this GC was 516096K, and the total heap size of the young generation was 24576K.
- The old generation heap at the time of this GC was 516096K - 24576K = 491520K.
- The "age" lines describe the aging distribution of the objects allocated by the application in the young generation at the end of this GC cycle. This GC cycle set new threshold=3, meaning that objects will age up to 3 before the next GC cycle takes place.
- The desired survivor size of 4194304 bytes was determined by SurvivorRatio=2, which sets the ratio Eden/Survivor space = 2. The 24MB young generation capacity reported in the log counts Eden plus one usable survivor space, so 24MB = 2 * Survivor space + Survivor space, or Survivor space = 8MB. With the default value of TargetSurvivorRatio=50, the desired survivor size was set to 8 * 0.5 = 4MB.
- The time stamp of the GC relative to the start of the application run was 311.649 seconds.
- The total GC pause for this collection in the young generation was 0.1334940 secs.
- The size of the used young generation heap before this GC was 21647K.
- The size of the used young generation heap after this GC was 5312K.
- 377334K - 362736K = 14598K of data was garbage collected in the young generation.
- 21647K - 5312K - 14598K = 1737K was promoted into the old generation.
- The total size of the used heap was 377334K before this GC took place.
- The total size of the used heap was 362736K after this GC took place.
- The size of the used heap in the old generation before this GC was 377334K - 21647K = 355687K.
- The size of the used heap in the old generation after this GC was 362736K - 5312K = 357424K.
- The total size of objects promoted into the old generation during this GC was 357424K - 355687K = 1737K.
- The legend ParNew indicates that the parallel collector was used in the young generation. Use of the default copying collector is indicated by the legend DefNew instead of ParNew.

The convention used in the verbose:gc log format is that anything reported that is specific to a generation is preceded by the name of that generation:

[GC [gen1 info1] [gen2 info2] info]

Here, info is general to the GC and includes information for the young and old generations combined, while info<n> is specific to gen<n>. So, attention needs to be paid to the nesting and matching of square brackets.
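The bookkeeping above can be checked mechanically. The sketch below (class and method names are illustrative, not part of any Sun tool) extracts the two before->after(capacity) triples from the ParNew record and derives the collected and promoted amounts:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class YoungGcLine {
    // Matches each "used-before->used-after(capacity)" triple, e.g. 21647K->5312K(24576K).
    static final Pattern HEAP = Pattern.compile("(\\d+)K->(\\d+)K\\((\\d+)K\\)");

    /** Returns {garbage collected, promoted to old generation}, both in KB. */
    static long[] collectedAndPromoted(String gcLine) {
        Matcher m = HEAP.matcher(gcLine);
        m.find();                                          // first triple: young generation
        long youngFreed = Long.parseLong(m.group(1)) - Long.parseLong(m.group(2));
        m.find();                                          // second triple: whole heap
        long collected = Long.parseLong(m.group(1)) - Long.parseLong(m.group(2));
        // Whatever left the young generation but was not collected must have been promoted.
        return new long[] { collected, youngFreed - collected };
    }

    public static void main(String[] args) {
        String line = ": 21647K->5312K(24576K), 0.1333032 secs] "
                    + "377334K->362736K(516096K), 0.1334940 secs]";
        long[] r = collectedAndPromoted(line);
        System.out.println("collected=" + r[0] + "K, promoted=" + r[1] + "K");
        // collected=14598K, promoted=1737K, matching the derivation above
    }
}
```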
The snapshots below indicate the use of the concurrent mark-sweep collector in the old generation. As mentioned earlier in the paper, this collection is split into four phases.
513.474: [GC [1 CMS-initial-mark: 335432K(491520K)] 340897K(516096K), 0.0482491secs]
This is the initial mark phase, which is stop-the-world and took 0.048 seconds. The legend "[GC" represents the stop-the-world phase.
514.337: [CMS-concurrent-mark: 0.814/0.814 secs]
514.36: [CMS-concurrent-preclean: 0.023/0.023 secs]
This is the concurrent mark phase (followed by concurrent preclean), which took (0.814 + 0.023) seconds in total; however, the application runs concurrently with the GC during this time. Again, this phase does not collect any garbage.
514.361: [GC 514.361: [dirty card accumulation, 0.0072366 secs]
514.368: [dirty card rescan, 0.0037990 secs]
514.372: [remark from roots, 0.1471209 secs]
514.519: [weak refs processing, 0.0043200 secs] [1 CMS-remark: 335432K(491520K)] 352841K(516096K), 0.1629795 secs]
This is the remark phase, which took about 0.163 secs. The legend "[GC", as mentioned earlier, represents the stop-the-world phase.
517.692: [CMS-concurrent-sweep: 2.905/3.167 secs]
517.766: [CMS-concurrent-reset: 0.073/0.073 secs]
This is the concurrent sweep phase which took about 3 seconds, however, application threads can run concurrently with the GC thread during this phase on a multi-processor system. Among the four CMS phases, this is the only phase when the heap is swept and collected.
If, instead of the CMS collector, the default mark-compact collector is used in the old generation, the old generation GC snapshot would look like this:
719.2: [GC 719.2: [DefNew: 20607K->20607K(24576K), 0.0000341 secs]719.2: [Tenured: 471847K->92010K(491520K), 2.6654172 secs] 492454K->92010K(516096K), 2.6658030 secs]
- 719.2 seconds was the time relative to the start of the application at which this GC took place.
- DefNew indicates the use of the default copying collector in the young generation.
- "[GC" represents a stop-the-world GC triggered by the JVM. For an old generation GC requested by the application through the system call System.gc(), the snapshot would instead be prefixed by the legend "[Full GC".
[DefNew: 20607K->20607K(24576K), 0.0000341 secs]
A young generation collection was attempted, but it was then decided that a "tenured" generation collection needed to happen, because that generation was too full to absorb promotions from the young generation; so a full mark-compact collection was done instead.
Tenured indicates a full mark-compact GC in the old generation. The "old generation" is sometimes referred to as the "tenured generation".
- 471847K was the size of the used heap in the old generation before the GC.
- 92010K was the size of the used heap in the old generation after the GC.
- 491520K was the total heap size of the old generation at the time of this GC.
- 492454K was the total used heap, for young and old generations combined, before the GC.
- 92010K was the total used heap, for young and old generations combined, after the GC.
- 492454K - 92010K = 400444K was the total amount of data garbage collected during this GC.
A variety of information regarding the application and JVM behavior with respect to garbage collection can be derived from these logs. This includes:
Average GC pauses in the young and old generation
The average time the application is suspended while garbage collection is done in the JVM.
Average GC frequency in the young and old generation
The periodicity at which the garbage collector runs in the young and the old generation. This can be obtained since the time instance of each GC activity is logged.
GC sequential overhead
The percentage of system time for which the application is suspended for garbage collection to take place. Calculated as
Avg. GC pause * Avg. GC frequency * 100%
GC concurrent overhead
The percentage of system time for which garbage collection happens concurrently with the application. Calculated as
Avg. concurrent GC time (sweeping phase) * Avg. concurrent GC frequency / no. of CPUs * 100%
Garbage collected per GC
The total garbage collected during each GC.
Allocation rate
The rate at which data gets allocated by the application in the young generation. If the young generation heap occupancy_at_start_of_current_gc = x, occupancy_at_end_of_previous_gc = y, and the GC frequency is 1 per second, then the allocation rate is approximately x-y per second.
Promotion rate
The rate at which data gets promoted to the old generation. The size of objects promoted per GC is calculated as described in the "Young Generation GC" section; if the young generation GC frequency is 1 per second, for example, the promotion rate would be 1737K per second for the snapshot shown in that section.
Total data allocated by the application per call
This can be calculated as Allocation Rate / Call Rate, where the allocation rate is calculated as above and the Call Rate is the load on the application server. In other words, the call rate is the rate at which the server is processing the incoming calls or requests.
Long term data per call
Long term data is what survives the young generation GC cycles and is promoted to the old generation. This can be calculated as Promotion Rate / Call Rate.
Short term data per call
The short lived data that dies very quickly and is collected in the young generation. This can be calculated as
Total Data - Long Term Data.
This is critical for building a model for sizing the JVM heap. For example, for a load of 100 calls per second, with long term data of 50 K per call lasting for a minimum of 40 seconds, as in the case of SIP applications, the minimum memory footprint of the old generation would have to be
50K * 100 calls/s * 40s = 200,000K, or about 200MB.
Memory leaks can be detected, and "out of memory" errors better understood, by monitoring the garbage collected at each GC as shown by these logs.
This kind of information can be used to model and better understand the application and JVM behavior from a GC perspective, then to performance tune the process of garbage collection.
PrintGCStats is a shell script which mines the "verbose:gc" logs and summarizes statistics about garbage collection, in particular, the GC pause times (total, averages, maximum and standard deviations) in the young and old generations. It also calculates other important GC parameters like the GC sequential overhead, GC concurrent overhead, data allocation and promotion rates, total GC and application time and so on. In addition to summary statistics, PrintGCStats also provides the timeline analysis of GC over the application run time, by sampling the data at user specified intervals.
java -verbose:gc -XX:+PrintGCTimeStamps -XX:+PrintGCDetails ...
PrintGCStats -v ncpu=<n> [-v interval=<seconds>] [-v verbose=1] <gc_log_file >
Table 2 describes the output statistics from PrintGCStats.
Table 2. Summary of the output statistics from PrintGCStats
The tuning techniques recommended below are based on the authors' experience in the field.
Use the alternate thread library available from Solaris 8 onwards. It can be enabled by setting LD_LIBRARY_PATH=/usr/lib/lwp:/usr/lib on Solaris 8, and it is the default on Solaris 9. It invokes the "one-to-one" threading model between Java threads and kernel threads. A 5-10% or greater improvement in throughput has been observed in many cases with the use of this library.
Use prstat -Lm -p <jvm process id> to analyze the resource usage of a process on a per light-weight-process (LWP) basis and to identify potential bottlenecks in terms of scalability and performance. It helps to see, for example, whether GC is the bottleneck with respect to application scalability. In many cases, the application itself is not well threaded and hence does not scale on larger systems. This command reports system resource usage and activity on a per-LWP basis within a given process. Run man prstat on a command line for more details. One drawback of this command's output is that it is difficult to identify which "Java thread" is associated with a reported LWP.
ThreadAnalyser is a shell script that analyzes a Java process and produces "thread names" in the prstat output, instead of Solaris LWP IDs. It helps identify system resource usage on a per-thread-name basis for the Java application. The ideal way to use the script is with the alternate thread library mentioned above. ThreadAnalyser can attach to an already running Java process (as long as it has access to the stderr output from the process, having previously been redirected to a file), or it can start the Java process itself, given a startup script file for the Java process.
Start a Java process and redirect the standard error to a file, as in "java <app_name> >/tmp/java.out 2>&1". Also note the PID of the Java process. Then attach ThreadAnalyser like this:

ThreadAnalyser -f /tmp/java.out -p PID <interval>

or simply

ThreadAnalyser -f /tmp/java.out <interval>

if there is no other Java process running on this machine (it finds the PID using ps(1)).
To have ThreadAnalyser start the Java process, run as
ThreadAnalyser -s <app_start_script_file> <interval>
This will work when there are no other Java processes running on the system.
app_start_script_file is the name of the startup script file that starts the Java application.
interval is the time interval at which output data from the script is updated. Default is 5.
For more details on other options and usage, read the README section of the ThreadAnalyser script.
Sample output format:
Run man prstat on the system command line to get details on each of the columns in the above output.
To obtain a Java-level thread dump, send a SIGQUIT signal to the JVM process. This can be done from the command line as kill -QUIT <JVM process pid>.
To obtain a hex and symbolic stack trace for each LWP in the JVM process, run the command pstack <JVM process id>. The output also gives the mapping of LWP ids to Java application thread ids. If the "alternate thread library" is used by the JVM, the output will show a one-to-one mapping between each of the process's LWPs and the corresponding Java application thread bound to it. Run man pstack on the command line for more details.
Use the -Xrunhprof command line flag available in the JVM to help identify unnecessary object retention (sometimes imprecisely called "memory leaks").
Watch how often the DatagramSocket.connect(InetAddress address, int port) and DatagramSocket.disconnect() methods are called. These are implemented as native calls in the JVM from version J2SE1.4 onwards, which provides other major benefits, but incorrect usage of this API may account for a degradation in application performance; specifically, improper use can be very expensive in terms of an application's "system time".
Described here are some of the results obtained using the two new garbage collection policies as well as the other tuning techniques detailed in this paper. For the purpose of this case study, a SIP Application Server from Ubiquity, Application Services Broker (ASB), was used. ASB represents a typical Telco application server which exercises the JVM extensively from a GC perspective, as described earlier in the paper.
The test platform was a Sun Fire V480 with 4 x 900 MHz UltraSPARC III processors running the Solaris 8 operating system. However, the tuning techniques described here apply to other supported platforms, including Linux, Windows, and the Solaris (Intel Architecture) operating environment.
The old generation heap was sized based on the peak load for which the ASB server and the JVM were being performance tuned. The young generation heap was sized empirically by varying its size from 12MB to 128MB. 24MB was found to be the optimal size, giving the lowest GC sequential overhead along with acceptable GC pauses in the young generation. A young generation bigger than 24MB increased the young generation GC pauses and did not improve performance. A size smaller than 24MB significantly increased the GC sequential overhead, due to an increase in GC frequency in the young generation and premature promotion of short-lived data to the old generation. Figure 4 shows the variation of GC sequential overhead, GC pauses and GC frequency in the young generation with the size of the young generation heap.
Figure 4. Variation of GC sequential overhead, GC pauses and GC frequency in the young generation with size of the young generation heap.
For an old generation of 512MB and a young generation of 24MB, results using the various garbage collection policies and other tuning techniques are shown below.
Best results on J2SE1.4.1 using the default garbage collectors in the young and old generations for the chosen size of the JVM heap:
java -Xmx512m -Xms512m -XX:MaxNewSize=24m -XX:NewSize=24m -XX:SurvivorRatio=2 <application>
Average GC pause in old generation: 3 secs.
Average GC pause in young generation: 110 ms.
GC sequential overhead: 18.9%
Best results on J2SE1.4.1 using the CMS collector in the old generation for the same size of the JVM heap as Case 1:
java -Xmx512m -Xms512m -XX:MaxNewSize=24m -XX:NewSize=24m -XX:SurvivorRatio=128 -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=0 -XX:CMSInitiatingOccupancyFraction=60 <application>
Setting -XX:MaxTenuringThreshold=0 improved performance. Also, the value of 60 was found to be the sweet spot for -XX:CMSInitiatingOccupancyFraction, as a smaller value resulted in more frequent CMS garbage collections, while a greater value reduced the efficiency of the CMS collection.
Average GC pause in the old generation (stop-the-world init mark and remark phases of the concurrent collection): 115 ms
Average GC pause in the young generation: 100 ms
GC sequential overhead: 8.6%
Best results on J2SE1.4.1 using the CMS collector in the old generation and the new parallel collector in the young generation for the same size JVM heap as in Cases 1 and 2:
java -Xmx512m -Xms512m -XX:MaxNewSize=24m -XX:NewSize=24m -XX:SurvivorRatio=128 -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=0 -XX:CMSInitiatingOccupancyFraction=60 <application>
Average GC pause in the old generation (stop-the-world init mark and remark phases of the concurrent collection): 142 ms
Average GC pause in the young generation: 50 ms
GC sequential overhead: 5.9%
Table 3 summarizes the three cases.
Table 3. Summary of the results from the case study.
The use of the CMS collector reduces GC pauses in the old generation by a factor of roughly 26 (from 3 secs down to 115 ms) and cuts the GC sequential overhead by more than half (from 18.9% to 8.6%). Reduction in GC sequential overhead directly contributes to improvement in application throughput. The use of the parallel collector roughly halves GC pauses in the young generation (from 100 ms down to 50 ms). On a 4 CPU system, one might have expected a 4x acceleration from the parallel collector; however, due to the overhead associated with parallel collection, the acceleration is not linear, and work is being done in the JVM towards refining this.
The performance tuning techniques and the new garbage collection policies described in the paper are not limited to Telco alone, and apply to Enterprise servers like Web servers, Portal servers and Application servers, as they have very similar requirements.
Work is being done in the Sun JVMs to reduce low level command line flag tuning. Instead of choosing the sizes of the young and old generation, garbage collection policies, and so on, one may be able to specify higher level requirements in terms of maximum pauses or latency, memory footprint for the JVM, CPU utilization and so on, and the JVM would automatically set and adjust various low level parameters and policies for the application to meet the desired higher level settings.
The HotSpot JVM is being enhanced to include light weight, always-on instrumentation that provides high level performance metrics for various aspects of the operations of the JVM. It should be available in a future version of the JVM. These tools and technology are considered experimental at this time.
Java Specification Request (JSR) 174 is a specification for APIs for monitoring and management of the JVM and JSR 163 is a specification for APIs to extract profiling information from a running JVM with support for both time and memory profiling. These APIs will be designed to allow implementations which minimally perturb the profile. The APIs will allow interoperability of profiling and advanced garbage collection technologies, will allow reliable implementation on the widest range of JVMs, and will provide Java applications, system management tools and RAS-related tools with the ability to monitor the health of the JVM as well as manage many run-time controls.
Also, work is being done on the Sun ONE Application Server 7 to meet some of the specific requirements of Telco application servers. Telco carrier requirements are similar to those of the enterprise, except that carriers need more 9s of service and data availability and perhaps more carefully orchestrated service upgrades. The Sun ONE Application Server Enterprise Edition 7 contains Clustra "Always ON" technology, which provides a highly available checkpointing repository for session and bean state. Sun ONE Application Server 7 can be used to build JAIN and OSA/Parlay framework-based call-processing servers, and SLEE implementations can be integrated into it to leverage the EJB container. Possibilities further down the road may include the JSR 117 APIs for continuous availability using EJBs, soft-real-time and quality-of-service capabilities, and an integrated SIP stack.
As Java becomes increasingly popular in the Telco space, Sun JVMs are gearing up to meet the challenges imposed by Telco requirements. The 64-bit JVM and larger hardware can provide solutions for configuring much larger heap sizes (not limited to 4 GB) to meet the needs of larger servers. On a multi-CPU system, the parallel collector can reduce pauses in the young generation, and concurrent collection in the old generation can hide the large GC pauses associated with the conventional old-generation mark-compact collector. This in turn can bring a significant reduction in GC sequential overhead, leading to much better application performance and scalability on multi-CPU systems.
Garbage collection in the JVM can be monitored, analyzed, and tuned by mining the verbose:gc logs. The information available in the GC logs can be analyzed to size and configure the JVM with the switches and parameters most likely to be optimal for an application's performance.
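As a minimal sketch of this kind of log mining (the sample line follows the typical -verbose:gc output format of the 1.4.x HotSpot JVM; real tools such as PrintGCStats are far more complete), pause times can be extracted with a simple parser:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GcLogMiner {
    // Matches lines such as: [GC 325407K->83000K(776768K), 0.2300771 secs]
    // or:              [Full GC 267628K->83769K(776768K), 1.8479984 secs]
    private static final Pattern GC_LINE = Pattern.compile(
        "\\[(Full )?GC (\\d+)K->(\\d+)K\\((\\d+)K\\), ([0-9.]+) secs\\]");

    // Return the total GC pause time, in seconds, recorded in a log.
    static double totalPauseSecs(String log) {
        double total = 0.0;
        Matcher m = GC_LINE.matcher(log);
        while (m.find()) {
            total += Double.parseDouble(m.group(5));
        }
        return total;
    }

    public static void main(String[] args) {
        String sample =
            "[GC 325407K->83000K(776768K), 0.2300771 secs]\n" +
            "[Full GC 267628K->83769K(776768K), 1.8479984 secs]\n";
        System.out.println("total GC pause: "
            + totalPauseSecs(sample) + " secs");
    }
}
```

Summing pauses per interval in this way gives the GC sequential overhead figures discussed above.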
Sun ONE Application Server 7 Enterprise Edition may be suitable for implementing Telco application servers in the near future.
The authors would like to thank Sun's JVM garbage collection architects and experts John Coomes, Ross Knippel, Jon Masamitsu, and Y. S. Ramakrishna for their help and guidance in conducting this study, for providing PrintGCStats, and for their careful review of the paper, and also their colleagues for their help.
About the Authors:
Alka Gupta is a Member of the Technical Staff at Sun Microsystems. She is responsible for working with Sun's ISVs and partners to help them adopt emerging Sun technologies and platforms quickly and efficiently. She has been working in the area of performance tuning on Sun platforms for almost 5 years, and has been in the industry for over 8 years. Alka graduated from the Indian Institute of Technology (IIT), India.
Michael Doyle is the CTO and co-founder of Ubiquity Software Corporation. Prior to this, he worked as a consultant to a number of telecommunication companies on international OSI networks. He is an acknowledged expert in the field of network communication protocols and computer telephony integration. He has also given a series of lectures on languages and protocols to many leading UK software companies. Michael is a graduate of University College London.
References:
Application modeling and Performance tuning from Garbage Collection Perspective
The Java HotSpot Virtual Machine
Java HotSpot VM Options
Tuning Garbage Collection with the 1.3.1 Java Virtual Machine
Java and Solaris Threading
Generational Mostly-concurrent Garbage Collector
New Parallel Collector
Ubiquity's Application Services Broker (A Telco SIP server)
The JAIN API
Sun ONE Application Server 7
The Java HotSpot Virtual Machine, v1.4.1
1 As used on this web site, the terms "Java virtual machine" or "JVM" mean a virtual machine for the Java platform.