by Tom Barnes
04/25/2006
BEA WebLogic Real Time 1.0 (WLRT) is a standards-based server supporting applications that demand fast, predictable response times and low latency. Support for real-time applications is accomplished via WLRT's JRockit JVM, which is equipped with Deterministic Garbage Collection (DetGC), a dynamic garbage collection algorithm that enables extremely short pause times and limits the total number of those pauses within a prescribed window.
This article analyzes the effectiveness of the JRockit DetGC feature by comparing it with the Sun JVM's Incremental Garbage Collection (IncGC) and with the default JRockit without DetGC. To illustrate the effective memory management provided by DetGC, this article contains graphic measurements of the latency and throughput of the sample "Trader" application, as well as a summary of the software tuning that was required to improve its performance.
There's a new class of enterprise application where latency is paramount; data becomes stale and useless if it isn't acted on within milliseconds. BEA WebLogic Real Time (WLRT) is targeted at applications with low-latency, predictable, performance-driven requirements. WLRT is a resilient, lightweight middleware solution for real-time computing. WLRT applications have the flexibility of using either standalone Java or the Spring framework, or of relying on WebLogic Server. The former two options allow WLRT applications to be lightweight; the latter option combines a low-latency solution with enterprise-class reliability, availability, manageability, and scalability. Unlike non-real-time middleware, WLRT provides stringent millisecond-latency performance and reliable service-level agreements.
WLRT 1.0 contains several components:
WLRT allows customers with real-time needs to reap the benefits of a standard Java-based infrastructure, including higher developer productivity, fewer defects, a proven kernel, and adherence to standards. This directly addresses the difficulties faced by financial services companies, as well as a number of other industries, in writing and maintaining real-time applications.
To demonstrate the effectiveness of WLRT, a sample stock trading application named Trader was tuned and then benchmarked. A running Trader application consists of one or more clients that repeatedly perform quote and order requests. Quote requests retrieve stock quotes from a WebLogic Server quote service (via JMS non-persistent messaging to a message-driven bean, or MDB), while order requests invoke an order service that transactionally records the request in an emulated database via JMS reliable persistent messaging to an MDB.
Neither the JRockit JVM nor the Sun JVM was heavily tuned since the goal was to generate results that could be achieved without extensive manual tuning of either JVM. In fact, the tuning mostly conformed to standard best practice with one major exception: The server JVM memory was limited to 256MB to aid the JRockit DetGC algorithm. The tuning required to optimize the Trader application is detailed in Tuning the Trader Sample.
The results of this benchmark show that the WLRT 1.0 JRockit JVM with DetGC outperformed the Sun JVM with IncGC in both throughput and latency, and delivered far lower latency than the default JRockit JVM without DetGC at only slightly lower throughput. Performance-critical, real-time applications frequently need to trade throughput for better response time. However, WLRT 1.0 was able to greatly reduce latency while maintaining a high level of throughput.
In fact, JRockit with DetGC showed roughly one-fifth as many operations exceeding 50 milliseconds as the Sun JVM with IncGC, and still managed to achieve 17% higher throughput. In addition, without DetGC, JRockit reported GC pauses as great as 275 milliseconds, but with DetGC, JRockit reported far lower maximum GC pause times of approximately 30 milliseconds. For more information on these results, see Extended two-hour, two-machine benchmarks and Detailed Benchmark Charts.
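The article does not say how the GC pause figures were collected. As a rough illustration only, the standard java.lang.management API (available since Java 5) exposes cumulative collection counts and times for any JVM. Note that these are running totals, not maximum pause times, so this sketch is not a substitute for the JVM's own GC logging; the class name GcTotals is hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Illustrative sketch: GcTotals is a hypothetical helper, not part of WLRT or Trader.
final class GcTotals {

    /** Sums cumulative GC time in milliseconds; collectors that report -1 (undefined) are skipped. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Sums the number of collections; collectors that report -1 (undefined) are skipped. */
    static long totalCollectionCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Print one line per collector, then the totals.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", timeMillis=" + gc.getCollectionTime());
        }
        System.out.println("total collections: " + totalCollectionCount());
        System.out.println("total GC millis:   " + totalCollectionMillis());
    }
}
```

Sampling these totals at regular intervals and differencing them gives a coarse view of GC activity over time, but individual pause lengths still have to come from the JVM's GC log.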
During the initial benchmarking, a series of short, single-threaded, single-machine runs were made to compare combinations of running deterministic and non-deterministic GC for clients and servers (on both Sun and JRockit JVMs). The Sun Incremental Garbage Collection (IncGC) option was used to compare with JRockit's Deterministic Garbage Collection option.
The results showed that to achieve low and consistent response times it was necessary to use the DetGC option on the client JVM in addition to the server JVM. This may not have been necessary if instead there were lower-load clients spread across multiple machines, as would be more typical in an actual application, rather than a few high-load clients on a single machine.
In addition, a series of twenty-minute scaling runs were performed on JRockit to determine the maximum number of concurrent client threads that achieved high throughput without adversely affecting response time. In this phase, all reported results exclude the first 60 seconds of data for each run (that is, the ramp-up time).
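The ramp-up exclusion and the two headline metrics (throughput and the share of slow responses) can be sketched as follows. This is a minimal illustration, not the Trader harness: the Sample and LatencySummary names are hypothetical, and the sample values in main are made up purely to exercise the code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: Sample and LatencySummary are hypothetical names,
// not part of the Trader benchmark harness.
final class LatencySummary {

    /** One completed operation. */
    static final class Sample {
        final long completedAtMillis; // time since the start of the run
        final double latencyMillis;   // measured response time

        Sample(long completedAtMillis, double latencyMillis) {
            this.completedAtMillis = completedAtMillis;
            this.latencyMillis = latencyMillis;
        }
    }

    /** Keeps only samples that completed at or after the end of the ramp-up window. */
    static List<Sample> afterRampUp(List<Sample> samples, long rampUpMillis) {
        List<Sample> kept = new ArrayList<Sample>();
        for (Sample s : samples) {
            if (s.completedAtMillis >= rampUpMillis) {
                kept.add(s);
            }
        }
        return kept;
    }

    /** Percentage (0 to 100) of samples whose latency exceeds the threshold. */
    static double percentOver(List<Sample> samples, double thresholdMillis) {
        if (samples.isEmpty()) {
            return 0.0;
        }
        int over = 0;
        for (Sample s : samples) {
            if (s.latencyMillis > thresholdMillis) {
                over++;
            }
        }
        return 100.0 * over / samples.size();
    }

    /** Throughput over the measured part of the run (run length minus ramp-up). */
    static double opsPerSecond(List<Sample> steadyState, long rampUpMillis, long runMillis) {
        double measuredSeconds = (runMillis - rampUpMillis) / 1000.0;
        return steadyState.size() / measuredSeconds;
    }

    public static void main(String[] args) {
        // Made-up samples: two fall inside the 60-second ramp-up and are dropped.
        List<Sample> all = Arrays.asList(
                new Sample(10000L, 5.0), new Sample(59999L, 500.0),
                new Sample(60000L, 20.0), new Sample(70000L, 80.0),
                new Sample(80000L, 10.0), new Sample(90000L, 40.0));
        List<Sample> steady = afterRampUp(all, 60000L);
        System.out.println("percent over 50 ms: " + percentOver(steady, 50.0));
        System.out.println("ops per second:     " + opsPerSecond(steady, 60000L, 100000L));
    }
}
```

Filtering before computing either metric matters: the slow sample at 59,999 ms would otherwise inflate the share of responses over 50 milliseconds.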
The following graph illustrates the results from the detailed charts located in Detailed Benchmark Charts.
Figure 1: Graphing throughput versus response time for optimal values
For the purposes of this benchmark, the results demonstrated that 8 threads are optimal: an aggregate throughput of 1230 operations per second with 99.98% of responses within 50 milliseconds. The highest throughput was achieved at 32 threads (1460 operations per second), but at the expense of poor response times: Only 95.72% of responses were within 50 milliseconds. To view detailed charts of these runs, see Twenty-minute scaling runs.
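The selection rule implied here is "highest throughput among the runs that still meet the response-time target". A minimal sketch of that rule follows, using only the two data points quoted above. The 99.9% target is an assumption chosen for illustration (the article does not state a numeric cutoff), and the Run and ThreadCountChooser names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: Run and ThreadCountChooser are hypothetical names.
final class ThreadCountChooser {

    /** Result of one scaling run. */
    static final class Run {
        final int threads;
        final double opsPerSecond;
        final double percentWithin50Ms;

        Run(int threads, double opsPerSecond, double percentWithin50Ms) {
            this.threads = threads;
            this.opsPerSecond = opsPerSecond;
            this.percentWithin50Ms = percentWithin50Ms;
        }
    }

    /**
     * Returns the run with the highest throughput among those meeting the
     * response-time target, or null if no run meets it.
     */
    static Run best(List<Run> runs, double minPercentWithin50Ms) {
        Run best = null;
        for (Run r : runs) {
            if (r.percentWithin50Ms < minPercentWithin50Ms) {
                continue; // fails the latency target, so throughput is irrelevant
            }
            if (best == null || r.opsPerSecond > best.opsPerSecond) {
                best = r;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The two data points quoted in the article.
        List<Run> runs = Arrays.asList(
                new Run(8, 1230.0, 99.98),
                new Run(32, 1460.0, 95.72));
        // Assumed target: at least 99.9% of responses within 50 ms.
        Run chosen = best(runs, 99.9);
        System.out.println("chosen thread count: " + chosen.threads);
    }
}
```

With a looser target, such as 95%, the same rule would pick the 32-thread run instead, which is why the latency target has to be fixed before throughput is compared.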
Finally, three two-hour, two-machine runs were performed using the optimal number of client threads determined from the previous runs (eight threads).
Throughput and Latency for One Process with Eight Client Threads

| Client and Server JVM | Percentage of Operations Greater Than 50 ms | Percentage of Operations Greater Than 150 ms | Operations per Second |
|---|---|---|---|
| JRockit Deterministic GC | 0.0169% | 0.0001% | 1191 |
| JRockit Non-Deterministic GC | 0.4819% | 0.0670% | 1256 |
| Sun Incremental GC | 0.0889% | 0.0021% | 1020 |
The results show JRockit with DetGC performing better than the Sun JVM with IncGC in both throughput and latency. JRockit with DetGC delivered 17% higher throughput than the Sun JVM, and the Sun JVM showed roughly five times the percentage of operations with latency greater than 50 milliseconds. In addition, without DetGC, JRockit reported GC pauses as high as 275 milliseconds, but with DetGC, JRockit reported far lower maximum GC pause times of approximately 30 milliseconds (see the GC pause charts below).
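The comparisons in this paragraph follow directly from the table values, and the arithmetic can be checked with a short sketch. The class and method names are hypothetical; the numbers are the ones reported in the table.

```java
// Illustrative only: TableRatios is a hypothetical helper for checking the
// ratios quoted in the text against the table values.
final class TableRatios {

    /** Percentage by which a exceeds b (negative if a is smaller). */
    static double percentGain(double a, double b) {
        return (a / b - 1.0) * 100.0;
    }

    /** Plain ratio a / b. */
    static double ratio(double a, double b) {
        return a / b;
    }

    public static void main(String[] args) {
        // Operations per second, from the table.
        double detGcOps = 1191.0;
        double nonDetOps = 1256.0;
        double sunOps = 1020.0;

        // Percentage of operations slower than 50 ms, from the table.
        double detGcOver50 = 0.0169;
        double nonDetOver50 = 0.4819;
        double sunOver50 = 0.0889;

        System.out.printf("DetGC vs Sun throughput gain:   %.1f%%%n", percentGain(detGcOps, sunOps));
        System.out.printf("Sun vs DetGC share over 50 ms:  %.1fx%n", ratio(sunOver50, detGcOver50));
        System.out.printf("DetGC vs non-DetGC throughput:  %.1f%%%n", percentGain(detGcOps, nonDetOps));
        System.out.printf("Non-DetGC vs DetGC over 50 ms:  %.1fx%n", ratio(nonDetOver50, detGcOver50));
    }
}
```

The first two figures come out at about 16.8% and about 5.3 times, matching the rounded "17%" and "five times" in the text. The last two show the trade-off within JRockit itself: DetGC gives up roughly 5% of throughput relative to non-deterministic GC while cutting the share of operations over 50 milliseconds by a factor of more than 28.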
The following two charts demonstrate the difference between WLRT JRockit JVM and the Sun JVM. The first chart summarizes Trader performance for WLRT JRockit over a two-hour period, while the second does the same for the Sun JVM. The top black line in each chart shows throughput according to the right-hand scale (the higher the better). The bottom red and blue lines in each chart show application response times according to the left-hand scale (the lower the better).
Figure 2: Two-Hour Benchmark showing performance with JRockit DetGC
Figure 3: Two-Hour Benchmark showing performance with Sun IncGC
Detailed charts of these runs are provided in Detailed Benchmark Charts.