Performance Analysis of the WebLogic Real Time 1.0 "Trader" Application

by Tom Barnes
04/25/2006

Abstract

BEA WebLogic Real Time 1.0 (WLRT) is a standards-based server supporting applications that demand fast, predictable response times and low latency. Support for real-time applications is accomplished via WLRT's JRockit JVM, which is equipped with Deterministic Garbage Collection (DetGC), a dynamic garbage collection algorithm that enables extremely short pause times and limits the total number of those pauses within a prescribed window.

This article analyzes the effectiveness of the JRockit DetGC feature by comparing it with the Sun JVM's Incremental Garbage Collection (IncGC) and with the default JRockit JVM without DetGC. To illustrate the effective memory management provided by DetGC, this article presents charted latency and throughput measurements for the sample "Trader" application, as well as a summary of the software tuning that was required to improve its performance.

Introduction to BEA WebLogic Real Time

There's a new class of enterprise application where latency is paramount; data becomes stale and useless if it isn't acted on within milliseconds. BEA WebLogic Real Time (WLRT) is targeted at applications with low-latency, predictable, performance-driven requirements. WLRT is a resilient, lightweight middleware solution for real-time computing. WLRT applications have the flexibility of using either standalone Java or the Spring framework, or of relying on WebLogic Server. The former two options allow WLRT applications to be lightweight; the latter option combines a low-latency solution with enterprise-class reliability, availability, manageability, and scalability. Unlike non-real-time middleware, WLRT provides stringent millisecond-latency performance and reliable service-level agreements.

WLRT 1.0 contains several components:

  • BEA JRockit 5.0 R26 - a high-performance Java Virtual Machine (JVM) equipped with Deterministic Garbage Collection (DetGC). While all Java virtual machines have garbage collectors, Deterministic Garbage Collection avoids the unpredictable pauses present in other virtual machines, enabling minimal transaction latency.
  • BEA WebLogic Server 9.1 - a scalable, enterprise-ready Java 2 Enterprise Edition (J2EE) application server. The WebLogic Server infrastructure supports the deployment of many types of distributed applications and is an ideal foundation for building applications based on Service Oriented Architectures.
  • Spring Container - a layered Java/J2EE application framework providing the most complete lightweight container available. Spring contributes to increased developer productivity by allowing developers to use Plain Old Java Objects (POJOs) and by enforcing modular, reusable coding practices.

WLRT allows customers with real-time needs to reap the benefits of a standard Java-based infrastructure, including higher developer productivity, fewer defects, a proven kernel, and adherence to standards. This directly addresses the difficulties faced by financial services companies, as well as a number of other industries, in writing and maintaining real-time applications.

Benchmark Results Summary

To demonstrate the effectiveness of WLRT, a sample stock trading application named Trader was tuned and then benchmarked. A running Trader application consists of one or more clients that repeatedly perform quote and order requests. Quote requests retrieve stock quotes from a WebLogic Server quote service via non-persistent JMS messaging to a message-driven bean (MDB), while order requests invoke an order service that transactionally records the request in an emulated database via reliable, persistent JMS messaging to another MDB.
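
For readers unfamiliar with the pattern, the following is a minimal sketch of what an order-processing MDB can look like under container-managed transactions. It is an illustration only; the class, method, and field names are hypothetical and are not taken from the Trader source.

    // Hypothetical order-processing MDB, in the spirit of the Trader order
    // service. With container-managed transactions, the JMS receive and the
    // (emulated) database write commit or roll back together.
    import javax.ejb.MessageDrivenBean;
    import javax.ejb.MessageDrivenContext;
    import javax.jms.MapMessage;
    import javax.jms.Message;
    import javax.jms.MessageListener;

    public class OrderMDB implements MessageDrivenBean, MessageListener {
        private MessageDrivenContext ctx;

        public void setMessageDrivenContext(MessageDrivenContext ctx) { this.ctx = ctx; }
        public void ejbCreate() {}
        public void ejbRemove() {}

        // Invoked once per persistent order message delivered to the MDB's queue.
        public void onMessage(Message message) {
            try {
                MapMessage order = (MapMessage) message;
                String symbol = order.getString("symbol");
                int shares = order.getInt("shares");
                recordOrder(symbol, shares);   // stand-in for the emulated database write
            } catch (Exception e) {
                ctx.setRollbackOnly();         // force the message to be redelivered
            }
        }

        private void recordOrder(String symbol, int shares) {
            // placeholder for the transactional, emulated database insert
        }
    }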

Neither the JRockit JVM nor the Sun JVM was heavily tuned since the goal was to generate results that could be achieved without extensive manual tuning of either JVM. In fact, the tuning mostly conformed to standard best practice with one major exception: The server JVM memory was limited to 256MB to aid the JRockit DetGC algorithm. The tuning required to optimize the Trader application is detailed in Tuning the Trader Sample.
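
For reference, a server launch consistent with this tuning might look like the line below. The flags shown are assumptions based on JRockit's documented options rather than the benchmark's actual start scripts, and the pause-target option in particular varies in name and syntax across JRockit releases, so verify it against the JRockit R26 documentation.

    java -Xms256m -Xmx256m -Xgcprio:deterministic -XpauseTarget=30ms weblogic.Server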

Executive summary

The results of this benchmark show that the WLRT 1.0 JRockit JVM with DetGC outperformed the Sun JVM with IncGC in both throughput and latency, and delivered far lower latency than the default JRockit JVM without DetGC while giving up only about 5% of its throughput. Frequently, performance-critical, real-time applications need to trade throughput for better response time. However, WLRT 1.0 was able to greatly reduce latency while maintaining a high level of throughput.

In fact, JRockit with DetGC showed less than one-fifth the percentage of slow (over 50 ms) responses of either of the other benchmarked configurations, and still achieved 17% higher throughput than the Sun JVM. In addition, without DetGC, JRockit reported GC pauses as long as 275 milliseconds, but with DetGC, JRockit reported far lower maximum GC pause times of approximately 30 milliseconds. For more information on these results, see Extended two-hour, two-machine benchmarks and Detailed Benchmark Charts.

Initial short scaling benchmarks

During the initial benchmarking, a series of short, single-threaded, single-machine runs was made to compare combinations of deterministic and non-deterministic GC for clients and servers (on both the Sun and JRockit JVMs). The Sun JVM's Incremental Garbage Collection (IncGC) option was used as the point of comparison for JRockit's Deterministic Garbage Collection option.

The results showed that, to achieve low and consistent response times, it was necessary to use the DetGC option on the client JVM in addition to the server JVM. This might not have been necessary if the load had instead been spread across many lower-load clients on multiple machines, as would be more typical of an actual application, rather than concentrated in a few high-load clients on a single machine.
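
As an illustration, the two garbage collection options being compared are enabled on the java command line roughly as follows. The client main class name is hypothetical, and heap sizing is omitted here because only the server heap was deliberately constrained in this benchmark.

    # JRockit client JVM with Deterministic GC (illustrative)
    java -Xgcprio:deterministic trader.client.TraderClient

    # Sun client JVM with Incremental GC (illustrative)
    java -Xincgc trader.client.TraderClient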

Intermediate twenty-minute, two-machine benchmarks

Next, a series of twenty-minute scaling runs was performed on JRockit to determine the maximum number of concurrent client threads that achieved high throughput without adversely affecting response time. In this phase, all reported results exclude the first 60 seconds of data for each run (that is, the ramp-up time).

The following graph illustrates the results from the detailed charts located in Detailed Benchmark Charts.

Figure 1: Throughput versus response time for optimal values

For the purposes of this benchmark, the results demonstrated that eight threads are optimal: an aggregate throughput of 1230 operations per second, with 99.98% of responses within 50 milliseconds. The highest throughput was achieved at 32 threads (1460 operations per second), but at the expense of poorer response times: only 95.72% of responses were within 50 milliseconds. To view detailed charts of these runs, see Twenty-minute scaling runs.
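
The throughput and percentile figures quoted here can be derived directly from per-operation latency samples. The following is a minimal sketch, not the benchmark's actual harness, assuming each sample records a completion timestamp and a latency in milliseconds; it drops the 60-second ramp-up and reports operations per second plus the fraction of operations completing within a threshold.

    // Minimal latency-summary sketch (illustrative; not the benchmark harness).
    import java.util.List;

    public class LatencySummary {
        // samples: each long[2] holds {completionTimestampMillis, latencyMillis}
        public static void report(List<long[]> samples, long runStartMillis,
                                  long runEndMillis, long thresholdMillis) {
            long rampUpEnd = runStartMillis + 60000L;   // exclude first 60 seconds
            long total = 0, withinThreshold = 0;
            for (long[] sample : samples) {
                if (sample[0] < rampUpEnd) continue;    // skip ramp-up data
                total++;
                if (sample[1] <= thresholdMillis) withinThreshold++;
            }
            if (total == 0) {
                System.out.println("no samples after ramp-up");
                return;
            }
            double seconds = (runEndMillis - rampUpEnd) / 1000.0;
            System.out.printf("%.0f ops/sec, %.2f%% within %d ms%n",
                    total / seconds, 100.0 * withinThreshold / total, thresholdMillis);
        }
    }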

Extended two-hour, two-machine benchmarks

Finally, three two-hour, two-machine runs were performed using the optimal number of client threads determined from the previous runs (eight threads).

Throughput and Latency for One Process with Eight Client Threads
(Over an Approximately Two-Hour Run)

Client and Server JVM         | % of Operations > 50 ms | % of Operations > 150 ms | Operations per Second
------------------------------|-------------------------|--------------------------|----------------------
JRockit Deterministic GC      | 0.0169%                 | 0.0001%                  | 1191
JRockit Non-Deterministic GC  | 0.4819%                 | 0.0670%                  | 1256
Sun Incremental GC            | 0.0889%                 | 0.0021%                  | 1020

The results show JRockit with DetGC performing better than the Sun JVM with IncGC in both throughput and latency: JRockit delivered 17% higher throughput, and the Sun JVM showed roughly five times the percentage of operations exceeding 50 milliseconds of latency. In addition, without DetGC, JRockit reported GC pauses as high as 275 milliseconds, but with DetGC, JRockit reported far lower maximum GC pause times of approximately 30 milliseconds (see the GC pause charts below).
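
The GC pause numbers above come from the JVMs' own garbage collection reporting. As a rough, portable complement, the standard Java 5 management API can be polled for cumulative collection counts and times, as in the sketch below; note that it reports totals rather than individual pause lengths, so it indicates GC overhead but cannot reproduce the per-pause maximums cited here.

    // Polls cumulative GC statistics via the standard Java 5 management API.
    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;
    import java.util.List;

    public class GcSampler {
        public static void main(String[] args) throws InterruptedException {
            List<GarbageCollectorMXBean> gcBeans =
                    ManagementFactory.getGarbageCollectorMXBeans();
            while (true) {
                for (GarbageCollectorMXBean gc : gcBeans) {
                    System.out.println(gc.getName()
                            + " collections=" + gc.getCollectionCount()
                            + " totalTimeMs=" + gc.getCollectionTime());
                }
                Thread.sleep(10000);    // sample every 10 seconds
            }
        }
    }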

The following two charts demonstrate the difference between the WLRT JRockit JVM and the Sun JVM. The first chart summarizes Trader performance for WLRT JRockit over a two-hour period, while the second does the same for the Sun JVM. The top black line in each chart shows throughput against the right-hand scale (higher is better). The bottom red and blue lines in each chart show application response times against the left-hand scale (lower is better).

Figure 2: Two-Hour Benchmark showing performance with JRockit DetGC

Figure 3: Two-Hour Benchmark showing performance with Sun IncGC

Detailed charts of these runs are provided in Detailed Benchmark Charts.
