Performance Analysis of the WebLogic Real Time 1.0 "Trader" Application
Detailed Description of the Trader Sample

The Trader sample consists of a multithreaded Jython client that performs request/response-style JMS messaging with server-side J2EE message-driven beans (MDBs). The Spring Framework is used where applicable. The server-side code implements a stock price cache, a stock price update service, a stock quote service, and a stock order service. The benchmark framework included with the sample is the Grinder Java Load Testing Framework. Also included are programs that parse and chart the benchmark output and the JVM verbose GC logs.

Component

What It Does

FixML formatted messages

Requests and responses are XMLBeans-encoded JMS text messages with a FixML 4.4 schema.

Price cache

A server-side, in-memory cache of continuously updated stock price data.

Price update service

Price cache updates are presumed to come from an external data feed. In the sample, the feed is emulated via an MDB that updates the price cache.

The benchmark does not run this emulated feed; it uses a pre-generated price cache instead.

Quote service

A non-transactional MDB that receives non-persistent quote requests, consults the in-memory price cache, and sends a non-persistent response back to a reply destination designated by the requester.

Order service

A transactional MDB where each transaction consists of: receiving a persistent order request message, inserting a record of the request into the database (via a Spring DAO), and sending a non-persistent response back to a reply destination designated by the requester.

The benchmark emulates the database using an emulated resource manager.

Client quote request/response

Jython benchmark clients perform synchronous quote requests. The client sends a non-persistent JMS quote request message to the quote service, and designates a temporary queue for receiving the response. After the request message is sent, the client blocks until the non-persistent response is received via an asynchronous consumer. The asynchronous consumer doesn't call acknowledge (no-ACK mode).

Client order request/response

Jython benchmark clients also perform synchronous order requests. The client sends a persistent JMS order request message to the order service, and designates a temporary queue for receiving the response. After the request message is sent, the client blocks until the non-persistent response is received via an asynchronous consumer. The asynchronous consumer doesn't call acknowledge (no-ACK mode).

Benchmark request

A benchmark request consists of first performing four sequential client quote request/responses, and then performing a single client order request/response. The benchmark framework performs benchmark requests in one or more threads, and optionally uses multiple client JVMs.

The main intent of the WLRT trader benchmark is to measure the latency of each individual quote and order request.
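The request sequence described above can be sketched in a few lines of plain Python 3 (not the sample's actual Jython code). This is only an illustration: `timed`, `benchmark_request`, `send_quote`, and `send_order` are hypothetical names, and the two callables stand in for the sample's blocking JMS request/response calls.

```python
import time

QUOTES_PER_REQUEST = 4  # from the text: four quotes, then one order


def timed(call):
    """Run a blocking call and return its latency in seconds."""
    start = time.perf_counter()
    call()
    return time.perf_counter() - start


def benchmark_request(send_quote, send_order):
    """Perform one benchmark request and return per-request latencies.

    send_quote and send_order are hypothetical callables that block until
    the response arrives; they stand in for the sample's JMS calls.
    """
    latencies = []
    for _ in range(QUOTES_PER_REQUEST):
        latencies.append(("quote", timed(send_quote)))
    latencies.append(("order", timed(send_order)))
    return latencies
```

Recording one latency per quote and per order, rather than one figure for the whole sequence, matches the stated intent of measuring each individual request.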

Tuning the Trader Sample

The following tuning guidelines were applied to the Trader sample to better adhere to programming best practices, increase throughput, reduce latency, and simplify it for purposes of benchmarking. Together, they improved throughput by more than 10 times, reduced latency by more than 90%, and allowed the benchmark to run for hours at steady rates.

Cached or pooled JMS resources for reuse

The client application and server MDB code were enhanced to cache heavyweight JMS objects for reuse (JMS connections, sessions, producers, consumers, and temporary destinations), instead of opening and closing them once per message. This is standard best practice and was necessary to achieve consistently low response times.
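A minimal sketch of the caching idea, written in plain Python 3 with stand-in objects rather than the real JMS API. `FakeConnection` and `CachedRequester` are hypothetical names invented for this illustration; the counter only shows that the expensive setup happens once instead of once per message.

```python
class FakeConnection:
    """Stand-in for a heavyweight JMS connection (hypothetical)."""

    opened = 0  # class-wide count of connections created

    def __init__(self):
        FakeConnection.opened += 1

    def request(self, payload):
        # A real client would send a message and wait for the reply here.
        return "reply:" + payload


class CachedRequester:
    """Creates the connection lazily, once, and reuses it for every request."""

    def __init__(self, connection_factory=FakeConnection):
        self._factory = connection_factory
        self._connection = None

    def request(self, payload):
        if self._connection is None:  # first use: pay the setup cost
            self._connection = self._factory()
        return self._connection.request(payload)


# Usage: 100 requests share a single connection.
requester = CachedRequester()
replies = [requester.request("quote-%d" % i) for i in range(100)]
```

In a multithreaded client, each thread would typically keep its own cached session-level objects, because a JMS session is not intended to be used by several threads at once.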

Used Deterministic Garbage Collection on the client

To achieve low and consistent response times, it was necessary to use the DetGC option on the client JVM in addition to the server JVM. This might not have been necessary if lower-load clients had been spread across multiple machines, as would be more typical in an actual application, rather than running a few high-load clients on a single machine.

Tuned the JMS maximum message pipeline to "1"

The custom JMS connection factories were tuned to limit the size of the message pipeline for in-flight messages between server and consumers to 1 (via the MessagesMaximum attribute). Both MDBs and clients were changed to refer to these custom factories. Depending on the application, this can improve response times, as it favors latency over throughput.

Tuned client and server thread pool sizes and MDB concurrency

To ensure that MDBs and clients had sufficient threads for asynchronous message processing, the client thread pool was increased to 32 threads, the quote bean was given 32 dedicated server threads, and the order bean was also given 32 dedicated server threads. To utilize the dedicated threads, the max-beans-in-free-pool setting in the MDB descriptor of both the order bean and the quote bean was set to 32.

Limited server JVM memory

If the Java heap size is set too high, the deterministic GC engine may defer GC work for longer periods, potentially creating longer GC pauses. This issue has been resolved in a later JRockit version. One workaround would be to configure a lower GC trigger on JRockit, but, for the purposes of Trader, we chose instead to limit the amount of memory configured for the server JVM to 256MB.

Replaced database with an emulated resource manager

For benchmark purposes, calls to the database were replaced with calls that enlist an emulated XA resource manager. This emulates the latency of logging orders to the database and the overhead of two-phase transactions. It also works around the potential problems of either the database table growing too large or non-repeatability of database performance between runs.
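The idea of an emulated resource manager can be illustrated with a short Python 3 sketch. It is not the sample's implementation: `EmulatedResourceManager`, `run_transaction`, and the delay values are assumptions made up for this example, and the rollback path that a real transaction manager needs is left out.

```python
import time


class EmulatedResourceManager:
    """Hypothetical stand-in for a database taking part in two-phase commit."""

    def __init__(self, prepare_delay=0.002, commit_delay=0.001):
        # Delays are invented illustration values, not measurements.
        self.prepare_delay = prepare_delay
        self.commit_delay = commit_delay
        self.committed = 0

    def prepare(self):
        time.sleep(self.prepare_delay)  # emulate the cost of logging the order
        return True                     # this emulation always votes to commit

    def commit(self):
        time.sleep(self.commit_delay)   # emulate the second phase
        self.committed += 1


def run_transaction(resources):
    """Minimal two-phase commit: prepare every resource, then commit each."""
    if all(resource.prepare() for resource in resources):
        for resource in resources:
            resource.commit()
        return True
    return False  # a real transaction manager would roll back here
```

Because the emulated resource stores nothing, its cost stays the same from run to run, which is the repeatability property the text describes.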

Examined JRA (JRockit Runtime Analyzer) profiles and WebLogic statistics

In order to locate resources that needed to be cached, as well as memory leaks, the following three tools were quite helpful:

Avoided interfering with running benchmarks

Since these benchmarks are intended to measure maximum latency, it was more important than usual to ensure that no processes unrelated to the benchmark interfered. Such background work may not noticeably affect throughput, but it can easily distort response time measurements. Therefore, we ensured that no users were actively logged in, that virus checkers were disabled, and so on.
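A small worked example shows why a single disturbance matters more for maximum latency than for averages. The numbers below are hypothetical, chosen only to make the arithmetic easy to follow, and `summarize` is a helper name introduced for this sketch.

```python
def summarize(latencies_ms):
    """Return (mean, maximum) of a list of latencies in milliseconds."""
    return sum(latencies_ms) / len(latencies_ms), max(latencies_ms)


# Hypothetical runs: 1000 requests at 2 ms each, versus the same run with
# one request delayed to 50 ms by an unrelated background process.
quiet = [2.0] * 1000
disturbed = [2.0] * 999 + [50.0]

quiet_mean, quiet_max = summarize(quiet)              # mean 2.0, max 2.0
disturbed_mean, disturbed_max = summarize(disturbed)  # mean 2.048, max 50.0
```

In this made-up case the mean rises by about 2.4 percent, while the maximum rises 25-fold, so one stray process can dominate the result being measured.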

The Benchmark Environment

Let's now look at some details of the benchmark environment.

Hardware and operating system (two machines)

Single-machine benchmarks ran server and clients all on a Dell 6650. Two-machine benchmarks ran the server on an HP DL380 and the clients on a Dell 6650. The HP DL380 has much faster disk hardware, which speeds up transactional and persistent messaging operations.

Software versions

Product/Library        Version
BEA WebLogic Server    9.1
BEA JRockit JVM        5.0 R26
Sun JVM                1.5.0_04-b05
Spring                 1.2.6
XMLBeans               1.0.3
Grinder                3.0-beta27

WebLogic Server configuration highlights

JVM configurations

JRockit JVM

Client options: -Xms128m -Xmx128m -Dweblogic.ThreadPoolSize=32

Server options: -Xms256m -Xmx256m -Xverbose:opt,memory,memdbg,gcpause,compaction,license

Deterministic GC option: -Xgcprio:deterministic

Sun JVM

Client options: -Xms128m -Xmx128m -Dweblogic.ThreadPoolSize=32

Server options: -server -Xms256m -Xmx256m -XX:+UseSpinning -verbose:gc

Low GC pause option: -Xincgc
