Performance Analysis of the WebLogic Real Time 1.0 "Trader" Application
The Trader sample consists of a multithreaded Jython client that performs request/response-style JMS messaging with server-side J2EE MDBs. The Java Spring Framework is leveraged when applicable. The server-side code implements a stock price cache, a stock price update service, a stock quote service, and a stock order service. The benchmark framework included with the sample is the Grinder Java Load Testing Framework. Also included are programs that parse and chart the benchmark output and the JVM verbose GC logs.
| Component | What It Does |
| --- | --- |
| FixML formatted messages | Requests and responses are XMLBeans-encoded JMS text messages with a FixML 4.4 schema. |
| Price cache | A server-side, in-memory cache of continuously updated stock price data. |
| Price update service | Price cache updates are presumed to come from an external data feed. In the sample, the feed is emulated via an MDB that updates the price cache. The benchmark does not emulate this feed, and instead uses a pre-generated price cache. |
| Quote service | A non-transactional MDB that receives non-persistent quote requests, consults the in-memory price cache, and sends a non-persistent response back to a reply destination designated by the requester. |
| Order service | A transactional MDB where each transaction consists of: receiving a persistent order request message, inserting a record of the request into the database (via a Spring DAO), and sending a non-persistent response back to a reply destination designated by the requester. The benchmark emulates the database using an emulated resource manager. |
| Client quote request/response | Jython benchmark clients perform synchronous quote requests. The client sends a non-persistent JMS quote request message to the quote service, and designates a temporary queue for receiving the response. After the request message is sent, the client blocks until the non-persistent response is received via an asynchronous consumer. The asynchronous consumer doesn't call acknowledge (no-ACK mode). |
| Client order request/response | Jython benchmark clients also perform synchronous order requests. The client sends a persistent JMS order request message to the order service, and designates a temporary queue for receiving the response. After the request message is sent, the client blocks until the non-persistent response is received via an asynchronous consumer. The asynchronous consumer doesn't call acknowledge (no-ACK mode). |
| Benchmark request | A benchmark request consists of first performing four sequential client quote request/responses, and then performing a single client order request/response. The benchmark framework performs benchmark requests in one or more threads, and optionally uses multiple client JVMs. The main intent of the WLRT trader benchmark is to measure the latency of each individual quote and order request. |
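
The shape of a benchmark request (four sequential quote round trips followed by one order round trip, each timed individually) can be sketched as follows. This is a minimal illustration in plain Python 3, not the sample's actual Jython/Grinder script: `benchmark_request`, `timed`, `quote_call`, and `order_call` are hypothetical names, and the two callables merely stand in for the blocking JMS request/response exchanges.

```python
import time


def timed(call):
    """Run call() and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = call()
    return result, time.perf_counter() - start


def benchmark_request(quote_call, order_call, quotes_per_request=4):
    """Perform one benchmark request and return a latency per round trip.

    quote_call and order_call are hypothetical stand-ins for the blocking
    JMS request/response exchanges performed by the real client.
    """
    latencies = []
    # Four sequential quote request/responses...
    for _ in range(quotes_per_request):
        _, elapsed = timed(quote_call)
        latencies.append(("quote", elapsed))
    # ...followed by a single order request/response.
    _, elapsed = timed(order_call)
    latencies.append(("order", elapsed))
    return latencies


if __name__ == "__main__":
    # Stub round trips that only sleep; a real client would block on a reply.
    sample = benchmark_request(lambda: time.sleep(0.001),
                               lambda: time.sleep(0.002))
    for kind, seconds in sample:
        print(f"{kind}: {seconds * 1000:.2f} ms")
```

Recording a latency for every individual round trip, rather than one figure per benchmark request, matches the stated intent of measuring each quote and order request separately.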
The following tuning guidelines were applied to the Trader sample to better adhere to programming best practices, increase throughput, reduce latency, and simplify it for purposes of benchmarking. Together, they improved throughput by more than 10 times, reduced latency by more than 90%, and allowed the benchmark to run for hours at steady rates.
The client application and server MDB code were enhanced to cache heavyweight JMS objects for reuse (JMS connections, sessions, producers, consumers, and temporary destinations), instead of opening and closing them once per message. This is standard best practice and was necessary to achieve consistently low response times.
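
The difference between the two approaches can be sketched without any JMS dependency. In the example below, `FakeConnection` is a hypothetical stand-in for the bundle of heavyweight JMS objects (connection, session, producer); it only counts how often it is created, which is the cost that caching avoids. None of these class or function names come from the Trader sample.

```python
class FakeConnection:
    """Hypothetical stand-in for heavyweight JMS objects; not a real JMS API."""

    opened = 0  # how many instances have been created so far

    def __init__(self):
        FakeConnection.opened += 1
        self.closed = False

    def send(self, message):
        if self.closed:
            raise RuntimeError("connection is closed")
        return "sent:" + message

    def close(self):
        self.closed = True


def send_uncached(messages):
    """Anti-pattern: open and close the heavyweight object once per message."""
    results = []
    for message in messages:
        conn = FakeConnection()
        try:
            results.append(conn.send(message))
        finally:
            conn.close()
    return results


class CachingSender:
    """Creates the heavyweight object lazily once, then reuses it."""

    def __init__(self):
        self._conn = None

    def send(self, message):
        if self._conn is None:
            self._conn = FakeConnection()
        return self._conn.send(message)

    def close(self):
        if self._conn is not None:
            self._conn.close()
            self._conn = None
```

Sending three messages through `send_uncached` creates three `FakeConnection` instances, while a single `CachingSender` creates one. In real JMS code a session must not be used by more than one thread at a time, so a cache of this kind would typically be kept per thread.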
To achieve low and consistent response times it was necessary to use the DetGC option on the client JVM in addition to the server JVM. This might not have been necessary if lower-load clients had instead been spread across multiple machines, as would be more typical in an actual application, rather than a few high-load clients running on a single machine.
The custom JMS connection factories were tuned to limit the size of the message pipeline for in-flight messages between server and consumers to 1 (via the MessagesMaximum attribute). Both MDBs and clients were changed to refer to these custom factories. Depending on the application, this can improve response times, as it favors latency over throughput.
To ensure that MDBs and clients had sufficient threads for asynchronous message processing, the client thread pool was increased to 32 threads, the quote bean was given 32 dedicated server threads, and the order bean was also given 32 dedicated server threads. To utilize the dedicated threads, the max-beans-in-free-pool setting in the MDB descriptor of both the order bean and the quote bean was set to 32.
If the Java heap size is set too high, the deterministic GC engine may defer GC work for a longer period, potentially creating longer GC pauses. This issue has been resolved in a later JRockit version. One workaround would be to configure a lower GC Trigger on JRockit, but, for the purposes of Trader, we chose instead to limit the amount of memory configured for the server JVM to 256MB.
For benchmark purposes, calls to the database were replaced with calls that enlist an emulated XA resource manager. This emulates the latency of logging orders to the database and the overhead of two-phase transactions. It also works around the potential problems of either the database table growing too large or non-repeatability of database performance between runs.
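
The article does not describe how the emulated resource manager is implemented, so the following is only an illustrative sketch of the idea: a participant that takes part in a two-phase commit and adds a configurable delay where a database would write its log. `EmulatedResourceManager`, `two_phase_commit`, and `log_delay_seconds` are hypothetical names, and the code does not use the real XA interfaces.

```python
import time


class EmulatedResourceManager:
    """Hypothetical two-phase-commit participant that stands in for a database."""

    def __init__(self, name, log_delay_seconds=0.0):
        self.name = name
        self.log_delay_seconds = log_delay_seconds  # emulated logging latency
        self.calls = []  # records which protocol steps were invoked

    def prepare(self):
        time.sleep(self.log_delay_seconds)  # pretend to force a log record
        self.calls.append("prepare")
        return True  # vote to commit

    def commit(self):
        self.calls.append("commit")

    def rollback(self):
        self.calls.append("rollback")


def two_phase_commit(resources):
    """Phase 1: every resource prepares. Phase 2: commit all, or roll back all."""
    for resource in resources:
        if not resource.prepare():
            # Any "no" vote aborts the transaction for every participant.
            for other in resources:
                other.rollback()
            return False
    for resource in resources:
        resource.commit()
    return True


if __name__ == "__main__":
    jms = EmulatedResourceManager("jms")
    db = EmulatedResourceManager("emulated-db", log_delay_seconds=0.001)
    print(two_phase_commit([jms, db]), db.calls)
```

Because the emulated participant keeps no growing table and has a fixed delay, runs stay repeatable, which is the motivation given above for replacing the database.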
In order to locate resources that needed to be cached, as well as memory leaks, the following three tools were quite helpful:
Since these benchmarks are intended to measure maximum latency, it was more important than usual to ensure that no unrelated processes interfered. Such background work may not noticeably affect throughput, but it can easily distort response time measurements. We therefore made sure that no users were actively logged in, that virus checkers were disabled, and so on.
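
A little arithmetic shows why maximum latency is so much more sensitive to interference than throughput. The figures below are invented for illustration (1,000 requests at 2 ms each, one of which is delayed to 200 ms by a background process) and are not benchmark results; `summarize` is a hypothetical helper.

```python
def summarize(latencies_ms):
    """Return the count, mean, and maximum of a list of latencies in milliseconds."""
    return {
        "count": len(latencies_ms),
        "mean_ms": sum(latencies_ms) / len(latencies_ms),
        "max_ms": max(latencies_ms),
    }


# Hypothetical data: an undisturbed run and a run with a single delayed request.
quiet = [2.0] * 1000
disturbed = [2.0] * 999 + [200.0]

if __name__ == "__main__":
    print(summarize(quiet))
    print(summarize(disturbed))
```

With these made-up numbers, the single delayed request raises the mean from 2.0 ms to about 2.2 ms, a change that throughput figures would barely register, while the maximum jumps from 2 ms to 200 ms.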
Let's now look at some details of the benchmark environment.
Single-machine benchmarks ran server and clients all on a Dell 6650. Two-machine benchmarks ran the server on an HP DL380 and the clients on a Dell 6650. The HP DL380 has much faster disk hardware, which speeds up transactional and persistent messaging operations.
| Product/Library | Version |
| --- | --- |
| BEA WebLogic Server | 9.1 |
| BEA JRockit JVM | 5.0 R26 |
| Sun JVM | 1.5.0_04-b05 |
| Spring | 1.2.6 |
| XMLBeans | 1.0.3 |
| Grinder | 3.0-beta27 |

| JRockit JVM | |
| --- | --- |
| Client Options | -Xms128m -Xmx128m -Dweblogic.ThreadPoolSize=32 |
| Server Options | -Xms256m -Xmx256m -Xverbose:opt,memory,memdbg,gcpause,compaction,license |
| Deterministic GC Option | -Xgcprio:deterministic |

| Sun JVM | |
| --- | --- |
| Client Options | -Xms128m -Xmx128m -Dweblogic.ThreadPoolSize=32 |
| Server Options | -server -Xms256m -Xmx256m -XX:+UseSpinning -verbose:gc |
| Low GC Pause Option | -Xincgc |