SPECjms2007: A Novel Benchmark and Performance Analysis Framework for Message-Oriented Middleware
by Samuel Kounev and Kai Sachs
Message-oriented middleware (MOM) is at the core of a vast number of financial services and telco applications, and is gaining increasing traction in other industries, such as manufacturing, transportation, health care and supply chain management. There is strong interest in the end-user and analyst communities in a standardized benchmark suite for evaluating the performance and scalability of MOM.
In this article we describe SPECjms2007, the world's first industry-standard benchmark specialized for MOM. SPECjms2007 is based on a novel application in the supply chain management domain that has been specifically designed as a representative workload scenario for evaluating the performance and scalability of MOM products.
In addition to providing standard workload and metrics for MOM performance, the benchmark provides a flexible performance analysis framework that allows users to customize the workload according to their requirements.
The article discusses the business scenario and workload modeled by the benchmark, as well as the benchmark design and architecture. We explain the meaning of the benchmark metrics and discuss how the various features supported by the benchmark can be exploited for in-depth performance analysis of MOM infrastructures.
Message-oriented middleware (MOM) is increasingly adopted as an enabling technology for modern event-driven applications such as stock trading, event-based supply chain management, air traffic control and online auctions, to name just a few. Moreover, the publish-subscribe paradigm is now used as a building block in major new software architectures and technology domains such as Enterprise Service Bus (ESB), Enterprise Application Integration (EAI), Service-Oriented Architecture (SOA) and Event-Driven Architecture (EDA). Novel messaging applications, however, pose some serious performance and scalability challenges. For example, the next generation of event-driven supply chain management based on RFID technology will be highly reliant on scalable and efficient backend systems to support the processing of acquired real-time data and its integration with enterprise applications and business processes. Large retailers such as Wal-Mart, Metro or Tesco are expected to reach throughput rates of about 60 billion messages per annum. The performance and scalability of the underlying MOM platforms used to process these messages will be of crucial importance for the successful adoption of such applications in the industry.
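To put the figure of 60 billion messages per annum into perspective, it can be converted into an average per-second rate. The short calculation below is only a back-of-the-envelope sketch: it assumes a 365-day year and perfectly uniform arrivals, which real retail traffic will not exhibit, so peak rates would be considerably higher than this average.

```python
# Convert an annual message volume into an average per-second rate.
# Assumption: uniform arrivals over a 365-day year (real traffic is bursty).
MESSAGES_PER_YEAR = 60_000_000_000
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds

avg_rate = MESSAGES_PER_YEAR / SECONDS_PER_YEAR
print(f"average rate: {avg_rate:,.0f} messages/second")
```

Under these assumptions the average works out to roughly 1,900 messages per second, sustained around the clock.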
To guarantee that applications meet their Quality of Service (QoS) requirements, it is essential that the platforms on which they are built are tested using benchmarks to measure and validate their performance and scalability. While several proprietary benchmarks for MOM servers (for example SonicMQ Test Harness, IBM's Performance Harness for JMS) have been developed and used in the industry for performance testing and product comparisons, these benchmarks do not provide a level playing field for performance comparisons. The reason is that most of them use artificial workloads that do not reflect any real-world application scenario. Furthermore, they typically concentrate on stressing individual MOM features in isolation and do not provide a comprehensive and representative workload for evaluating the overall MOM server performance.
To address these concerns, in September 2005 the Standard Performance Evaluation Corporation (SPEC) launched a project with the goal of developing a standard benchmark for evaluating the performance and scalability of MOM products. The new benchmark was named SPECjms2007 and was developed by SPEC's OSG-Java Subcommittee with the participation of Technische Universität Darmstadt, IBM, Sun, Oracle, BEA, Sybase and Apache. SPECjms2007 exercises messaging products through the JMS (Java Message Service) standard interface, which is supported by all major MOM vendors.
Requirements and Goals
The aim of the SPECjms2007 benchmark is to provide a standard workload and metrics for measuring and evaluating the performance and scalability of JMS-based MOM platforms. To achieve this, the SPECjms2007 workload must fulfill several important requirements. First of all, it must be based on a representative workload scenario that reflects the way platform services are exercised in real-life systems. The communication style and the types of messages sent and received by the different parties in the benchmark scenario should represent a typical transaction mix. The goal is to allow users to relate the observed behavior to their own applications and environments. Second, the workload should be comprehensive in that it should exercise all platform features typically used in MOM applications, including both point-to-point (P2P) and publish/subscribe (pub/sub) messaging. The features and services stressed should be weighted according to their usage in real-life systems.
The following dimensions have to be considered when defining the workload transaction mix:
The third requirement is that the workload should be focused on measuring the performance and scalability of the MOM server's software and hardware components. It should minimize the impact of other components and services that are typically used in the chosen application scenario. For example, if a database were used to store business data and manage the application state, it could easily become the limiting factor of the benchmark, as experience with other benchmarks (e.g., ECperf) shows. Finally, the SPECjms2007 workload must not have any inherent scalability limitations. The user should be able to scale the workload both by increasing the number of destinations (queues and topics) and by increasing the message traffic pushed through each destination.
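The two scaling dimensions named above can be illustrated with a small model. The sketch below is not part of the benchmark; the `Workload` class, its field names and the numbers are hypothetical and serve only to show that the same aggregate message rate can be reached either by adding destinations or by pushing more traffic through the existing ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical workload description (not a SPECjms2007 data structure)."""
    destinations: int            # number of queues and topics in use
    rate_per_destination: float  # messages per second through each destination

    def total_rate(self) -> float:
        return self.destinations * self.rate_per_destination

    def scale_destinations(self, factor: int) -> "Workload":
        # More destinations, same traffic on each one.
        return Workload(self.destinations * factor, self.rate_per_destination)

    def scale_traffic(self, factor: float) -> "Workload":
        # Same destinations, more traffic on each one.
        return Workload(self.destinations, self.rate_per_destination * factor)


base = Workload(destinations=10, rate_per_destination=50.0)  # illustrative values
wider = base.scale_destinations(4)
denser = base.scale_traffic(4)
print(base.total_rate(), wider.total_rate(), denser.total_rate())
```

Although `wider` and `denser` carry the same aggregate load, they stress a MOM server differently: the first exercises its ability to manage many destinations, the second its per-destination throughput.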
Producing and publishing standard results for marketing purposes will be just one usage scenario for SPECjms2007. Many users will be interested in using the benchmark to tune and optimize their platforms or to analyze the performance of certain specific MOM features. Others could use the benchmark for research purposes in academic environments where, for example, one might be interested in evaluating the performance and scalability of novel methods and techniques for building high-performance MOM servers. All these usage scenarios require that the benchmark framework allows the user to precisely configure the workload and transaction mix to be generated. Providing this configurability is a great challenge because it requires that interactions are designed and implemented in such a way that one could run them in different combinations depending on the desired transaction mix.
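One way to picture a configurable transaction mix is as a set of relative weights, one per interaction, that divide a target message rate. The function below is a minimal sketch under that assumption; the function name, the interaction labels and the weights are hypothetical and do not reflect how the benchmark is actually configured.

```python
def interaction_rates(weights, total_rate):
    """Split total_rate among interactions in proportion to their weights.

    Interactions with weight 0 are switched off, which is what allows
    interactions to be run in arbitrary combinations.
    """
    active = {name: w for name, w in weights.items() if w > 0}
    if not active:
        raise ValueError("at least one interaction must have a positive weight")
    weight_sum = sum(active.values())
    return {name: total_rate * w / weight_sum for name, w in active.items()}


# Illustrative mix: hot lists disabled, order handling weighted twice as heavily.
mix = {"order_handling": 2, "price_updates": 1, "sales_statistics": 1, "hot_lists": 0}
rates = interaction_rates(mix, total_rate=400.0)
print(rates)
```

A mix of this kind only works if each interaction can run independently of the others, which is the design constraint the paragraph above describes.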
The workload scenario chosen for SPECjms2007 models the supply chain of a supermarket company. The participants involved are the supermarket company, its stores, its distribution centers and its suppliers. The scenario, depicted in Figure 1, offers an excellent basis for defining interactions that stress different subsets of the functionality offered by MOM servers, e.g. different message types as well as both P2P and pub/sub communication. Moreover, it offers a natural way to scale the workload, e.g. by scaling the number of supermarkets or by scaling the number of products sold per supermarket. We now take a closer look at the participants involved in the scenario.
Company Headquarters (HQ)
The company's corporate headquarters are responsible for managing the accounting of the company, managing information about the goods and products offered in the supermarket stores, managing selling prices and monitoring the flow of goods and money in the supply chain.
Distribution Centers (DCs)
The distribution centers supply the supermarket stores. Every distribution center is responsible for a set of stores in a given area. The distribution centers in turn are supplied by external suppliers. The distribution centers are involved in the following activities: taking orders from supermarkets, ordering goods from suppliers, delivering goods to supermarkets and providing sales statistics to the HQ (e.g., for data mining).
Supermarkets (SMs)
The supermarkets sell goods to end customers. The scenario focuses on the management of the inventory of supermarkets, including their warehouses. Some supermarkets are smaller than others and do not have enough room for all products; others may specialize in certain product groups, such as particular types of food. We assume that every supermarket is supplied by exactly one of the distribution centers.
Suppliers (SPs)
The suppliers deliver goods to the distribution centers of the supermarket company. Different suppliers specialize in different sets of products and they deliver goods on demand, i.e. they must receive an order from the supermarket company before sending a shipment.
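The relationship between supermarkets and distribution centers described above can be captured in a very small data structure. The sketch below is illustrative only: the function names and the naming scheme for stores are hypothetical. Using a dictionary keyed by supermarket encodes the assumption that every supermarket is supplied by exactly one distribution center, and the two parameters show how the scenario scales with the number of stores.

```python
def build_topology(num_dcs, stores_per_dc):
    """Return a mapping {supermarket name: distribution center name}.

    A dict key can hold only one value, so each supermarket has exactly
    one supplying distribution center by construction.
    """
    supplied_by = {}
    for d in range(1, num_dcs + 1):
        for s in range(1, stores_per_dc + 1):
            supplied_by[f"SM{d}.{s}"] = f"DC{d}"
    return supplied_by


def stores_of(supplied_by, dc):
    """List the supermarkets a given distribution center is responsible for."""
    return sorted(sm for sm, owner in supplied_by.items() if owner == dc)


topology = build_topology(num_dcs=2, stores_per_dc=3)  # illustrative sizes
print(stores_of(topology, "DC2"))
```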
The following seven interactions between the participants in the supermarket supply chain are modeled in SPECjms2007:
Let's look at these interactions in more detail.
Interaction 1: Order/Shipment Handling between SM and DC
This interaction exercises persistent P2P messaging between the SMs and DCs. The interaction is triggered when goods in the warehouse of an SM are depleted and the SM has to order from its DC to refill stock. The following steps are followed as illustrated in Figure 2:
Interaction 2: Order/Shipment Handling between DC and SP
This interaction exercises persistent P2P and pub/sub (durable) messaging between the DCs and SPs. The interaction is triggered when goods in a DC are depleted and the DC has to order from an SP to refill stock. The following steps are followed as illustrated in Figure 3:
Interaction 3: Price Updates
This interaction exercises persistent, durable pub/sub messaging between the HQ and the SMs. The interaction is triggered when selling prices are changed by the company administration. To communicate this, the company HQ sends messages with pricing information to the SMs.
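The practical meaning of a durable subscription is that a subscriber does not lose messages published while it is disconnected. The in-memory model below is a sketch of that behavior only; the `Topic` class is hypothetical, is not a JMS API, and does not model persistent delivery (survival of messages across a server restart). The product and price are made-up example data.

```python
class Topic:
    """In-memory stand-in for a pub/sub topic (not a JMS API)."""

    def __init__(self):
        self._inboxes = {}     # subscription name -> undelivered messages
        self._durable = set()  # names of durable subscriptions

    def subscribe(self, name, durable=False):
        # Reconnecting to an existing durable subscription keeps its backlog.
        self._inboxes.setdefault(name, [])
        if durable:
            self._durable.add(name)

    def disconnect(self, name):
        # A non-durable subscription ends when its subscriber goes away.
        if name not in self._durable:
            self._inboxes.pop(name, None)

    def publish(self, message):
        for inbox in self._inboxes.values():
            inbox.append(message)

    def receive_all(self, name):
        messages = self._inboxes[name]
        self._inboxes[name] = []
        return messages


prices = Topic()
prices.subscribe("SM1", durable=True)
prices.subscribe("SM2", durable=False)

prices.disconnect("SM1")
prices.disconnect("SM2")
prices.publish({"product": "milk", "price": 0.99})  # sent while both are offline

prices.subscribe("SM1", durable=True)
prices.subscribe("SM2", durable=False)
sm1_backlog = prices.receive_all("SM1")  # durable: the update was retained
sm2_backlog = prices.receive_all("SM2")  # non-durable: the update was missed
```

For price updates this distinction matters: a supermarket that misses an update would keep selling at an outdated price, which is presumably why this interaction uses durable subscriptions.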
Interaction 4: SM Inventory Management
This interaction exercises persistent P2P messaging inside the SMs. The interaction is triggered when goods leave the warehouse of an SM (to refill a shelf). Goods are registered by RFID readers and the local warehouse application is notified so that inventory can be updated. Note that since incoming goods are part of another interaction (Interaction 1), they are not considered here.
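The flow described above, where RFID readers produce messages and the warehouse application consumes them, can be sketched with an in-memory queue. The `Warehouse` class and the product names are hypothetical, and a `deque` merely stands in for a P2P queue; persistence is not modeled. The point of the sketch is the P2P property that each message is consumed exactly once.

```python
from collections import deque


class Warehouse:
    """Hypothetical SM warehouse application fed by RFID readers."""

    def __init__(self, stock):
        self.stock = dict(stock)
        self.queue = deque()  # stands in for a P2P queue inside the SM

    def rfid_read(self, product, quantity):
        # Producer side: a reader registers goods leaving the warehouse.
        self.queue.append((product, quantity))

    def process_pending(self):
        # Consumer side: each message is removed and applied exactly once.
        processed = 0
        while self.queue:
            product, quantity = self.queue.popleft()
            self.stock[product] = self.stock.get(product, 0) - quantity
            processed += 1
        return processed


warehouse = Warehouse({"milk": 100, "bread": 40})
warehouse.rfid_read("milk", 12)
warehouse.rfid_read("bread", 5)
warehouse.rfid_read("milk", 8)
handled = warehouse.process_pending()
print(handled, warehouse.stock)
```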
Interaction 5: Sales Statistics Collection
This interaction exercises non-persistent P2P messaging between the SMs and the HQ. The interaction is triggered when an SM sends sales statistics to the HQ. The HQ can use this data as a basis for data mining in order to study customer behavior and provide useful information to marketing. For example, based on such information, special offers or product discounts can be made.
Interaction 6: New Product Announcements
This interaction exercises non-persistent, non-durable pub/sub messaging between the HQ and the SMs. The interaction is triggered when new products are announced by the company administration. To communicate this, the HQ sends messages with product information to the SMs selling the respective product types (e.g., food, computers, mp3-players).
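Delivering announcements only to the SMs that sell the respective product type amounts to content-based routing. The sketch below models this with one subscriber set per product type; this is an assumption made for illustration, since the text does not say whether the benchmark uses separate topics, message selectors or some other mechanism. The class name and product names are hypothetical.

```python
from collections import defaultdict


class ProductAnnouncements:
    """Hypothetical HQ-side router keyed by product type."""

    def __init__(self):
        self._interested = defaultdict(set)  # product type -> SM names

    def subscribe(self, sm, product_types):
        for product_type in product_types:
            self._interested[product_type].add(sm)

    def announce(self, product_type, product):
        """Return {SM name: product} for every SM selling this product type."""
        recipients = sorted(self._interested.get(product_type, ()))
        return {sm: product for sm in recipients}


hq = ProductAnnouncements()
hq.subscribe("SM1", ["food"])
hq.subscribe("SM2", ["food", "computers"])

food_delivery = hq.announce("food", "organic muesli")
computer_delivery = hq.announce("computers", "laptop X")
mp3_delivery = hq.announce("mp3-players", "player Y")  # nobody sells these
```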
Interaction 7: Credit Card Hot Lists
This interaction exercises non-persistent, non-durable pub/sub messaging between the HQ and the SMs. The interaction is triggered when the HQ sends credit card hot lists to the SMs (a complete list once every hour and incremental updates as required).
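The messaging properties of the seven interactions, as stated in their descriptions, can be collected in a small lookup table to check which combinations of messaging style and delivery mode the workload covers. The table below only restates the text; the `Interaction` type and the `exercising` helper are hypothetical names. For Interaction 2, the sketch assumes that "persistent" applies to both its P2P and its pub/sub traffic, which the wording suggests but does not state outright.

```python
from typing import NamedTuple, Optional

P2P, PUBSUB = "p2p", "pubsub"


class Interaction(NamedTuple):
    number: int
    name: str
    styles: frozenset         # subset of {P2P, PUBSUB}
    persistent: bool
    durable: Optional[bool]   # None when no pub/sub messaging is involved


INTERACTIONS = [
    Interaction(1, "Order/Shipment Handling between SM and DC", frozenset({P2P}), True, None),
    # Assumption: persistence applies to both styles in Interaction 2.
    Interaction(2, "Order/Shipment Handling between DC and SP", frozenset({P2P, PUBSUB}), True, True),
    Interaction(3, "Price Updates", frozenset({PUBSUB}), True, True),
    Interaction(4, "SM Inventory Management", frozenset({P2P}), True, None),
    Interaction(5, "Sales Statistics Collection", frozenset({P2P}), False, None),
    Interaction(6, "New Product Announcements", frozenset({PUBSUB}), False, False),
    Interaction(7, "Credit Card Hot Lists", frozenset({PUBSUB}), False, False),
]


def exercising(style, persistent):
    """Numbers of the interactions that use the given style and delivery mode."""
    return [i.number for i in INTERACTIONS
            if style in i.styles and i.persistent == persistent]


for style in (P2P, PUBSUB):
    for persistent in (True, False):
        print(style, "persistent" if persistent else "non-persistent",
              exercising(style, persistent))
```

Every combination of style and delivery mode is exercised by at least one interaction, which is consistent with the requirement that the workload be comprehensive.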