Writing Performant EJB Beans in the Java EE 5 Platform (EJB 3.0) Using Annotations

   
By Scott Oaks, Eileen Loh, and Rahul Biswas of the Java Enterprise Performance Team, September 2006  


The Enterprise JavaBeans (EJB) 3.0 specification vastly improves the simplicity of programming enterprise beans. This promises to increase your productivity as a developer. But what about the productivity of your production system? Will it be fast enough to meet the demands of your organization, or will you spend all your newly found free time refactoring code for performance?

This article shows you how to get the best performance out of the new EJB 3.0 programming model. The Java Enterprise Performance team looked at performance in three broad areas: developer performance, performance of 3.0 session and message-driven beans, and performance of Java technology persistent entities.

Developer Performance

First, this article will look at how the new programming model affects the performance of development activities. The key area of concern is deployment performance: How do the new annotations affect an application's deployment time? And what artifacts are generated that will affect the application's general runtime performance?

 

Effects on Deployment Time


The first issue is how annotations affect deployment time. As you know, annotations are used to specify information about an enterprise bean or persistent object, for example:

  • Its type, such as @Stateless or @Entity
  • Its features, such as @TransactionAttribute
  • Its entity mappings, such as @Column

In previous versions of the EJB specification, XML deployment descriptors defined information about an EJB bean and its features in this way:

<session>
  <ejb-name>MySession</ejb-name>
  <home>com.sun.perf.MySessionHome</home>
  <ejb-class>com.sun.perf.MySessionImpl</ejb-class>
  <session-type>Stateless</session-type>
</session>



In the EJB 3.0 architecture and the Java Persistence API (JPA), this information now comes from annotations:

package com.sun.perf;
@Stateless
public class MySessionImpl implements MySession {...}



Although the new code is simpler, it contains a great deal of information that must be inferred. In fact, the deployer must look through every class in the application, aside from those packaged in libraries, and inspect it for annotations. Any arbitrary class may carry a type annotation: @Stateless, @Stateful, @MessageDriven, or @Entity. And most classes may define a variable that is injected through annotation: @EJB, @PersistenceContext, and so on. That may sound like a lot of work, but parsing XML is also a lot of work.

So which is faster: parsing the XML descriptors or inspecting the code for annotations? In tests, Sun's Java Enterprise Performance team has found that the two operations take about the same amount of time. This is clearly a case in which your mileage may vary: If you tend to have more classes and less XML, you may find that deployment time is slightly slower. If you have fewer classes and more XML, you may find that deployment time is slightly faster. And improvements in maturing technology will have a big impact. Annotations are relatively new, and application server vendors -- and other developers -- are only beginning to discover the optimal way to process them. XML parsing is, by this point, fairly well studied. In the future, you might expect deployment through annotation processing to improve more than deployment through XML parsing.
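The class-scanning side of this comparison can be sketched in plain Java. This is a hypothetical, container-free illustration: the @Stateless annotation below is a local stand-in for javax.ejb.Stateless, and the class names are invented. The point is only that the deployer conceptually loads each class and asks it, through reflection, whether it carries a component-defining annotation.

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

// Local stand-in for javax.ejb.Stateless so the sketch runs without
// an application server on the classpath.
@Retention(RetentionPolicy.RUNTIME)
@interface Stateless {}

@Stateless
class MySessionImpl {}

class PlainHelper {}

public class AnnotationScanDemo {
    // What a deployer conceptually does for every class in the archive:
    // load it and check, via reflection, whether it carries a
    // component-defining annotation.
    static boolean isSessionBean(Class<?> c) {
        return c.isAnnotationPresent(Stateless.class);
    }

    public static void main(String[] args) {
        System.out.println(isSessionBean(MySessionImpl.class)); // true
        System.out.println(isSessionBean(PlainHelper.class));   // false
    }
}
```

A real deployer repeats this check for every class in the archive, which is why the cost scales with the number of classes rather than with the size of the XML descriptors.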

 

Effects on Runtime


Some annotations are the equivalent of syntactic sugar: They cause additional code to be executed at the application's runtime. Consider a servlet that defines this instance variable:

@EJB MySession session;



In the EJB 2.1 architecture, the servlet would not have used the annotation, of course. Instead, the developer would have included code similar to this:

InitialContext ic = new InitialContext();
MySessionHome home =
    (MySessionHome) ic.lookup("java:comp/env/SomeHomeInterface");
session = home.create();



In typical usage, the session home was obtained in the servlet's init() method. The developer might have made the call to create() at the same time or deferred it to the servlet's doGet() or doPost() method, often unnecessarily creating session objects.

Because nothing in life is free, when you use the @EJB annotation, the application server must execute this code when an object of the servlet class is created:

MySession x = (MySession)
      new InitialContext().lookup("java:comp/env/Class_ID");
Field f = MyServlet.class.getDeclaredField("session");
f.set(servletInstance, x);



In either case, the Enterprise Java Bean's Java Naming and Directory Interface (JNDI) lookup -- either the home in EJB 2.1 or the object itself in EJB 3.0 -- will dominate the time to perform this operation. In the EJB 3.0 architecture, there is a very slight overhead from using reflection to set the variable inside the servlet class. However, because this happens only once, immediately after the servlet instance is created, its effect on performance is minimal.
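The container-side injection step can be sketched as a small, self-contained program. This is a hypothetical illustration: the JNDI lookup is replaced by direct construction, and the class names (InjectionDemo, MySession) are invented for the example. What it shows is the reflective field-set that happens once, right after the instance is created.

```java
import java.lang.reflect.Field;

class MySession {
    String ping() { return "pong"; }
}

public class InjectionDemo {
    // The field a container would populate for "@EJB MySession session;".
    MySession session;

    // Minimal sketch of the container's injection step: resolve the
    // dependency once, then set the field reflectively on the new instance.
    static InjectionDemo createAndInject() throws Exception {
        InjectionDemo servlet = new InjectionDemo();
        MySession resolved = new MySession();       // stands in for the JNDI lookup
        Field f = InjectionDemo.class.getDeclaredField("session");
        f.setAccessible(true);                      // containers bypass access checks
        f.set(servlet, resolved);
        return servlet;
    }

    public static void main(String[] args) throws Exception {
        InjectionDemo servlet = createAndInject();
        System.out.println(servlet.session.ping()); // prints "pong"
    }
}
```

Because the reflective set runs once per instance rather than once per request, its cost is dwarfed by the lookup itself, which matches the measurement described above.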

In fact, the EJB 3.0 architecture has a positive impact on performance. That effect occurs because of the simplified EJB 3.0 programming model, which no longer requires you to have separate home and object interfaces. Hence, to locate a session's instance, you no longer need to go through a home object. This affects performance in two ways:

  • The EJB 3.0 architecture is intrinsically faster because fewer objects and classes are involved.
  • It prevents a misuse of stateless session objects common in EJB 2.1 architecture.

The following design pattern is common in the EJB 2.1 architecture:

public void doGet(HttpServletRequest req, HttpServletResponse resp) {
    Session mySession = sessionHome.create();
    // sessionHome was initialized in the servlet's init() method.
    ... execute some operations on mySession
    mySession.remove();
}



There is no reason to continually create and remove sessions in this way. The alternative is simply to reuse a global session object. Reusing the global session object occurs automatically when you use injection in the EJB 3.0 architecture, of course. But still, a lot of EJB 2.1 code follows this pattern. EJB 3.0 code will not make that mistake.

How much does this actually impact performance? It will have the greatest impact when the servlet code -- or whatever code uses the injected variable -- contains as little business logic as possible. Otherwise, the amount of time spent in the business logic will dominate any performance measurement of these few statements. In the best case, Sun's Java Enterprise Performance team has found an improvement of almost 3 percent using injected variables in EJB 3.0 as opposed to continually creating and destroying sessions in EJB 2.1, and an improvement of 1 percent using injected variables in EJB 3.0 rather than reusing session objects in EJB 2.1.

It's a small but measurable difference in performance. But what will make a bigger difference in your applications' performance is using annotations appropriately for your application. The next few sections will delve into that topic in more detail.

Session Bean Performance

Despite the face-lift that session beans got with EJB 3.0 architecture, session beans essentially behave the same way they did in EJB 2.1 architecture. What was good for performance for 2.1 beans will most likely benefit 3.0 beans as well. New to session and message-driven beans are features such as business method interceptors and transaction attributes, which also have implications for performance. Because message-driven beans are very similar to stateless session beans, most but not all of these topics will also apply to message-driven beans.

 

Remote Compared With Local Interfaces


Most of you probably already know that there's a significant performance difference between remote and local interfaces. Remote interfaces allow clients that are outside the application server to access session beans. That functionality, however, comes with a steep performance penalty. Parameters to remote interfaces are pass-by-value, which involves argument copying and serialization. Because of the additional costs of network latency, client and server stack overhead, and so on, remote calls can be very expensive.

In contrast, local interfaces can be accessed only from within the same JVM, and their parameters are passed by reference. Consequently, method invocation through a local interface is much faster. The Java Enterprise Performance team measured the performance of local and remote interfaces using a simple test case and with both the client and application server on the same machine. Even without the added overhead of network latency, this test showed that use of remote interfaces results in a performance penalty of approximately 30 percent relative to local interfaces, as seen in Figure 1.

Note: Several of the figures abbreviate the word transactions as tx or txn. In these figures, a transaction is defined from the client's perspective, as a completed client request. Therefore, transactions per second as indicated in these figures is a client-side throughput measurement.

Figure 1: Throughput Comparison of Local as Opposed to Remote Interfaces
 

For better performance, applications should use local interfaces whenever possible. In situations in which remote access to session beans is required, remote invocation of session beans should be as coarse grained as possible.

No discussion about local and remote interfaces would be complete without a mention of the session facade design pattern. In EJB 2.1 architecture, a common recommendation would have been to use the session facade design pattern as a way of reducing remote invocations to session or entity beans. Rather than invoking multiple remote EJB beans, the remote client would use a session bean to avoid fine-grained remote access of enterprise beans. The session bean would retrieve data locally from other beans and then return the data to the remote client by way of a data access object (DAO) or some other vehicle.

Because EJB 3.0 architecture no longer uses Container Managed Persistence (CMP) entity beans, fine-grained remote access of entity beans is no longer an issue. Entities are now Plain Old Java Objects (POJOs), so a remote client can retrieve the entity object from the application server either through direct lookup or through a session bean, make changes to the entity object, and then return the entity to the application server to be persisted. In this scenario, there is no need to allocate and copy data to a separate DAO because the entity object itself acts as the DAO.

This does not mean that the session facade design pattern is now obsolete. For design and performance, there are still several good reasons to use the session facade design pattern. In particular, if the client needs to access multiple entity objects or other session beans, session facade is still a good way to consolidate data from several resources and reduce the amount of traffic to and from a remote client.

 

Resource Lookups


Another common operation that involves high overhead is looking up resources. Whether you are accessing another session bean, an entity object, or a connection factory, any time you have an external dependency, the application must locate that resource using JNDI. The EJB 3.0 specification has simplified obtaining resources, but looking up resources can still be a big performance hit, especially if you do it over and over again.

An easy way to mitigate this overhead is to perform lookups once and to cache static resources to get rid of redundant and unnecessary lookups. With EJB 3.0 architecture, resource injection makes this very easy. When you use resource injection, the 3.0 specification mandates that resources are injected after EJB beans are initialized but before any business methods are called, so this is a simple way of initializing and holding on to your dependencies. External dependencies that cannot be injected through resource injection, such as JDBC or message queue connections, can still be cached while your beans are active. However, once the beans are passivated, these resources must be closed and then reinitialized upon activation.
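The caching pattern can be sketched without a container. In this hypothetical illustration, slowJndiLookup stands in for InitialContext.lookup (and simply counts how often the "expensive" lookup actually runs), while a ConcurrentHashMap plays the role that resource injection plays for you automatically.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class LookupCacheDemo {
    // Counts how many "expensive" lookups reach the simulated JNDI provider.
    static final AtomicInteger lookups = new AtomicInteger();
    static final Map<String, Object> cache = new ConcurrentHashMap<>();

    // Stand-in for InitialContext.lookup(name); assume each real call is slow.
    static Object slowJndiLookup(String name) {
        lookups.incrementAndGet();
        return new Object();
    }

    // Look a resource up once and reuse it on every subsequent request.
    static Object cachedLookup(String name) {
        return cache.computeIfAbsent(name, LookupCacheDemo::slowJndiLookup);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1000; i++) {
            cachedLookup("java:comp/env/jdbc/MyDataSource");
        }
        System.out.println(lookups.get()); // one real lookup for 1000 requests
    }
}
```

With injection, the container performs the equivalent of the single `slowJndiLookup` call before any business method runs, so your code never pays the per-request cost.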

The Java Enterprise Performance team measured the performance difference between cached and multiple resource lookups using a simple microbenchmark with 200 clients and no pause time between requests. In the first case, dependency injection was used to look up external resources once, and the resources were then cached. In the second case, dynamic lookups were used to look up resources on each invocation. By removing the overhead of multiple lookups, throughput improved by 5 percent, as Figure 2 shows.

Figure 2: Throughput Comparison of Cached as Opposed to Multiple Resource Lookups

 

Business Method Interceptors


Business method interceptors are a new feature in the EJB 3.0 architecture that allows developers to inject functionality before and after business methods. Although they are flexible and convenient, they may also incur unnecessary performance penalties if used carelessly. As this article pointed out earlier, interceptors incur a cost at both deployment and runtime. The runtime cost comes primarily from the reflection used to invoke interceptors. Although a single interceptor will not generate much overhead, interceptors may be called very often, so you should be especially careful that your interceptor method itself does not slow performance.

Interceptor methods may be defined in interceptor classes. A session or message-driven bean can be associated with an unlimited number of interceptor classes, but keep in mind that for every interceptor class defined on a bean, an interceptor instance is created when the bean is created. This results in additional object management overhead and use of your system resources. If the bean is a stateful session bean, the interceptor class instances are passivated when the bean is passivated. Because passivation tends to be an expensive operation, you should minimize the performance penalty of passivation that interceptor classes introduce, and you should make sure that your interceptor classes do not have large objects that need to be passivated and serialized to disk.

Finally, because you can have an unlimited number of interceptors on any given business method, you must remember that overhead quickly starts to add up. By using method-level as opposed to bean-level granularity, you can limit the runtime overhead of @AroundInvoke interceptors to the business methods that need them.
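The per-call cost described above comes from reflective dispatch, which can be simulated without a container. In this hypothetical sketch, each name passed to invokeWithInterceptors stands in for one @AroundInvoke interceptor; every extra interceptor adds one more reflective call to every business-method invocation.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class InterceptorChainDemo {
    static final List<String> trace = new ArrayList<>();

    // Stand-ins for @AroundInvoke methods in interceptor classes.
    public static void audit() { trace.add("audit"); }
    public static void log()   { trace.add("log"); }

    public static String businessMethod() {
        trace.add("business");
        return "done";
    }

    // Minimal sketch of what the container does per call: invoke each
    // interceptor reflectively before the business method runs.
    static String invokeWithInterceptors(String... interceptorNames) throws Exception {
        for (String name : interceptorNames) {
            Method m = InterceptorChainDemo.class.getMethod(name);
            m.invoke(null);                 // one reflective dispatch per interceptor
        }
        return businessMethod();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(invokeWithInterceptors("audit", "log")); // prints "done"
        System.out.println(trace); // [audit, log, business]
    }
}
```

Because the chain runs on every invocation, five interceptors mean five extra reflective dispatches per call, which is consistent with the throughput drop measured below.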

The Java Enterprise Performance team tested three different scenarios to produce the results shown in Figure 3. The first was a session bean with no interceptors, the second was a session bean with a single interceptor, and the last was a session bean with five interceptors. In each case, the amount of work done by the bean and interceptors was the same. In the case with a single interceptor, there was no significant difference in throughput. But in the case with five interceptors, the interceptors' runtime costs became a serious factor, reducing the throughput by 20 percent. The lesson here is to keep careful accounting of your interceptors. Beans that have interceptors can inherit from superclasses that have interceptors. Because interceptor classes can also have superclasses, the number of interceptors on any given method can quickly and invisibly pile up.

Figure 3: Throughput Comparison of Business Method Interceptor Overhead

 

Transaction Management


Session and message-driven beans can be designed with bean-managed or container-managed transactions. If no transaction management type is specified, the default is container managed. With container-managed transactions, the container uses a transaction attribute that the developer specifies to set the transaction context of business methods. This article will discuss transaction attributes in more detail in the next section, but in general, with container-managed transactions, if a new transaction is required, it begins when a business method is invoked and ends when the business method completes. The implication is that if you have an expensive operation that you do not need to include in a transaction, you should separate your transactional and nontransactional logic into different business methods.

Bean transaction management allows developers much more flexibility in determining when a transaction begins or ends. You can start a transaction in the middle of a method. You can start and end more than one transaction in a method. In stateful session beans, you can start a transaction and end it several method calls later. The downside is that in bean-managed transactions, the developer is solely responsible for starting and ending transactions and for ensuring transactional correctness. Particularly in cases in which the transaction context spans multiple business calls, the developer must somehow guarantee that anytime a transaction is started, the transaction will eventually be committed.

Using bean-managed transactions can really help your performance if your methods include expensive operations that do not necessarily need to be part of a transaction. An example might be any sort of I/O that does not need to be part of a transaction, such as writing to an access log. Using bean-managed transactions can also help your performance if you have low contention resources that allow you to extend your transaction context and minimize the number of transactions, as shown in Code Example 1. However, use of bean-managed transactions can hurt performance if you are holding on to a highly contended resource in an extended transaction context because this reduces the amount of concurrency that can occur.

In Code Example 1, a bean-managed stateful session bean begins a transaction during the startCart() method but does not commit it until checkOut() is called. All addItem() method calls in between will be in the transaction context initiated in the startCart method.

Code Example 1

@Stateful
@TransactionManagement(BEAN)
public class CartSession {
    CartEnt cart;
    @PersistenceContext EntityManager em;
    @Resource UserTransaction ut;

    @PostConstruct
    public void startCart() {
        ut.begin();
        cart = new CartEnt();
    }

    public void addItem(String itemid, int qty) {
        em.persist(new CartItem(itemid, qty, cart.getId()));
        cart.setItemQuantity(cart.getItemQuantity() + 1);
    }

    public void checkOut() {
        em.merge(cart);
        ut.commit();
    }
}



The Java Enterprise Performance Team ran this scenario with 200 clients and an average of 10 items per cart and compared it to the same case using container-managed transactions. By using bean-managed transactions and extending the transaction context, the number of transactions was reduced from 11 transactions to 1, which substantially improved performance.

Figure 4: Throughput Comparison of Bean-Managed as Opposed to Container-Managed Transactions

 

Transaction Attributes


In most cases, you will not need the fine-grained control of bean-managed transactions. By using container-managed transactions, you can focus on business logic instead of transactional correctness. If you set the correct transaction attribute, container-managed transactions can be as performant as bean-managed transactions. There are also situations in which a bean must use container-managed transactions: for example, when it needs to inherit a transaction context from its client.

If no transaction attribute is set, the default is REQUIRED. With the REQUIRED transaction attribute, each business method is executed in a transaction: if no transaction context is provided, the container begins a transaction when the method is invoked. This can be unneeded overhead in cases in which your business method does not need a transaction. In general, you should set the least restrictive transaction attribute that still achieves data correctness. A good example is a "browse" scenario, in which a session bean retrieves but does not alter the data from multiple entity objects. Because transaction attributes can be set per method, these "browse" methods benefit from executing without a transaction.

A few transaction attributes allow the code to run without a transaction context, and you should understand how they differ. The SUPPORTS attribute means that if the caller has initiated a transaction, the business method will continue in the same transaction. The NOT_SUPPORTED attribute means that even if the caller has initiated a transaction, the business method will run outside of the transaction. And the NEVER attribute indicates that if the caller has initiated a transaction, the business method will throw an exception rather than execute. Of these, message-driven beans support only the NOT_SUPPORTED attribute, because a message-driven bean is never invoked directly by a caller.
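The semantics just described can be summarized as a small decision table. The following sketch is an illustration, not container code: it reduces the container's choice to whether the business method runs in a transaction at all, given the attribute and whether the caller already started one.

```java
public class TxAttributeDemo {
    public enum Attribute { REQUIRED, SUPPORTS, NOT_SUPPORTED, NEVER }

    // Does the business method execute inside a transaction?
    // NEVER rejects transactional callers outright.
    static boolean runsInTransaction(Attribute attr, boolean callerHasTx) {
        switch (attr) {
            case REQUIRED:      return true;        // join the caller's tx or start a new one
            case SUPPORTS:      return callerHasTx; // join only if one already exists
            case NOT_SUPPORTED: return false;       // the caller's tx is suspended
            case NEVER:
                if (callerHasTx) throw new IllegalStateException("caller has a transaction");
                return false;
            default: throw new AssertionError();
        }
    }

    public static void main(String[] args) {
        System.out.println(runsInTransaction(Attribute.REQUIRED, false));      // true
        System.out.println(runsInTransaction(Attribute.SUPPORTS, false));      // false
        System.out.println(runsInTransaction(Attribute.NOT_SUPPORTED, true));  // false
    }
}
```

The REQUIRED row is the source of the avoidable overhead discussed above: it always produces a transaction, even for read-only "browse" methods.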

The Java Enterprise Performance team used a microbenchmark to compare the performance of REQUIRED and NEVER transaction attributes when retrieving entity object fields through a session bean. Using the NEVER transaction attribute results in 30 percent greater throughput, as Figure 5 shows.

Figure 5: Throughput Comparison of Using NEVER as Opposed to REQUIRED Transaction Attributes
 

The writing of performant session and message beans has not changed very much from EJB 2.1 to EJB 3.0 architecture. The key principles remain the same:

 

  • Minimize expensive operations.

    • Use remote invocation of session beans only for coarse-grained calls.
    • Cache resources to avoid unnecessary JNDI lookups.
    • Use bean-managed transactions in special cases to minimize the number of transactions.
  • Reduce overhead.

    • Use @AroundInvoke interceptors and interceptor classes with caution and use method-level rather than class-level granularity.
    • Use a transaction only if you need one. Set your transaction attribute appropriately.


Entity Performance

The Java Persistence API (JPA) specification has many features that affect the performance of your Java persistence entities. Some of these features are new in EJB 3.0 architecture, but some existed even in EJB 2.1 architecture. This article will now discuss the following performance features of the JPA specification in more detail. All the following features can be used as annotations:

 

  • Fetch type
  • Cascade
  • Inheritance, inheritance strategy
  • Flush mode
  • Optimistic locking, isolation levels
  • Persistence context: transaction compared with extended


The following subsections discuss these features and compare use cases based on various possible values of the annotations. The end of most subsections presents performance data to compare the use cases. The benchmark used for gathering performance data has an Order entity with a one-to-many (1:M) relationship with an OrderLineItem entity. The OrderLineItem entity has a recursive 1:M relationship with itself, which means that there is a hierarchy of OrderLineItem entities under the Order entity. In this particular benchmark, each Order and OrderLineItem entity had three OrderLineItem entities under it, and the OrderLineItem hierarchy was populated to a depth of three levels, for a total of 39 OrderLineItems under an Order entity. This makes for a medium-sized persistent Order object.

 

Fetch Type


The fetch type specifies the data-fetching strategy that a persistence provider uses to fetch data from the database. A FetchType is used on the @Basic annotation, the @Lob annotation, and relationship annotations such as @OneToMany, @ManyToMany, @ManyToOne, and @OneToOne. The default fetch type is EAGER, except for @ManyToMany and @OneToMany relationships, for which the default is LAZY. A fetch type of EAGER means that a persistence provider will load the attribute of an entity along with the entity, whereas a fetch type of LAZY is a hint to the provider that the attribute need not be fetched along with the entity.

Keep in mind that a fetch type of EAGER is a requirement on the persistence provider, whereas a fetch type of LAZY is only a hint. So even though you may specify the fetch type to be LAZY, the persistence provider may choose to load the attribute eagerly.

A fetch type of LAZY benefits large objects and relationships whose attributes are not accessed immediately when the entity is loaded.
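Lazy fetching can be illustrated without a persistence provider. In this hypothetical sketch, fetchLineItemsFromDb stands in for the provider's database round trip, and the getter defers that round trip until the relationship is actually touched; nothing is loaded if it never is.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LazyFetchDemo {
    static final AtomicInteger dbRoundTrips = new AtomicInteger();

    // Stand-in for the provider fetching a relationship from the database.
    static String fetchLineItemsFromDb() {
        dbRoundTrips.incrementAndGet();
        return "39 line items";
    }

    // Sketch of a lazily fetched attribute: no load until first access,
    // and the result is cached afterward.
    static class Order {
        private String lineItems;                   // null until first access
        String getLineItems() {
            if (lineItems == null) {
                lineItems = fetchLineItemsFromDb(); // deferred round trip
            }
            return lineItems;
        }
    }

    public static void main(String[] args) {
        Order order = new Order();               // "em.find": relationship not loaded
        System.out.println(dbRoundTrips.get());  // 0
        order.getLineItems();                    // first access triggers the fetch
        order.getLineItems();                    // cached; no second round trip
        System.out.println(dbRoundTrips.get());  // 1
    }
}
```

A real provider typically does this with generated proxies or bytecode enhancement rather than a null check, but the performance effect is the same: the cost moves from load time to first access, and disappears entirely for attributes you never read.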

The Java Enterprise Performance team tested two scenarios for the fetch type. In one scenario, the relationship between the Order and OrderLineItem entities, as well as the OrderLineItem's recursive relationship with itself, was marked with a fetch type of EAGER. In the other scenario, the relationships were marked as LAZY. As you can see in Figure 6, with a fetch type of EAGER, the throughput was 89 percent of the throughput achieved with a fetch type of LAZY. Remember that when you have large object (LOB) attributes or deep relationship hierarchies that are not accessed immediately when an entity is loaded, you should use a fetch type of LAZY. Code Example 2 shows a bad use case, and Code Example 3 shows a good use case.

Code Example 2: Bad Use Case

/** Entity class.
 *  Order has a 1:M relationship with OrderLineItem.
 */
@Entity
public class Order {
    @OneToMany(fetch=FetchType.EAGER, ...)
    public Collection<OrderLineItem> getLineItems() {
        return lineItems;
    }
}

/** Stateless session bean facade. */
@Stateless
public class OrderSessionStateless {
    @PersistenceContext private EntityManager em;

    /**
     * Load an order object given an order ID. This case will result
     * in all line items being loaded along with the Order object.
     */
    public Order loadOrder(String orderID) {
        return em.find(Order.class, orderID);
    }
}



Code Example 3: Good Use Case

/** Entity class.
 *  Order has a 1:M relationship with OrderLineItem.
 */
@Entity
public class Order {
    @OneToMany(fetch=FetchType.LAZY, ...)
    public Collection<OrderLineItem> getLineItems() {
        return lineItems;
    }
}

/** Stateless session bean facade. */
@Stateless
public class OrderSessionStateless {
    @PersistenceContext private EntityManager em;

    /**
     * Load an order object given an order ID. This case will result
     * in the Order object being loaded. LineItems may be loaded lazily
     * by the persistence provider.
     */
    public Order loadOrder(String orderID) {
        return em.find(Order.class, orderID);
    }
}



Figure 6: Fetch Type-Relationship

 

Cascade Type


The cascade type specifies the set of operations that can be cascaded to a related entity. CascadeType can have different values: PERSIST, MERGE, REMOVE, REFRESH, and ALL.

 

  • PERSIST is similar to an insert command. Specifying a CascadeType of PERSIST implies that a persist operation will be cascaded from a parent entity to its child entity.
  • MERGE is similar to an update command. Specifying a CascadeType of MERGE implies that a merge operation will be cascaded from a parent entity to its child entity.
  • REMOVE is similar to a delete command. Specifying a CascadeType of REMOVE implies that a remove operation will be cascaded from a parent entity to its child entity.
  • REFRESH reloads the related entity from the database when the referring entity is refreshed.


If you do not specify a cascade type, no operations are cascaded.

Your application might have a good use case for PERSIST and REMOVE. For example, you may want to remove from the database a Customer entity that has a one-to-one relationship with an Address entity. In this case, using a CascadeType of REMOVE to remove the Address entity along with the Customer entity makes sense. On the other hand, you must be careful about using the MERGE operation on deep hierarchies. For MERGE to work, all entities between the entity from which MERGE is being cascaded to the entity to which MERGE is being cascaded must be loaded in memory.

You might think that you should avoid a cascade type of MERGE on a deep relationship hierarchy out of concern for performance, but keep in mind that a business API that performs a MERGE operation typically takes on the order of the time a database update takes. A business API, such as applyDiscount in the following use case examples, results in an in-memory merge of the object on which the merge operation was invoked. Then, when the transaction commits or a flush operation is invoked, the database update takes place. The total execution time of the business API equals the time of the in-memory merge of the entity with the persistence context plus the time to update the database record. Of these, the in-memory merge is much smaller in magnitude than the database update and does not affect the overall timing.

In the following use case, cascading from the Order entity and merging the OrderLineItem entities directly provide equal performance. There are two reasons for this. First, the in-memory merge is an order of magnitude faster than the database updates, so it has little or no impact on the combined times. Second, the persistence provider figures out what the exact changes are: rather than executing database updates for all the entities on which the merge operation has been invoked and cascaded, the persistence provider updates the database for only those that have changed -- which means that the number of SQL statements issued for the two use cases in Code Examples 4 and 5 is the same.
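The dirty-checking behavior described above can be simulated in a few lines. In this hypothetical sketch, the "provider" keeps a snapshot of each entity's state as loaded and, at flush time, issues an UPDATE only for entities whose current state differs from the snapshot; the entity and SQL strings are invented for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DirtyCheckDemo {
    // Snapshot of each entity's state as loaded, keyed by id; a stand-in
    // for the persistence context's copy of the managed entities.
    static final Map<String, String> loadedState = new HashMap<>();
    static final List<String> sqlIssued = new ArrayList<>();

    static void load(String id, String state) {
        loadedState.put(id, state);
    }

    // Sketch of flush time after a cascaded merge: compare current state
    // against the snapshot and issue an UPDATE only for changed entities.
    static void flush(Map<String, String> currentState) {
        for (Map.Entry<String, String> e : currentState.entrySet()) {
            if (!e.getValue().equals(loadedState.get(e.getKey()))) {
                sqlIssued.add("UPDATE ... WHERE id=" + e.getKey());
            }
        }
    }

    public static void main(String[] args) {
        load("li-1", "qty=1"); load("li-2", "qty=1"); load("li-3", "qty=1");
        Map<String, String> current = new HashMap<>(loadedState);
        current.put("li-2", "qty=5");        // only one line item was discounted
        flush(current);
        System.out.println(sqlIssued.size()); // 1 UPDATE, not 3
    }
}
```

This is why cascading the merge over the whole hierarchy produces the same SQL as merging only the changed line items: the unchanged entities never reach the database.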

Also, using the CascadeType annotation reduces the amount of code you must write to achieve the same results, as Figure 7 shows. But cascading from the Order entity is still not recommended, because for it to work properly, you must make sure that all OrderLineItems between the Order from which the merge is cascaded and the OrderLineItem being updated have been loaded in memory at least once in the session.

Code Example 4: Bad Use Case

/** Entity class.
 *  Order has a 1:M relationship with OrderLineItem.
 */
@Entity
public class Order {
    @OneToMany(cascade=CascadeType.ALL, ...)
    public Collection<OrderLineItem> getLineItems() {
        return lineItems;
    }
}

/** Stateless session bean facade. */
@Stateless
public class OrderSessionStateless {
    @PersistenceContext private EntityManager em;

    /**
     * applyDiscount takes a collection of line items,
     * applies the discount to them, and merges the changes.
     * In this case, the merge operation is called from
     * the order object.
     */
    public void applyDiscount(Collection<OrderLineItem> lis, Order order) {
        applyDiscount(lis);
        em.merge(order);
    }
}



Code Example 5: Good Use Case

/** Entity class.
 *  Order has a 1:M relationship with OrderLineItem.
 */
@Entity
public class Order {
    @OneToMany(cascade=CascadeType.ALL, ...)
    public Collection<OrderLineItem> getLineItems() {
        return lineItems;
    }
}

/** Stateless session bean facade. */
@Stateless
public class OrderSessionStateless {
    @PersistenceContext private EntityManager em;

    /**
     * applyDiscount takes a collection of line items,
     * applies the discount to them, and merges the changes.
     * In this case, the merge operation is called on the
     * individual line items that have changed.
     */
    public void applyDiscount(Collection<OrderLineItem> lis) {
        for (OrderLineItem li : lis) {
            applyDiscount(li);
            em.merge(li);
        }
    }
}



Figure 7: Cascade Type

 

Inheritance


The EJB 2.1 architecture did not allow entity beans to inherit from other entity beans. In the new Java Persistence API specification, an entity can derive state and behavior from another entity or from a nonentity. Deriving from a nonentity allows the entity to inherit behavior, mapping attributes, or both.

It's hard to talk about inheritance without also talking about polymorphism. In the new Java Persistence API specification, both the EntityManager find method and Java Persistence Query Language queries support polymorphism. This means that when you load an entity from the database, not only instances of that entity but also instances of all its subclasses, if they exist, are loaded. It also means that, depending on the inheritance strategy you use, your persistence provider may need to do joins over multiple tables to load an entity from the database. This article will soon discuss the performance impact of the inheritance strategy, but it will first describe the different strategies that the specification allows:

 

  • Single table per class hierarchy. An entity and all its subclasses are stored in the same database table. The instances are distinguished based on a discriminator column.
  • Joined subclass. Attributes of the superclass are stored in a single table, whereas the attributes of each subclass are stored in a separate table joined to it on the primary key.
  • Single table per class. Attributes of each class map to a separate database table. This release of the specification is not required to support this strategy.


As you may have figured out, the joined subclass and the single table per class strategies will require a persistence provider to do an SQL join over multiple tables to load an entity from the database. Does this join and hence the choice of strategy affect performance? Tests show that the answer is no, and the reason is that the joins are done over indexed primary keys.
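As a point of comparison, a joined-subclass mapping is selected with the same @Inheritance annotation used in Code Example 6, just with a different strategy. The classes below are illustrative, not from the test code, and require the Java Persistence API on the classpath:

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import javax.persistence.Inheritance;
import javax.persistence.InheritanceType;

// Superclass attributes are stored in one table; each subclass gets
// its own table holding only its additional columns.
@Entity
@Inheritance(strategy=InheritanceType.JOINED)
public class BaseOrder {
    @Id protected String id;
    protected double total;
}

// Loading a DiscountedOrder joins the two tables, but the join is on
// the shared, indexed primary key, so it is inexpensive.
@Entity
public class DiscountedOrder extends BaseOrder {
    protected double discount;
}
```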

Code Example 6 shows an inheritance hierarchy of Order objects. The stateless session bean has a business method that looks up an Order entity using the EntityManager find API and an orderID that matches the LargeOrder entity. Given that the EntityManager find API supports polymorphism, not only the attributes of the Order entity but also those of the SmallOrder, MediumOrder, and LargeOrder entities will be loaded.

Code Example 6

@Entity
@Table(name="J1Order")
@Inheritance(strategy=InheritanceType.SINGLE_TABLE)
public class Order { }

@Entity
public class SmallOrder extends Order { .. }

@Entity
public class MediumOrder extends SmallOrder { .. }

@Entity
public class LargeOrder extends MediumOrder { .. }

@Stateless
public class OrderSessionStateless {

    @PersistenceContext private EntityManager em;

    public void queryOrder(String orderID) {
        ...
        Order order = em.find(Order.class, orderID);
        ...
    }
}



Whether the test used a single table per class hierarchy or a joined subclass strategy, the performance of the lookup operation was not affected, because the persistence provider was doing the joins on indexed primary keys. The same was true even when the test used a Java Persistence Query Language named query instead of the EntityManager find API. This test used a large data set in combination with a small cache size to make sure that queries were being executed on the database rather than results being picked up only from the cache. The comparable performance of the two strategies is also reflected in Figure 8. The important thing to keep in mind is that you should not try to fit your application design into a particular inheritance strategy on the assumption that one strategy will give you better performance than another. To quote renowned computer scientist Donald Knuth: "Premature optimization is the root of all evil." Also keep in mind that there is no good substitute for a performance test.

Figure 8: Inheritance Strategy

 

Flush Mode


In EJB 2.1 architecture, the state of entities was synchronized to the database before a query was executed. The idea behind this was that any change to the state of the entity that affected the query had to be visible to that query. But in the new Java Persistence API specification, you can control when that synchronization to the database happens through the FlushMode settings. FlushMode can have the following values: AUTO, which is essentially the EJB 2.1 mode, and COMMIT, which means that the state of entities is synchronized to the database only when a transaction commits.

Code Examples 7 and 8 illustrate when to use a flushMode of AUTO and when to use a flushMode of COMMIT. Code Example 7 has a named query called findLineItemsByOrderID. This query returns the collection of line items that belong to an Order. The example also contains a stateless session bean in which a business method called addLineItem takes a lineItem object as a parameter, adds it to the collection of existing OrderLineItems, and returns the total number of line items in the Order object. As you can see, the first step is to load the order; the second step is to add the line item to the collection in the appropriate place in the hierarchy; and the third step is to run the query, which returns the collection of line items. In this query, you want the line item you just added to the Order object to be reflected in the results, so it makes sense to set the flushMode to AUTO.

Code Example 7

/**Named Query
 * defined for OrderLineItem Entity, loads all lineitems that
 * belong to an Order with a given id.
 **/
@NamedQuery(name="findLineItemsByOrderID",
    query="SELECT OBJECT(li) FROM OrderLineItem li WHERE li.order.id = :id")
@Entity public class OrderLineItem { }

@Stateless
public class OrderSessionStateless {

    @PersistenceContext private EntityManager em;

    /** addLineItem adds a lineitem object to an order with id orderID
     *  and returns the number of lineitems in that order object.
     **/
    public int addLineItem(String orderID, OrderLineItem li) {
        Order order = em.find(Order.class, orderID);

        // Business method adds the lineitem in the hierarchy.
        addLineItem(order, li);
        Query q = em.createNamedQuery("findLineItemsByOrderID");
        q.setFlushMode(FlushModeType.AUTO);
        q.setParameter("id", orderID); // parameter name must match :id in the query
        List list = q.getResultList();
        return list.size();
    }
}



Code Example 8 illustrates the use of flushMode with COMMIT. This example loads an Order and assigns a shipper (a Carrier entity) and a Customer to it. The first step is to load the Order and Customer entities. The second step is to assign the customer reference in the order object and the order reference in the customer object. The third step is to load the Carrier entity. In this particular case, the query that loads the Carrier entity is not affected by the changes made to the Order and Customer objects, so setting the query's flushMode to COMMIT is acceptable.

Code Example 8

/**assignCustomerAndCarrier
 * assigns customer and shipper to an order object
 */
@Stateless
public class OrderSessionStateless {

    @PersistenceContext private EntityManager em;

    public void assignCustomerAndCarrier(String customerID,
            String orderID, String carrierID) {
        Order order = em.find(Order.class, orderID);
        Customer customer = em.find(Customer.class, customerID);
        order.setCustomer(customer);
        customer.addOrder(order);

        // This query is not affected by the changes above.
        Query q = em.createNamedQuery("findCarrier");
        q.setFlushMode(FlushModeType.COMMIT);
        q.setParameter("Id", carrierID);
        Carrier carrier = (Carrier) q.getSingleResult();
        order.setCarrier(carrier);
    }
}



To test the effect of using flushMode on performance, the Java Enterprise Performance team used Code Examples 7 and 8 with the two flushMode values of COMMIT and AUTO. The performance obtained with a flushMode of AUTO was 90 percent of the performance obtained with a flushMode of COMMIT, as Figure 9 shows.

Figure 9: Flush Mode

 

Optimistic Locking


To ensure data integrity, you can choose from one of two major strategies: optimistic locking or pessimistic locking. Optimistic locking assumes that concurrent access to data is not likely and handles the scenario of concurrent access with an optimistic lock exception and a transaction rollback. Pessimistic locking, on the other hand, assumes that concurrent access to data is very likely and prevents such concurrent access.

The new Java Persistence API specification assumes optimistic locking based on version consistency checks. Unfortunately, the specification does not currently discuss pessimistic locking, leaving support of that strategy to vendors.

If your application has a large user load, then contrary to common belief, you may not benefit from optimistic locking. Optimistic locking may result in a large number of transaction rollbacks, which are expensive operations. In this situation, pessimistic locking may give you better performance than optimistic locking.
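The version consistency check behind optimistic locking can be sketched in plain Java. At flush time the provider effectively issues UPDATE ... SET version = version + 1 WHERE id = ? AND version = ?, and rolls back the transaction when no row matches; in JPA, the version column is declared with the @Version annotation. The classes and names below are a simulation only, not part of any JPA API:

```java
import java.util.HashMap;
import java.util.Map;

// Simulates the version check a JPA provider performs at commit time
// under optimistic locking. A real provider throws an
// OptimisticLockException and rolls back instead of returning false.
public class OptimisticLockDemo {
    static class Row { int version; String description; }
    static Map<String, Row> table = new HashMap<String, Row>();

    // Returns true if the "flush" succeeded, false if the row was
    // modified concurrently (version mismatch).
    static boolean flush(String id, int readVersion, String newDescription) {
        Row r = table.get(id);
        if (r == null || r.version != readVersion) {
            return false; // stale data: rollback in a real provider
        }
        r.description = newDescription;
        r.version++;      // the version column is incremented on success
        return true;
    }

    public static void main(String[] args) {
        Row r = new Row();
        r.version = 1;
        r.description = "original";
        table.put("order-1", r);

        // Two transactions read the same row at version 1.
        System.out.println(flush("order-1", 1, "updated by A")); // true
        System.out.println(flush("order-1", 1, "updated by B")); // false: stale version
    }
}
```

Under heavy contention, every one of those failed flushes is a rolled-back transaction, which is why a large user load can erode the benefit of optimistic locking.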

 

Persistence Context


An easy way to look at persistence context is to see it as a set of entities managed by an entity manager. There are two kinds of persistence contexts: transactional and extended. Entities in a transactional persistence context become detached after the transaction ends, and detached entities are not guaranteed to be synchronized with the database. Entities in an extended persistence context stay managed beyond a transaction. Hence, any change to their state is reflected in the database, though the database synchronization happens only when a transaction commits or when an entity manager flush is invoked.

In the Java EE 5 platform environment, a transactional persistence context is used with a stateless session bean, and an extended persistence context is usually used with a stateful session bean. Typically, you would choose a stateless session bean over a stateful session bean for better performance, but keep in mind that with a stateless session bean, you could end up doing more lookups and merges. See Code Examples 9 and 10.

In this set of examples, the test is simulating a workflow that looks up an order, updates the order description, adds a line item, and sets the customer and shipper information on the order entities. The stateless session bean simulates the same workflow with a lookup and merge in each operation as needed, whereas the stateful session bean does the same by caching the references.

The throughput obtained with a load of 100 users in the case of the stateless session bean was 49 percent of the throughput obtained in the case of the stateful session bean, as Figure 10 shows. Keep in mind that if you were to cache the objects being used in the workflow as detached objects, the performance of the stateless session beans scenario would increase.

Code Example 9: Stateless Session Bean

@Stateless
public class OrderSessionStateless implements OrderSessionStatelessLocal {

    @PersistenceContext(unitName="java_one")
    EntityManager em;

    public void editOrder(String orderID) {
        // First, look up the order.
        Order order = em.find(Order.class, orderID);
        ....
    }

    public void assignCustomerAndCarrier(String customerID,
            String orderID, String carrierID) {
        // First, look up the order.
        Order order = em.find(Order.class, orderID);
        ....
    }

    public void addLineItem(String orderID, OrderLineItem li) {
        // First, look up the order.
        Order order = em.find(Order.class, orderID);
        ....
    }
}



Code Example 10: Stateful Session Bean

@Stateful
public class OrderSessionStateful implements OrderSessionStatefulLocal {

    // The extended persistence context keeps entities managed
    // across transactions for the life of the stateful bean.
    @PersistenceContext(unitName="java_one",
        type=PersistenceContextType.EXTENDED)
    EntityManager em;

    // Order is a member variable.
    // Loaded from the db when lookupOrder is invoked in the workflow.
    Order order;

    public void editOrder() {
        // Edits the in-memory order's description
    }

    public void assignCustomerAndCarrier(String customerID, String carrierID) {
        // Uses the order in memory; no em.find called in this API
    }

    public void addLineItem(OrderLineItem li) {
        // Adds a line item to the in-memory Order object
        // No em.find invoked for the order object
    }
}



Figure 10: Persistence Context

 

Other Features That Affect Performance


Two other features that affect performance are caching and relationship joins.

Caching

Java Persistence API objects are usually cached, and the cache sizes are usually limited. So it's extremely important for you to know how to manipulate your persistence vendor's cache sizes. To learn more about this, consult your vendor's documentation.
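Cache sizes are typically set through vendor-specific properties in persistence.xml. The property name below is purely hypothetical -- check your provider's documentation for the real name and its default value:

```xml
<persistence-unit name="java_one">
  <properties>
    <!-- Hypothetical vendor property: caps the number of cached entities. -->
    <property name="vendor.cache.max-size" value="10000"/>
  </properties>
</persistence-unit>
```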

Relationship Joins

Not all persistence providers do an EAGER fetch in a single SQL statement using joins. Some persistence providers choose to load an entity and then check whether the related entity exists in the cache. If it does not, the provider then executes another SQL statement to fetch the related entity. This may hurt your application's performance, especially if you have deep relationship hierarchies.

The good news is that you can write your own Java Persistence Query Language queries to do an eager fetch for you. Code Example 11 illustrates.

Code Example 11

// Owner and Pet have a 1:1 relationship.
Owner o = em.find(Owner.class, "me");
// Two SQL statements: select * from owner
// and select * from pet.

// Named query defined on the Owner entity:
@NamedQuery(name="findOwner",
    query="SELECT o FROM Owner o LEFT JOIN FETCH o.pet WHERE o.id = :id")

Query q = em.createNamedQuery("findOwner");
q.setParameter("id", "me");
Owner owner = (Owner) q.getSingleResult();
// One SQL statement



In this example, the Owner entity has a one-to-one relationship with the Pet entity. When you execute the em.find API using the parameters shown in the code example, you can expect that both the Owner and the Pet will be loaded, because by default the relationship is an EAGER fetch. That does happen, but the persistence provider may choose to do it in two separate SQL statements rather than use a join. Of course, you could write your own query to do the eager fetch, as illustrated in Code Example 11.

In summary, to get the best performance out of your persistent entities, you must understand your application's data model very well and do the following:

 

  • Understand how your entities' attributes are accessed and whether the lazy or eager strategy is better suited for them.
  • Understand your persistence provider's eager fetch strategy, including whether it uses a single query or multiple queries.
  • Minimize database access by using the appropriate cache size and flush mode settings.
  • Evaluate whether the strategy of optimistic or pessimistic locking is better suited for your application.


Comparative Performance

This article concludes with a discussion of some microbenchmarks that Sun's Java Enterprise Performance team has used to examine the performance of the Java EE 5 platform compared to the J2EE 1.4 platform. Ease of development is a good reason to move to the newer version, but the team realized that few developers would adopt the Java EE 5 platform if it came with a performance penalty.

The Java Enterprise Performance team's internal microbenchmarks are designed to measure the throughput of a small, fixed number of users who send as many requests as possible to the application server -- that is, the users have no think time. This is less sophisticated than the model common in benchmarks such as SPECweb and SPECjAppServer, which measures the number of users that can be supported with a given think time and response time. A throughput measurement is much simpler and better suited to a microbenchmark.

Second, the Java Enterprise Performance team's microbenchmarks contain virtually no business logic. For example, the performance of session beans is measured as shown in Code Example 12.

Code Example 12

public class MyServlet extends HttpServlet {
    private MySession sess; // Used for EJB 2.1
    private MySessionHome sessHome;
    @EJB MySession sess30; // Used for EJB 3.0

    public void doGet(...) {
        MySession s;
        if (usingEJB21) {
            if (reuseSession)
                s = sess;
            else
                s = sessHome.create();
        } else {
            s = sess30;
        }
        s.doOperation();
        if (!reuseSession)
            s.remove();
    }
}



The doOperation() method itself has an empty body, so the test is measuring the time difference in entering and exiting the EJB container, including starting the session-based transaction. In the case of an EJB 2.1 application, in which the session is not reused, the time difference will include the time required to construct the new session object.

For persistent objects, the doOperation() method contains simple operations on the entity. The Java Enterprise Performance team conducted five tests:

 

  • lookup: For EJB 2.1, use findByPrimaryKey(); for JPA, use entityManager.find().
  • lookupCMR: This is the same as lookup, except that the object being retrieved will eagerly prefetch a related object. This is a many-to-one relationship, which is configured in Container Managed Persistence (CMP) 2.1 to load the related object and which for JPA will be eagerly prefetched by default.
  • update: This looks up an object and calls a single setter in the object.
  • traverse: This is the opposite of lookupCMR. The one-to-many relationship is not prefetched, so the code will ask for the collection of related objects and iterate through them to make sure they are all loaded.
  • query: This uses findByXXX() for CMP 2.1 and entityManager.createNamedQuery() for JPA.


The article now presents results from two application servers. Note that the authors' intent is not necessarily to provide a competitive study of application server performance. In particular, Java EE 5 is fairly new technology, and the performance of any given server is likely to change as the server code matures. In fact, when Sun's Java Enterprise Performance team ran the tests whose results appear here, both application servers A and B were in beta stage, with one much further along in development than the other, so the actual numbers are doubtless out-of-date. But the data will support the team's contention that there is in fact no performance penalty for using the Java EE 5 platform. On the contrary, its operations tend to be slightly faster than those of the J2EE 1.4 platform.

 

Session Bean Performance


This article mentioned earlier that in the best case for session beans, the Java Enterprise Performance team saw a 3 percent improvement in performance. Figure 11 shows the results.

Figure 11: Comparison of Stateless Session Beans
 

The performance of application server A is just what one would expect given the analysis and examples presented earlier in this article. In an EJB 2.1 application, it's better to reuse the session, but the overall overhead of the EJB 3.0 application is even smaller. Note that even for application server B, which is still under development, the EJB 3.0 case is faster than the default EJB 2.1 create case.

 

Persistent Object Performance


The performance of basic operations varies quite widely between EJB 2.1 entity beans and objects written to use JPA, as Figure 12 shows.

Figure 12: Comparison of EJB 2.1 Operations to JPA Operations
 

In most cases, the objects using JPA are at least as fast as -- if not faster than -- their CMP 2.1 counterparts. Queries are problematic for both application servers, indicating an area in which both implementations have some work to do.

This comparison is a little problematic in that CMP 2.1 is quite different from JPA, which is another reason that the simple query results are different. In particular, JPA provides for different semantics in a number of areas that this article has discussed. One particularly important area for performance is object caching. In CMP 2.1, each time an object is first used in a transaction, the specification requires that the object be loaded from the database: no caching is allowed. JPA allows and encourages caching of objects from the database.

The results in Figure 12 use the default cache size for the application server in question, which gave a cache hit rate of approximately 2 percent. However, it is possible to increase the cache size, given the database's size and the machine's memory limits. If the cache size is configured large enough to run the JPA benchmark entirely out of memory, so that there is no database access for reads, the results are quite different, as Figure 13 shows.

Figure 13: The Effect of Caching on JPA Access
 

Here, lookupCMR has improved somewhat, though not as much as traverse, which is almost three times faster in the Java EE 5 platform. Again, this points to an area in which, with additional development work on the application server, lookupCMR might be expected to improve even more.

Conclusion

The results shown in this article are only preliminary and based on beta software that is continuing to mature rapidly. Nonetheless, there are three important conclusions to draw from the data:

  • The Java EE 5 platform will be at least as fast as -- if not faster than -- J2EE 1.4. In the case of persistent entities, new features such as caching will allow dramatically superior performance.
  • You can expect widely different results from different application servers. If you ensure that your code uses only the standard Java EE 5 platform APIs and does not use vendor-specific extensions, you should be able to move to the best application server at any time.
  • Don't stop learning. The Java EE 5 platform has some great new technologies that will allow you to get the most out of your hardware and your development time.


 

For More Information


* The terms "Java Virtual Machine" and "JVM" mean a Virtual Machine for the Java platform.

About the Authors

Rahul Biswas is a member of the Java Enterprise Performance team at Sun. He is currently involved in performance improvement of the persistence implementation in Project GlassFish.

Eileen Loh has been a member of the Java Enterprise Performance team for over five years. She is currently involved in tracking and improving performance for Project GlassFish.

Scott Oaks is the lead performance engineer for Sun's Java Enterprise Performance team and is the author of several books on Java technology in the O'Reilly Java Series.
