Peak performance tuning of CMP 2.0 Entity beans in WebLogic Server 8.1 and 9.0

by Dmitri Maximovich
11/14/2005

Abstract

The J2EE specification now enjoys wide acceptance as a standard for contemporary enterprise projects. But an important part of the J2EE specification, EJB persistence, has long been criticized for its complex development model and for poor performance of entity beans. It's pretty much accepted as fact that if entity beans (especially container-managed persistence entity beans, or CMPs) are going to be used in an application, performance will suffer. This is not true.

I won't address complexity concerns related to EJBs in this article. The upcoming EJB 3 specification specifically targets the development model, making it significantly easier; the specification also promises, among other things, to deliver dependency injection and easier testing outside of containers for entity beans. Instead, this article's goal is to provide an in-depth analysis of advanced options available in BEA WebLogic Server 8.1 and 9.0 that allow developers to improve the performance of CMP beans, quite significantly in many cases. The topic is very broad, which makes it impossible to cover all aspects in just one article; therefore, I focus only on concurrency and long-term caching strategies for CMP entity beans. I also briefly cover improvements available in the most recent version, BEA WebLogic Server 9.0.

Concurrency Strategies

J2EE developers know that the EJB container maintains a cache or pool of entity beans, usually configurable in the deployment descriptor. Amazingly, quite a few J2EE developers don't realize that this doesn't mean that once the J2EE server loads a particular bean instance from the database, it won't go to the database again while the bean instance is kept in the pool. On the contrary, by default a J2EE server executes ejbLoad() to synchronize the instance's state from the database at the beginning of every transaction. Basically, on every operation on a CMP bean (even if the bean was loaded seconds ago in a previous transaction) the server executes an SQL select to refresh it. Only when operating on multiple entity bean instances within one transaction will the server cache them.
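To keep the discussion concrete, the examples in this article assume a simple CMP 2.0 bean mapped to a person table. A minimal, hedged skeleton is shown below; the class and field names are illustrative, and the home and component interfaces are omitted for brevity:

// PersonBean.java: a minimal CMP 2.0 entity bean (illustrative)
import javax.ejb.CreateException;
import javax.ejb.EntityBean;
import javax.ejb.EntityContext;

public abstract class PersonBean implements EntityBean {
    private EntityContext ctx;

    // container-managed fields; the container generates the persistence code
    public abstract Integer getPersonId();
    public abstract void setPersonId(Integer id);
    public abstract String getName();
    public abstract void setName(String name);
    public abstract String getAddress();
    public abstract void setAddress(String address);

    public Integer ejbCreate(Integer id, String name, String address)
            throws CreateException {
        setPersonId(id);
        setName(name);
        setAddress(address);
        return null; // for CMP, the container supplies the primary key
    }

    public void ejbPostCreate(Integer id, String name, String address) {}

    // the container calls ejbLoad()/ejbStore() to synchronize state with
    // the database; how often ejbLoad() actually runs is exactly what the
    // concurrency and caching settings discussed below control
    public void ejbLoad() {}
    public void ejbStore() {}

    public void ejbActivate() {}
    public void ejbPassivate() {}
    public void ejbRemove() {}
    public void setEntityContext(EntityContext c) { ctx = c; }
    public void unsetEntityContext() { ctx = null; }
}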

Obviously, reloading state from the database in every transaction could have some performance impact! This default behavior is easy to understand: It's the safest way to operate if the database is shared between multiple processes, any of which could change the state of persisted objects in the database while they are cached in an EJB pool. But quite often it's possible to relax this behavior a bit by telling the J2EE server to keep cached instances of entity beans between transactions, thereby gaining performance by skipping the data refresh from the database most of the time. To approach this problem and arrive at an optimal solution, first I'll discuss the different concurrency strategies available in BEA WebLogic Server.
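(As a preview of the long-term caching discussion: the switch that enables this behavior in WebLogic Server is the cache-between-transactions element in weblogic-ejb-jar.xml. A hedged fragment is shown below; the bean name is illustrative, the element placement follows my reading of the 8.1 descriptor layout, and the setting is honored only for some of the concurrency strategies discussed next, not for the default database concurrency.)

<!-- weblogic-ejb-jar.xml (fragment; names illustrative) -->
<weblogic-enterprise-bean>
  <ejb-name>PersonEJB</ejb-name>
  <entity-descriptor>
    <entity-cache>
      <max-beans-in-cache>1000</max-beans-in-cache>
      <!-- keep loaded state between transactions instead of
           re-executing ejbLoad() at the start of each one -->
      <cache-between-transactions>true</cache-between-transactions>
    </entity-cache>
  </entity-descriptor>
</weblogic-enterprise-bean>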

It's very important for an EJB developer to understand the different concurrency strategies available for entity beans. Surprisingly, not all developers are even aware that concurrency options exist. So what is a concurrency strategy as applied to entity beans? The EJB container is a highly multithreaded application, serving simultaneous requests from multiple clients, which quite often involve access to the same resource, such as a row in a database table. Therefore, the EJB container must manage concurrent access to entity bean instances; more precisely, the concurrency strategy determines how and when the container synchronizes each instance of an entity bean with the underlying database.

Four concurrency strategies are currently available in WebLogic Server: exclusive, database, optimistic, and read-only. Starting from version 7.0, WebLogic Server uses database concurrency by default. Moving down this list, each strategy is potentially more performant than the previous one. I'll discuss these strategies, describing the benefits and tradeoffs each one has to offer.

Exclusive concurrency

Exclusive concurrency means that the container creates, at most, one instance of an entity bean for every primary key value (that is, one row in the table maps to one EJB instance in the container). Access to the given instance is serialized, and requests are executed sequentially, one by one. There are some serious problems with this strategy. First, performance obviously suffers because of the serialized access to the bean from multiple clients, and you can forget about scalability of your application. Second, a single instance of the EJB (and the associated locks held by the container) is local to one JVM (one server instance) and will not work in a cluster. This strategy exists for backward compatibility reasons (early versions of WebLogic Server used it by default) and should rarely, if ever, be used.

Database concurrency

The database concurrency strategy is the default concurrency strategy in current versions of WebLogic Server. It provides a pretty good compromise between data consistency and performance. The idea is simple: Instead of trying to manage locks by itself, WebLogic Server creates a new bean instance for each transaction that wants to access the bean, and delegates concurrency control and deadlock detection to the underlying database. This is like running parallel database operations from multiple clients against a single database; the database's isolation level and locking policy dictate which updates, selects, and inserts proceed and in which order, and what (if anything) is going to fail. The immediate advantage is that this strategy applies well to clustered environments: As long as all nodes in the cluster share the same database, the EJB container doesn't need to bother with data synchronization details.

This strategy obviously is more scalable than the exclusive strategy and works extremely well for certain applications, but not without some serious performance limitations. The container still holds a pool of entity bean instances, but these instances don't hold any intermediate state between transactions: This is instance pooling, not data caching. The whole idea of pooling instances without state probably comes from very early JVM implementations, when object creation was an expensive operation and it was beneficial from a performance perspective to have a cache of object instances. This is not true in modern JVMs, where object creation is fast in most cases, but because this behavior is described in the EJB specification, all vendors must support it. Nevertheless, when using the database concurrency strategy, the container pulls a "stateless" bean instance from the pool and has to execute an SQL select operation to obtain the latest data and populate the instance's fields.

This approach may not be bad, because we don't have to worry about "stale" bean instances (where the data in the database was updated either from another node in the same cluster or from a different application), but it obviously carries a performance penalty as well. You almost always end up with an extra select operation at the beginning of the transaction, even if you're only going to update data in a bean and aren't interested in the previous values. For this reason, it makes little sense to use entity beans for applications that mainly or exclusively execute updates or inserts; the container would spend a lot of time doing unnecessary selects and throwing the data away.

At least one problem exists with both the exclusive and the database concurrency strategies: the possibility of lost updates. Imagine that two clients almost simultaneously try to update the same record in a table that is mapped to an entity bean. If no locks are held in the database, the result of the update operation that finished first will be overwritten by the second update. Whether this is an acceptable outcome depends on your business requirements. Often it's not acceptable or desirable to lose updates; therefore, the application needs some kind of mechanism to detect or prevent a lost update condition and have the opportunity to recover. With the exclusive strategy, if your application is deployed on more than one node, you don't have any control over the lost updates problem. But as I mentioned earlier, you shouldn't really consider this strategy anyway.
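Here is a sketch of the interleaving that produces a lost update, using the person table from the examples below (values are illustrative):

-- two transactions, T1 and T2, interleaved in time
-- T1: select name from person where person_id = 1;  -- reads 'Ann'
-- T2: select name from person where person_id = 1;  -- also reads 'Ann'
-- T1: update person set name = 'Joe' where person_id = 1; commit;
-- T2: update person set name = 'Bob' where person_id = 1; commit;
-- result: T1's update is silently overwritten (a lost update)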

By delegating concurrency control to the database, the database strategy provides an option to take an exclusive lock in the database when data is read (that is, when an entity bean is enlisted in a transaction). This is achieved by setting the use-select-for-update element in weblogic-cmp-jar.xml to true (the default setting is false). As you can guess from the name, this instructs WebLogic Server to use "select for update" when loading entity bean data. The resulting database lock is held until the transaction completes, and therefore it's impossible for any other transaction to read or change the data while the first transaction is running. This technique, possibly combined with the "no wait" option on "select for update," could solve the lost updates problem as well as any possible deadlocks, but at a pretty high price.
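A hedged descriptor fragment is shown below; the bean and data source names are illustrative, and the exact nesting may differ slightly between the 8.1 and 9.0 descriptor schemas:

<!-- weblogic-cmp-jar.xml (fragment; names illustrative) -->
<weblogic-rdbms-bean>
  <ejb-name>PersonEJB</ejb-name>
  <data-source-name>examples-dataSource</data-source-name>
  <table-map>
    <table-name>person</table-name>
    <!-- field-map elements omitted for brevity -->
  </table-map>
  <!-- load bean data with 'select ... for update' so the row
       stays locked until the transaction completes -->
  <use-select-for-update>true</use-select-for-update>
</weblogic-rdbms-bean>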

One disadvantage of this approach is performance degradation: Once one transaction has locked an entity instance, other transactions are blocked from accessing the same entity data, even those that require read-only access. This basically throws us back to a kind of exclusive concurrency strategy, with the only difference being that this time it can be used in a clustered environment, because all the locking happens in a (shared) database and not on a server instance. A second disadvantage is that your beans' behavior becomes somewhat dependent on the underlying database's locking implementation. For example, some databases support fine-grained, row-level locking, and some don't. In the latter case, access to a whole page of records could be blocked as a result of a single entity load operation.

Optimistic concurrency

The optimistic locking strategy is designed to solve performance problems by eliminating any locks in the database between read and update operations, while at the same time providing a mechanism for lost update detection. As with database concurrency, optimistic concurrency gives each transaction its own bean instance, but it doesn't hold any locks in the EJB container or in the database while the transaction is in progress. For databases that do read locking (non-Oracle databases), the container reads bean data in a separate local transaction that commits as soon as the read completes, thereby avoiding read locks and allowing for better scalability.

This mechanism for lost update detection is nothing new and was in use for a long time before Java was created. The basic idea of this pattern is extremely simple yet powerful: At update time, a check is performed to see if the data in the table was modified by some other process after it was initially read by the first process. In practice, this is most often done by including an extra "version" column in the where clause of the update statement. A simplified SQL example is shown below, where the version column is numeric and incremented on each update:

-- at entity load time, the container executes a select of the bean
-- fields, plus the 'version' field
select name, address, version from person where person_id = 1;

-- the container remembers the value read from the version column;
-- say it was 123

-- at update time, the previously read version value is added to the
-- where clause, and the version is incremented
update person
   set name = 'Joe',
       version = version + 1
 where person_id = 1
   and version = 123;

When executing an update, the container will be able to detect how many rows were actually updated in the database by executing code similar to that shown below:

...

// the connection is assumed to be an open JDBC connection; the
// exception type is illustrative
PreparedStatement ps =
        connection.prepareStatement("update person ...");
int rowsUpdated = ps.executeUpdate();
if (rowsUpdated != 1) {
    throw new OptimisticConcurrencyException(
            "Data was modified by another process");
}

As you can see, this approach allows the server to prevent the lost updates problem without holding any long-term locks either in the container or in the database. The strategy works extremely well in situations where the number of reads is much higher than the number of updates, and therefore the probability of update collisions is low. Typically, this is true for the majority of applications.

A numeric version column is not the only way to implement the optimistic concurrency strategy in WebLogic Server. A timestamp column can be used instead. (Note that the database must be able to store timestamp values with sufficient precision.) Using a timestamp column gives you the additional benefit of knowing when the record was last updated. Sometimes, when working with legacy database schemas, it's either undesirable or simply impossible to change tables to add a version column. In this case, WebLogic Server can check all columns in the table that were read during the transaction (all fields in the entity bean) or only the columns that have been updated by the current transaction. Optimistic concurrency can thus be used on legacy schemas without any modification to the tables, at the price of slightly higher overhead (meaning a more complex SQL update).
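In the descriptor, the verification scheme is selected with the verify-columns element, with a dedicated version or timestamp column named in optimistic-column. A hedged fragment follows; the names are illustrative, and the element placement follows my reading of the descriptor schema:

<!-- weblogic-cmp-jar.xml (fragment; names illustrative) -->
<weblogic-rdbms-bean>
  <ejb-name>PersonEJB</ejb-name>
  <table-map>
    <table-name>person</table-name>
    <!-- field-map elements omitted for brevity -->
    <!-- Version uses a numeric version column; other values
         include Timestamp, Read, and Modified -->
    <verify-columns>Version</verify-columns>
    <optimistic-column>version</optimistic-column>
  </table-map>
</weblogic-rdbms-bean>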

Read-only concurrency

The final concurrency strategy is read-only concurrency. If some of the tables in your database hold data that rarely or never changes, it is beneficial to declare the corresponding CMP beans as read-only. The server still activates a new bean instance for every transaction so that requests are processed in parallel, but it doesn't call ejbLoad() every time; it does so only periodically, based on the value of the read-timeout-seconds parameter. This can give your application a significant performance boost because the SQL select is performed only when a particular entity bean is accessed for the first time; the data is then cached and reused in subsequent transactions until the read timeout expires.
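A descriptor sketch for a read-only bean might look like the following; the bean name, cache size, and timeout are illustrative:

<!-- weblogic-ejb-jar.xml (fragment; names illustrative) -->
<weblogic-enterprise-bean>
  <ejb-name>CountryCodesEJB</ejb-name>
  <entity-descriptor>
    <entity-cache>
      <max-beans-in-cache>500</max-beans-in-cache>
      <!-- refresh cached state at most every 10 minutes -->
      <read-timeout-seconds>600</read-timeout-seconds>
      <concurrency-strategy>ReadOnly</concurrency-strategy>
    </entity-cache>
  </entity-descriptor>
</weblogic-enterprise-bean>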

One peculiar feature in WebLogic Server versions prior to 8.1 SP2 is that even if a bean had been deployed as read-only, developers were allowed to call create and remove operations on instances of that bean. Starting with 8.1 SP2, this is disabled by default, but if you need this functionality, you can turn it back on by setting the allow-readonly-create-and-remove element to true in weblogic-cmp-jar.xml.

There is also a way to explicitly invalidate instances of read-only entity beans. The invalidate operation forces a bean instance to be refreshed from the database on the next transaction, even if the read timeout has not expired. You can think of it as flushing the instance cache. It's possible to invalidate a specific instance of a particular bean, any subset of instances, or all instances of a given bean. To call the invalidate() method, you need to cast the bean's home or local home to CachingHome or CachingLocalHome, respectively. The code below illustrates how to do this:

// CachingHome and CachingLocalHome live in the weblogic.ejb package
Object o = context.lookup("MyBeanCMP");

// cast to CachingHome for the remote interface, or CachingLocalHome
// for the local one
CachingHome cachingHome = (CachingHome)o;

// invalidate a particular instance by passing its primary key value
cachingHome.invalidate(pk);

// invalidate any subset of bean instances by passing a collection
// of primary keys
Collection pks = Arrays.asList(new Object[]{pk1, pk2, ..., pkN});
cachingHome.invalidate(pks);

// or invalidate all instances currently cached
cachingHome.invalidateAll();

Explicit invalidation is useful when the data in your tables is quasi-static; for example, it might be changed once a day by some batch process. In this case, you could deploy the corresponding beans as read-only with large read timeout values and, after the batch process finishes, call invalidateAll() for these entity beans.

You can specify a concurrency strategy for every CMP bean by setting the concurrency-strategy element in the entity-cache stanza of the weblogic-ejb-jar.xml deployment descriptor, as shown below. If the concurrency strategy is not specified, WebLogic Server uses database concurrency by default.
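For example, a fragment selecting the optimistic strategy (pairing with the verify-columns setting shown earlier; names are illustrative):

<!-- weblogic-ejb-jar.xml (fragment; names illustrative) -->
<weblogic-enterprise-bean>
  <ejb-name>PersonEJB</ejb-name>
  <entity-descriptor>
    <entity-cache>
      <max-beans-in-cache>1000</max-beans-in-cache>
      <concurrency-strategy>Optimistic</concurrency-strategy>
    </entity-cache>
  </entity-descriptor>
</weblogic-enterprise-bean>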
