Peak performance tuning of CMP 2.0 Entity beans in WebLogic Server 8.1 and 9.0

Performance Improvement Strategies

Now that you're familiar with the different concurrency strategies available in WebLogic Server, I'll demonstrate how to apply this knowledge to improve the performance of CMP beans. CMP beans are often criticized for having mediocre performance. To a certain extent this is true because, as discussed above, for beans deployed with the default concurrency strategy settings, WebLogic Server reads data from the database at the beginning of every new transaction. In other words, bean instance state is not cached between transactions, and the database receives a high volume of select requests. Some people argue that this is okay because modern databases employ effective caching mechanisms: once data has been selected for the first time, it will probably stay in the database cache for subsequent calls, with little or no disk activity for later selects. While this is true, we should not forget that in most cases calls between the application server and the database travel over the network, and the latency of those calls is orders of magnitude higher than that of local calls within a JVM. An additional concern is that multiple application server instances can share access to the same database (often the case in clustered configurations), in which case it's possible to saturate even the fastest network link between the application server and the database.

To make a long story short, if we need good performance and scalability, one of the first and most effective strategies we can explore is caching data locally where possible, to avoid potentially costly remote calls to the database. In terms of CMP beans, this means we need a mechanism to preserve (or cache) bean state between calls from different transactions. Of course, this yields a performance gain only if there is a greater-than-zero chance that the same bean instance will be accessed more than once in its lifetime. In other words, your application should read data more often than it writes it, and there should be a chance of accessing the same data multiple times between updates. For example, if your application only writes to the database (OLTP), caching that data is not going to improve performance at all. Likewise, if you have very large tables and select rows more or less randomly, there is only a slim chance that cached data will survive long enough to be available from the cache when it is needed again. Fortunately, there is still a fairly large class of applications that satisfy these caching criteria. You need to estimate the effectiveness of caching for your particular task.

Looking at the available concurrency strategies for CMP beans, notice that there are at least two approaches to implementing long-term caching. The first approach is to use read-only beans where possible. Unfortunately, more often than not, data in the database is not completely static and/or is updated at unpredictable intervals. As I've shown earlier, there is a mechanism to explicitly invalidate the cache for any read-only CMP bean. This approach, while feasible, is far from ideal and is error-prone because developers must remember to call invalidate() after each update operation. Fortunately, WebLogic Server provides a convenient implementation of the read-mostly pattern, which we will discuss in detail shortly. An even more powerful approach is to take advantage of the cache-between-transactions mechanism available for beans deployed with the optimistic concurrency strategy.

Using the read-mostly pattern

WebLogic Server provides a mechanism to implement the read-mostly pattern by mapping a read-only CMP bean and a read-write CMP bean to the same data. Your application uses the read-only bean to read data and the read-write bean to modify it. The read-only bean loads data from the database at intervals specified by the read-timeout-seconds element in the deployment descriptor, as shown above. To ensure that the read-only bean always returns current data, it should be invalidated whenever the read-write bean changes the data. You can configure the server to invalidate the corresponding read-only bean automatically by naming that bean in the invalidation-target element of the entity-descriptor stanza in weblogic-ejb-jar.xml. This mechanism can only be used for CMP 2.0 entity beans. While the pattern certainly delivers a caching bonus, it has serious drawbacks. To name a couple: the number of entity beans in your application is effectively doubled, which impacts application startup time and memory consumption, and developers need to remember that read-only and read-write operations must use different beans, which can be confusing at times.
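
Here is a rough sketch of how the two beans might be wired together in weblogic-ejb-jar.xml; the bean names (PersonReadOnly, Person), the timeout value, and the Database strategy for the read-write bean are illustrative assumptions:

<weblogic-enterprise-bean>
  <ejb-name>PersonReadOnly</ejb-name>
  <entity-descriptor>
    <entity-cache>
      <!-- re-read cached data at most every 10 minutes -->
      <read-timeout-seconds>600</read-timeout-seconds>
      <concurrency-strategy>ReadOnly</concurrency-strategy>
    </entity-cache>
  </entity-descriptor>
</weblogic-enterprise-bean>

<weblogic-enterprise-bean>
  <ejb-name>Person</ejb-name>
  <entity-descriptor>
    <entity-cache>
      <concurrency-strategy>Database</concurrency-strategy>
    </entity-cache>
    <!-- whenever Person is updated, drop the cached PersonReadOnly state -->
    <invalidation-target>
      <ejb-name>PersonReadOnly</ejb-name>
    </invalidation-target>
  </entity-descriptor>
</weblogic-enterprise-bean>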

As a side note, in older versions of WebLogic Server, which didn't have built-in support for the read-mostly pattern via invalidation-target, it was still possible to implement it. If you recall, according to the EJB specification, if an EJB throws a RuntimeException (or any subclass of it), the container must destroy the bean instance. It's therefore possible to expose a destroyMe() method on the read-only version of an entity bean and call it from the ejbStore() method of the read-write bean. This is the famous seppuku pattern.
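
A rough sketch of the idea (the class names and the readOnlyHome and ctx fields are purely illustrative; in practice destroyMe() is usually given a transaction attribute such as RequiresNew so that the deliberate exception doesn't roll back the caller's transaction):

...
// On the read-only bean: throwing a RuntimeException forces the container
// to discard this cached instance, as required by the EJB specification.
public void destroyMe() {
  throw new RuntimeException("invalidating cached read-only instance");
}
...

...
// On the read-write bean: after a change is persisted, locate the read-only
// twin with the same primary key and "kill" it so the next read reloads
// fresh data.  readOnlyHome and ctx (the EntityContext) would typically be
// set up in setEntityContext().
public void ejbStore() {
  try {
    PersonReadOnlyLocal readOnly =
        readOnlyHome.findByPrimaryKey((Long)ctx.getPrimaryKey());
    readOnly.destroyMe();
  }
  catch (Exception ignored) {
    // nothing cached to invalidate, or the instance is already gone
  }
}
...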

Caching between transactions of read/write CMP beans

An alternative and more advanced solution for long-term caching is to configure beans with the cache-between-transactions element in weblogic-ejb-jar.xml set to true. In this case, WebLogic Server will call ejbLoad() to load data from the database only when the client first references the bean, or when the transaction is rolled back.

Although in theory you can use this setting with any concurrency strategy except database concurrency, in practice it only makes sense with optimistic concurrency. When applied to read-only concurrency, the setting is simply ignored because bean data is already cached; when applied to exclusive concurrency, it only works if the EJB has exclusive access to the underlying database, which is rarely the case. Moreover, when beans with exclusive concurrency are deployed in a cluster, WebLogic Server automatically disables caching between transactions, because any server in the cluster may update the bean data and there is no mechanism to synchronize or invalidate cached values across nodes in this case. This leaves us with only one viable concurrency strategy for long-term caching: optimistic.
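
As a sketch, the relevant weblogic-ejb-jar.xml fragment for such a bean might look roughly like this (the bean name is hypothetical, and the element placement follows the 8.1 descriptor layout):

<weblogic-enterprise-bean>
  <ejb-name>Person</ejb-name>
  <entity-descriptor>
    <entity-cache>
      <concurrency-strategy>Optimistic</concurrency-strategy>
    </entity-cache>
    <persistence>
      <!-- keep bean state in the EJB cache across transactions -->
      <cache-between-transactions>true</cache-between-transactions>
    </persistence>
  </entity-descriptor>
</weblogic-enterprise-bean>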

As we have seen above, for beans deployed with the optimistic concurrency strategy, WebLogic Server has a built-in mechanism to detect underlying database changes through verify-columns. While optimistic concurrency by itself gives you only marginal performance improvements over database concurrency, greater gains come from taking advantage of the cache-between-transactions functionality. Setting cache-between-transactions to true causes WebLogic Server to skip calls to the database when the bean instance is already available in the EJB cache. For certain types of applications, in which the same objects are accessed more than once from different transactions over a short period of time, this can lead to significant performance improvements (under certain circumstances, up to 30 to 40 percent). Naturally, since we're using optimistic concurrency, your application must be ready to deal with an OptimisticConcurrencyException when a concurrency violation is detected. When an OptimisticConcurrencyException (a subclass of RuntimeException) is thrown, WebLogic Server discards the EJB instance from the cache. Note that if you have delay-updates-until-end-of-tx set to true (the default), you won't get an optimistic exception until the transaction commits, which may be outside of your application code if you're using container-managed transactions.
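
The change detection itself is configured in the CMP mapping descriptor (weblogic-cmp-rdbms-jar.xml in WebLogic 8.1). A rough sketch assuming a dedicated numeric VERSION column follows; the data source, table, and column names are made up for illustration:

<weblogic-rdbms-bean>
  <ejb-name>Person</ejb-name>
  <data-source-name>myDataSource</data-source-name>
  <table-map>
    <table-name>PERSON</table-name>
    <field-map>
      <cmp-field>name</cmp-field>
      <dbms-column>NAME</dbms-column>
    </field-map>
    <!-- detect concurrent changes via a version column; other options
         are Timestamp, Read, and Modified -->
    <verify-columns>Version</verify-columns>
    <optimistic-column>VERSION</optimistic-column>
  </table-map>
</weblogic-rdbms-bean>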

In contrast to the read-mostly pattern, which provides no mechanism to notify other nodes in the cluster that data has changed on one of them, when a bean with optimistic concurrency is updated, a notification is broadcast to the other cluster members and their cached instances of that bean are discarded to prevent optimistic conflicts. Since WebLogic Server doesn't broadcast the changed data itself, only a bean identifier of some sort, this cluster-wide cache invalidation is quite cheap in terms of performance and network bandwidth. WebLogic Server does this invalidation work automatically; bean developers don't need to configure anything to make it happen. On the next request to the same bean, fresh data will be loaded from the database.

If the data in the database is updated by processes external to the server (or if you're using direct JDBC access from the server to modify the same tables that are mapped to CMPs with long-term caching), those processes must honor the contract on version columns. In other words, if the entity beans are configured to use a numeric version column, external processes should increment that value whenever they update a row; if a timestamp column is used, that value should be set to the current timestamp, and so on. Failure to do so results in WebLogic Server silently overwriting the data, because the optimistic detection mechanism won't trigger an exception if the version data hasn't changed. If it's impossible to modify the external processes to update the version column, database triggers can be used to achieve the same effect. If modifying the database schema is not permitted either, WebLogic Server can instead be configured to check all columns read during the transaction, or just the columns that were updated, by setting the verify-columns element to Read or Modified, respectively. Note that in this case there can be a performance penalty because the resulting update SQL statement is more complicated; I recommend running tests to see how this affects the performance of updates in your particular case.
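
For reference, that alternative is just a different value of the same descriptor element; for example:

<!-- compare the columns updated in this transaction instead of a
     dedicated version column -->
<verify-columns>Modified</verify-columns>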

Caching between transactions provides a better model for caching data than the read-mostly pattern discussed above. There is no need to deploy two versions of the same bean, so there is no extra impact on startup time or memory, no risk of mixing up the read and write paths, and cache invalidation across the cluster happens automatically when a bean changes. Until recently, though, one feature was missing in WebLogic Server with regard to beans cached between transactions: there was no documented mechanism to programmatically invalidate the bean cache. This isn't a big problem if the server has exclusive access to the database, but in many projects that is rarely the case, and the only way to flush the cache is to restart the server instance, which isn't always possible.

Let's take a look at what will happen if a bean is deployed with optimistic concurrency and cached between transactions when data in the database is updated by an external process. If an external process updates records currently cached by the container, and then the application updates the same rows via CMPs, there are two possible outcomes: If the external process doesn't honor the contract on updating verify-columns, then you'd have a lost update case (update from CMP overwrites whatever changes the external process has done to the record). If, on the other hand, the external process updates the version column or the bean is configured to use Read/Modified columns for optimistic control, you may have an OptimisticConcurrencyException.

In WebLogic Server, OptimisticConcurrencyException is a subclass of RuntimeException, so if the application doesn't catch it, the entity instance (as well as any session bean instances that called that entity in the same transaction) is discarded and the transaction is rolled back; on the next attempt, WebLogic Server reloads the data from the database, and the transaction completes successfully. Although lacking "beauty," this approach may work well for applications driven by message queues (MOM): on transaction rollback the message remains in the queue, and the next redelivery attempt (if any) succeeds. It's also worth mentioning again that unless your application uses bean-managed transactions, you won't have a chance to catch an OptimisticConcurrencyException unless you set delay-updates-until-end-of-tx to false (not the default). With the default setting, WebLogic Server doesn't execute the actual DML operations in the database until commit time, and the commit fails with a RollbackException (whose message mentions the underlying OptimisticConcurrencyException).
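
If you do want updates flushed eagerly so that the optimistic failure surfaces inside your own code, the flag lives in the persistence stanza of weblogic-ejb-jar.xml; a sketch:

<persistence>
  <!-- write changes to the database as each method completes
       instead of waiting for the commit -->
  <delay-updates-until-end-of-tx>false</delay-updates-until-end-of-tx>
  <cache-between-transactions>true</cache-between-transactions>
</persistence>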

The following Cactus-based test demonstrates the default behavior. Assume that a Person CMP bean is deployed with optimistic concurrency and cache-between-transactions set to true:

...
public class OptimisticLockingTest extends ServletTestCase {
  private UserTransaction tx;

  protected void setUp() throws Exception {
    super.setUp();
    Context context = new InitialContext();
    this.tx = (UserTransaction)context
        .lookup("javax.transaction.UserTransaction");
  }

  public void testCacheBetweenTransactions() throws Exception {
    PersonLocalHome localHome = (PersonLocalHome)Locator
        .getLocalHome(PersonLocalHome.JNDI_NAME);

    // create record via CMP in first transaction
    this.tx.begin();
    PersonLocal local = localHome.create();
    local.setName("John");
    Long pk = (Long)local.getPrimaryKey();
    this.tx.commit();

    // update some field(s) via direct JDBC call in another 
    // transaction.  Assume that updatePersonNameViaJdbc() 
    // method will update version column as well
    String newName = "Paul";
    this.tx.begin();
    updatePersonNameViaJdbc(pk, newName);
    this.tx.commit();

    // find CMP again and try to update name in yet
    // another transaction
    this.tx.begin();
    local = localHome.findByPrimaryKey(pk);

    // code doesn't see changes made directly 
    // because bean instance was cached between transactions
    assertFalse(newName.equals(local.getName()));

    try {
      // this should throw optimistic concurrency 
      // exception (on commit)
      local.setName("George");
      this.tx.commit();
      fail("expected OptimisticConcurrencyException not thrown");
    }
    catch (RollbackException expected) {
       // unfortunately there seems to be no better way to 
       // assert that underlying exception was 
       // an OptimisticConcurrencyException
       assertTrue("Unexpected exception type: "+expected, 
             expected.getMessage()
             .indexOf("Optimistic concurrency violation") > 0);
    }
  }
}
...
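
For completeness, the updatePersonNameViaJdbc() helper assumed above could look roughly like this (imports elided as in the test; the data source JNDI name, table, and column names are assumptions). Note how it honors the version-column contract described earlier by incrementing VERSION in the same statement:

...
private void updatePersonNameViaJdbc(Long pk, String newName)
    throws Exception {
  // the data source JNDI name is an assumption
  DataSource ds = (DataSource)new InitialContext().lookup("myDataSource");
  Connection con = ds.getConnection();
  try {
    PreparedStatement ps = con.prepareStatement(
        "UPDATE PERSON SET NAME = ?, VERSION = VERSION + 1 WHERE ID = ?");
    ps.setString(1, newName);
    ps.setLong(2, pk.longValue());
    // exactly one row should have been updated
    assertEquals(1, ps.executeUpdate());
    ps.close();
  }
  finally {
    con.close();
  }
}
...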

I hope you agree that it would be nice to have control over invalidation of the long-term cache here. As it turns out, there is a solution even for WebLogic Server versions 7.0 and 8.1. Similar to the CachingHome/CachingLocalHome interfaces that can be used to flush the cache for read-only beans, the EntityEJBHome and EntityEJBLocalHome interfaces expose pretty much the same family of invalidate() methods, allowing an application to invalidate either the full cache for a particular entity bean or any subset of it. Any CMP local home interface in WebLogic Server can be cast to EntityEJBLocalHome. Using the previous example, we can insert the following code right after the call to the updatePersonNameViaJdbc() method:

...
// flush cache
assertTrue("PersonLocalHome not instance of EntityEJBLocalHome: "+
      localHome, localHome instanceof EntityEJBLocalHome);
EntityEJBLocalHome entityEJBLocalHome = 
                                    (EntityEJBLocalHome)localHome;
entityEJBLocalHome.invalidate(pk);
...

Now on the next findByPrimaryKey() call, the bean instance will be reloaded from the database, and everything will work much better. In addition to the invalidate() method, there are also invalidateAll() and invalidate(Collection) methods.
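
For instance (the second primary key here is hypothetical):

...
// invalidate several cached instances at once
entityEJBLocalHome.invalidate(Arrays.asList(new Object[] { pk, anotherPk }));
// or flush the entire cache for this bean
entityEJBLocalHome.invalidateAll();
...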

Improvements in WebLogic Server 9.0

In WebLogic Server 9.0, explicit cache invalidation for cached beans with optimistic concurrency is documented and made consistent with read-only beans. (For example, the bean local home or remote home interface can be cast to CachingLocalHome or CachingHome, and the invalidate() methods called as shown above.) Moreover, the read-timeout-seconds parameter now also applies to beans deployed with optimistic concurrency. Developers also have more control over bean instance invalidation in the cluster. By default, when a bean with an optimistic concurrency strategy is deployed in a cluster and one member of the cluster updates the bean, WebLogic Server attempts to invalidate all copies of the bean on all nodes in the cluster. This invalidation helps you avoid optimistic concurrency failures, but it can be a drain on performance because it is a resource-intensive operation. It can be disabled by setting cluster-invalidation-disabled in weblogic-cmp-jar.xml to true, which prevents the EJB container from attempting to invalidate copies of the bean across the cluster.
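
For example, the explicit invalidation shown earlier could be rewritten in 9.0 against the documented weblogic.ejb.CachingLocalHome interface, reusing localHome and pk from the test above (a sketch):

...
// WebLogic Server 9.0: the documented invalidation API, shared with
// read-only beans
CachingLocalHome cachingHome = (CachingLocalHome)localHome;
// drop a single cached instance by primary key
cachingHome.invalidate(pk);
// or flush the whole cache for this bean
// cachingHome.invalidateAll();
...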
