Overview of TopLink Caching and Locking
By Gordon Yorke and Darren Melanson
Database calls are one of the most expensive operations executed by J2EE applications,
and inefficient database access inevitably results in poor performance. Caching
is a very powerful mechanism that can optimize performance by reducing calls
to the database. However, there is a fine balance between the performance benefits
of caching and the consequences of having stale data.
TopLink provides several caching options that are configurable at the class level
to provide maximum flexibility. It also provides rich locking and refreshing
mechanisms to address data integrity while leveraging the performance benefits
of caching.
Configuring your cache to optimize performance and
manage stale state involves addressing concurrency protection using locking,
appropriate cache configuration, and selective refreshing.
Concurrency Protection Using Locking
Any time multiple clients or applications are reading and
writing to the same database, stale data is an issue. Caching can increase the
likelihood of stale data, but TopLink provides several locking options to manage
concurrency. Locking prevents an object from being updated based on stale data. In an
application where concurrent modification of data is possible, a locking strategy
is essential.
Locking - Pessimistically
Pessimistic locking is the most restrictive
form of locking, but it guarantees that no changes are made to the data during your
transaction. The database physically locks the row upon a select (SELECT ... FOR
UPDATE [NOWAIT]) and prevents others from altering that row.
This reassurance comes at a cost: pessimistic locks cannot be held across
transactions, and only one user at a time can access the underlying data.
Pessimistic locking should be used carefully, as it limits concurrent access
to the data and may cause deadlocks. In TopLink, the following API is used to
acquire a pessimistic lock within a transaction:
unitOfWork.refreshAndLockObject(Object objectToLock, short lockMode)
The lock mode is one of the following:
ObjectLevelReadQuery.NO_LOCK, ObjectLevelReadQuery.LOCK, or ObjectLevelReadQuery.LOCK_NOWAIT
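The difference between LOCK and LOCK_NOWAIT can be illustrated without a database. The following is a minimal in-memory sketch, not TopLink code: the hypothetical RowLockTable class hands out one ReentrantLock per row id, and one thread stands in for one transaction. LOCK blocks until the holder releases the row, much like SELECT ... FOR UPDATE, while LOCK_NOWAIT gives up immediately, much like the error the database raises for FOR UPDATE NOWAIT.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical lock modes loosely mirroring ObjectLevelReadQuery.LOCK and LOCK_NOWAIT.
enum LockMode { LOCK, LOCK_NOWAIT }

// In-memory stand-in for database row locks (not TopLink API).
// Assumption: one thread plays the role of one database transaction.
class RowLockTable {
    private final ConcurrentHashMap<Long, ReentrantLock> locks = new ConcurrentHashMap<>();

    // LOCK waits until the row is free, like SELECT ... FOR UPDATE.
    // LOCK_NOWAIT returns false at once if another transaction holds the row,
    // like SELECT ... FOR UPDATE NOWAIT raising an error.
    boolean acquire(long rowId, LockMode mode) {
        ReentrantLock lock = locks.computeIfAbsent(rowId, id -> new ReentrantLock());
        if (mode == LockMode.LOCK_NOWAIT) {
            return lock.tryLock();
        }
        lock.lock();
        return true;
    }

    // Called at commit or rollback: the lock never outlives the transaction.
    void release(long rowId) {
        ReentrantLock lock = locks.get(rowId);
        if (lock != null && lock.isHeldByCurrentThread()) {
            lock.unlock();
        }
    }
}
```

Because the lock is released only at commit or rollback, a transaction that holds it while waiting on a user blocks every other writer of that row, which is why pessimistic locks are best held briefly.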
Locking - Optimistically
Optimistic locking permits all users
to read and attempt to update the same data concurrently. It does not prevent
others from changing the same data, but it does guarantee that the database will
not be updated based on stale data.
During an update attempt, optimistic locking strategies detect
whether any changes have been made since the data was read; if so, the update fails
and an exception is thrown. The client application can then determine how
to address the conflict according to its business rules.
Optimistic locking depends on how the database is designed, and DBAs
use many different designs and strategies. TopLink provides complete flexibility
for application developers by supporting multiple optimistic locking options,
all of which are easily configured in the TopLink Workbench or JDeveloper.
| Locking Policy | When to Use | How It Works |
|---|---|---|
| Version | When a dedicated numeric field is available on the table | Versions are compared, and incremented on successful modification. |
| Timestamp | When a dedicated timestamp field is available on the table | Timestamps are compared, and set to the current time (of the JVM or database) on successful modification. |
| All Fields | When a version field is not an option and the application changes a wide variety of fields | Compares all fields to detect whether any have changed. |
| Changed Fields | When a version field is not an option and the application typically changes the same fields | Compares only the modified fields to detect whether any of them have changed. |
| Selected Fields | When a version field is not an option and a specific set of fields optimizes the locking comparison | Compares only the specified fields to detect whether any of them have changed. |
Version/Timestamp Locking
The version and timestamp locking policies are used when
a dedicated version field exists in the database table. When an object is updated,
the policy writes the new version (for example, 6 replacing 5, or a fresh timestamp)
to the database and compares the old version with the one in the table. A version
mismatch is an obvious locking conflict, and TopLink throws an
OptimisticLockException. This exception should be caught when committing a UnitOfWork.
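The compare-and-increment step can be sketched with an in-memory table. This is an illustration under stated assumptions, not TopLink's implementation: VersionedTable and OptimisticLockConflict are hypothetical names, and the exception only plays the role of TopLink's OptimisticLockException. Two clients read version 1; the first update succeeds and moves the row to version 2, so the second update, still carrying version 1, is rejected.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for TopLink's OptimisticLockException in this sketch.
class OptimisticLockConflict extends RuntimeException {
    OptimisticLockConflict(String message) {
        super(message);
    }
}

// A minimal in-memory "table" whose rows carry a numeric version column.
class VersionedTable {
    private final Map<Long, String> names = new HashMap<>();
    private final Map<Long, Integer> versions = new HashMap<>();

    void insert(long id, String name) {
        names.put(id, name);
        versions.put(id, 1);
    }

    int versionOf(long id) { return versions.get(id); }

    String nameOf(long id) { return names.get(id); }

    // Equivalent in spirit to:
    //   UPDATE ITEM SET NAME = ?, VERSION = expected + 1 WHERE ID = ? AND VERSION = expected
    // If no row matches, the data changed after it was read.
    synchronized int update(long id, String newName, int expectedVersion) {
        Integer current = versions.get(id);
        if (current == null || current != expectedVersion) {
            throw new OptimisticLockConflict("row " + id + ": expected version "
                    + expectedVersion + " but found " + current);
        }
        names.put(id, newName);
        versions.put(id, expectedVersion + 1);
        return expectedVersion + 1;
    }
}
```

In SQL terms the check lives in the WHERE clause, and an update count of zero is what signals the conflict; the losing client then re-reads the row and applies its business rules.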
These policies allow the version to be stored either in the cache
or mapped in the object. Storing the version in the object is recommended
if the object is to be serialized to another tier or presented in a disconnected
client, such as a browser. Keeping the version number with the object's data
allows a stateless application to function across multiple server instances
while obeying the version locking policy of the database.
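The benefit of mapping the version into the object can be sketched as follows. ItemView (a detached copy sent to the client) and ItemStore (standing in for the database shared by all server instances) are hypothetical names. Because the client sends back the version it originally read, whichever server instance handles the update can perform the check without remembering anything about that client.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical detached copy sent to a browser or another tier; the version travels with the data.
final class ItemView {
    final long id;
    final String description;
    final int version;

    ItemView(long id, String description, int version) {
        this.id = id;
        this.description = description;
        this.version = version;
    }
}

// Hypothetical store standing in for the database that every stateless server instance shares.
class ItemStore {
    private final ConcurrentHashMap<Long, ItemView> rows = new ConcurrentHashMap<>();

    void save(ItemView row) { rows.put(row.id, row); }

    ItemView read(long id) { return rows.get(id); }

    // Any server instance can validate the update because the client supplies the version it read.
    boolean updateFromClient(ItemView submitted) {
        ItemView current = rows.get(submitted.id);
        ItemView next = new ItemView(submitted.id, submitted.description, submitted.version + 1);
        // Atomic compare-and-swap: replace only if the stored row is still the one that was checked.
        return current != null
                && current.version == submitted.version
                && rows.replace(submitted.id, current, next);
    }
}
```

Had the version lived only in one server's cache, a request routed to a different instance would have nothing to compare against.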
Even if pessimistic locking is being
used, it is strongly recommended that an optimistic locking policy be used as
well. This covers situations where not all application use cases require
pessimistic locking. It also addresses the case where a disconnected client
operation extends beyond the life cycle of the pessimistic lock.
Through the UnitOfWork API, these locking policies can also verify
the version of associated objects:
unitOfWork.forceUpdateToVersionField(Object myObj, boolean updateVersion)
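The idea behind forcing a version update on a related object can be sketched without TopLink. In this hypothetical example an Order owns its lines, and a change to any line is checked against, and optionally increments, the Order's version. The bumpVersion flag is assumed to loosely mirror the boolean argument of forceUpdateToVersionField (increment versus verify only); Order and OrderRepository are illustrative classes, not TopLink API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical parent row with a version column and privately owned child lines.
class Order {
    int version = 1;
    final List<String> lines = new ArrayList<>();
}

// A change to a child is treated as a change to the parent, so two editors of the
// same order conflict even when they touch different lines.
class OrderRepository {
    private final Order stored = new Order();

    Order storedOrder() { return stored; }

    // expectedVersion: the parent version the caller read.
    // bumpVersion: true increments the parent version; false only verifies it.
    synchronized boolean changeLine(int index, String newLine, int expectedVersion, boolean bumpVersion) {
        if (stored.version != expectedVersion) {
            return false; // another transaction changed the order or one of its lines
        }
        stored.lines.set(index, newLine);
        if (bumpVersion) {
            stored.version++;
        }
        return true;
    }
}
```

Without the parent check, editor B's change to the second line would succeed silently even though editor A had just changed the same order.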
Non-Version Locking
In some databases, no dedicated version field exists, nor can one
be added. In these situations, the AllFieldsLockingPolicy, ChangedFieldsLockingPolicy,
or SelectedFieldsLockingPolicy is used.
The AllFieldsLockingPolicy sends all of the mapped fields to the database
for verification on every update or delete, potentially an enormous
amount of data.
The ChangedFieldsLockingPolicy sends only the fields for the
attributes that were changed. It does not protect against other clients
that may have changed other fields.
The SelectedFieldsLockingPolicy sends only the explicitly specified
fields for verification.
None of these policies supports verifying the version when
relationships change.
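The main difference between these three policies is which columns end up in the WHERE clause. The sketch below builds the UPDATE statement each would need for a hypothetical ITEM table where only PRICE changed and STATUS is the selected locking field. FieldLockPolicy and FieldLockingSql are illustrative names, not TopLink classes, and the SQL TopLink actually generates may differ in form.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical names for the three non-version strategies described above.
enum FieldLockPolicy { ALL_FIELDS, CHANGED_FIELDS, SELECTED_FIELDS }

final class FieldLockingSql {
    private FieldLockingSql() {}

    // Builds the UPDATE a non-version policy would need. Each "?" in the WHERE clause
    // would be bound to the value originally read, so the UPDATE matches zero rows
    // (a conflict) if another client has changed a checked column.
    static String updateSql(String table, String pk, List<String> columns,
                            Set<String> changed, Set<String> selected, FieldLockPolicy policy) {
        List<String> assignments = new ArrayList<>();
        List<String> checks = new ArrayList<>();
        checks.add(pk + " = ?");
        for (String column : columns) {
            if (column.equals(pk)) {
                continue;
            }
            if (changed.contains(column)) {
                assignments.add(column + " = ?");
            }
            boolean verify;
            if (policy == FieldLockPolicy.ALL_FIELDS) {
                verify = true;                      // every mapped field
            } else if (policy == FieldLockPolicy.CHANGED_FIELDS) {
                verify = changed.contains(column);  // only what this client modified
            } else {
                verify = selected.contains(column); // only the explicitly chosen fields
            }
            if (verify) {
                checks.add(column + " = ?");
            }
        }
        return "UPDATE " + table + " SET " + String.join(", ", assignments)
                + " WHERE " + String.join(" AND ", checks);
    }
}
```

For this table, ALL_FIELDS yields `UPDATE ITEM SET PRICE = ? WHERE ID = ? AND NAME = ? AND PRICE = ? AND STATUS = ?`, while CHANGED_FIELDS checks only PRICE and would not notice another client's change to NAME.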
Cache Configuration
TopLink's cache choices leverage Java's built-in garbage collection
and object reference types, with each cache type using a particular
Java reference type. Caching is configured at the class level in TopLink, which
gives developers granular control based on the type of data encapsulated by
each class rather than on the application as a whole. When choosing cache types
and sizes, developers need to consider how the data is used. Factors to consider are:
- Volatility
- Volume
- Amount of sharing between clients
- Application lifecycle
- Relationships between objects
A potential drawback of caching is
overloading the middle tier. It is important to choose the right type of cache
and, where applicable, to set the appropriate target size.
| Cache Type | Usage | Size and Growth |
|---|---|---|
| Soft-Cache-Weak, Hard-Cache-Weak (default) | Read-mostly; shared | These cache types hold up to the configured size in soft or hard references, keeping the most recently used objects. All other objects are held in weak references and are garbage collected once the application no longer references them. The hard reference option is provided for JVMs where soft references are collected too aggressively. |
| Weak | Write-mostly; minimal sharing | Holds all objects currently in use in the application, relying on JVM garbage collection to remove cached objects, held by weak references, once they are no longer in use. |
| Full | Read-only or read-mostly; shared | Contains all objects read. The size determines the initial size of the identity map and thus its hashing efficiency. |
| None | Read-only | Caches no objects and does not maintain object identity. Should only be used for unrelated, highly volatile objects. |
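The soft/hard-cache-weak behavior in the table can be sketched with standard Java reference types. HardCacheWeakMap is a hypothetical class, not TopLink's implementation: every cached object is reachable through a WeakReference keyed by primary key, and the most recently used `capacity` objects are additionally pinned by strong references in an LRU sub-cache. A soft variant would pin that sub-cache through SoftReferences instead.

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical "hard cache weak" identity map (not TopLink's class). Not thread-safe.
class HardCacheWeakMap<K, V> {
    private final Map<K, WeakReference<V>> weak = new HashMap<>();
    private final LinkedHashMap<K, V> recent;

    HardCacheWeakMap(int capacity) {
        // accessOrder = true orders entries least-recently-used first,
        // so removeEldestEntry evicts the LRU entry from the strong sub-cache.
        this.recent = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    void put(K key, V value) {
        weak.put(key, new WeakReference<>(value));
        recent.put(key, value);
    }

    // Returns the cached instance, or null if it was never cached or was
    // garbage collected after leaving the strong sub-cache.
    V get(K key) {
        WeakReference<V> ref = weak.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            weak.remove(key);
        } else {
            recent.put(key, value); // mark as most recently used
        }
        return value;
    }

    int pinnedCount() {
        return recent.size();
    }
}
```

An object evicted from the LRU sub-cache stays retrievable for as long as the application itself still references it, which is the "weak" half of the name.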
Using no identity map does not eliminate
caching issues; instead, it removes the ability to manage object identity and
resolve relationships. If modifiable objects change frequently outside of the
application's control, it is best to use the weak identity map, possibly combined
with object refreshing.
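What "no identity map" gives up can be shown in a few lines. In this hypothetical sketch, Reader builds objects through a stand-in loader function and either consults a plain HashMap keyed by primary key or skips it. Without the map, two reads of the same row return two distinct objects, so `==` identity no longer holds and relationships pointing at that row would resolve to different copies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical reader: with or without an identity map keyed by primary key.
class Reader<T> {
    private final Function<Long, T> loadFromDatabase; // stand-in for building an object from a row
    private final Map<Long, T> identityMap;           // null means "no identity map"

    Reader(Function<Long, T> loadFromDatabase, boolean useIdentityMap) {
        this.loadFromDatabase = loadFromDatabase;
        this.identityMap = useIdentityMap ? new HashMap<>() : null;
    }

    T read(long id) {
        if (identityMap == null) {
            return loadFromDatabase.apply(id); // a new object on every read
        }
        return identityMap.computeIfAbsent(id, loadFromDatabase); // one object per row
    }
}
```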
Example - Auction House
It is important to understand when
and why each of the cache types should be used. To illustrate this, let's consider
an application like an auction house, containing the following objects:
- Item - Thousands of items in the database viewed by many clients.
- ItemCategory - Fixed number of categories to facilitate finding items.
- User - Lives for the life of the client's web session.
- Bid - Hundreds of thousands regularly created, deleted, and updated.
- ShippingAgent - Fixed and static, e.g. UPS, FedEx, US Post Office.
Item - Hard/SoftCacheWeakIdentityMap
Items may be accessed by numerous
clients, so to increase the chance of a cache hit, SoftCacheWeakIdentityMaps are
recommended. This keeps a minimum number of objects in the cache based on the
configured size of the cache and the amount of free memory available.
Some VMs have overly aggressive garbage collectors,
and database hits may occur for objects that seemingly should be cached. In these
cases, use HardCacheWeakIdentityMap, which uses a stronger reference to ensure
objects stay in memory.
ItemCategory - FullIdentityMap
A FullIdentityMap caches every category that is read. A full cache
is used since the number of categories is fixed and they are primarily
read-only.
It is important to make sure that
types like this do not have relationships to objects cached in weaker types.
Objects stored in a FullIdentityMap cache that hold references to objects in
a weak cache effectively cause those weakly cached objects to be held indefinitely.
User - WeakIdentityMap
This cache type uses weak references, which will be garbage
collected (i.e. removed from the cache) once the application no longer
references the objects. It should be used for classes that are long-lived in
the application but not shared among numerous clients, such as User.
Bid - WeakIdentityMap
Weak caches are also appropriate for data that is regularly
updated, has a short life cycle, and is not accessed repeatedly by numerous clients.
Bids are unique to a client, and new ones are created regularly.
ShippingAgent - FullIdentityMap
There is a fixed number of ShippingAgents,
and they rarely change. A FullIdentityMap is suitable for this type of static
data. The ShippingAgent class represents data that does not change and is limited
in volume, so it can be completely loaded into the cache to avoid database access
entirely.
Refreshing
To minimize the chance of using stale
cached data, a refresh from the database can be forced. If a client already has
a reference to an object and wants to immediately retrieve the latest version
from the database, it can call:
session.refreshObject(myObj);
A TopLink query can be defined and
configured to refresh the data. For example:
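import oracle.toplink.expressions.ExpressionBuilder;
import oracle.toplink.queryframework.ReadObjectQuery;

ReadObjectQuery roq = new ReadObjectQuery(Flight.class);
ExpressionBuilder builder = roq.getExpressionBuilder();
roq.setSelectionCriteria(builder.get("flightNumber").equal("UA 755"));
// Forces TopLink to refresh the cached object with data from the database
roq.refreshIdentityMapResult();
Flight flight = (Flight) session.executeQuery(roq);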
The advantage of defining a query in this manner is that it can be reused
to refresh data at a regular interval.
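Re-executing a predefined query on a schedule needs nothing TopLink-specific. The sketch below uses the standard ScheduledExecutorService; PeriodicRefresher is a hypothetical helper, and the Runnable it receives stands in for something like `() -> session.executeQuery(roq)` with the refreshing query shown above.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical helper that re-executes one predefined refresh query at a fixed rate.
final class PeriodicRefresher implements AutoCloseable {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    ScheduledFuture<?> start(Runnable refreshQuery, long periodMillis) {
        // scheduleAtFixedRate suppresses all later runs once a task throws,
        // so failures are caught and reported instead of silently stopping the refresh.
        Runnable guarded = () -> {
            try {
                refreshQuery.run();
            } catch (RuntimeException e) {
                System.err.println("refresh failed: " + e);
            }
        };
        return scheduler.scheduleAtFixedRate(guarded, 0, periodMillis, TimeUnit.MILLISECONDS);
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Because the query object is built once, each tick only pays for its execution; the interval is a trade-off between staleness and database load.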
Leveraging Optimistic Locking
If you are using version or timestamp
optimistic locking, refreshing can be optimized. When descriptor.onlyRefreshIfNewerVersion()
is enabled, the refresh operation compares the version retrieved from the database with
the cached version and updates the cached object only if the database version is more recent.
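The comparison described for onlyRefreshIfNewerVersion() amounts to a version guard in front of the copy. The sketch below is a hypothetical illustration, not TopLink's code: CachedItem and VersionAwareRefresh are assumed names, and the database row is passed in as plain values.

```java
// Hypothetical cached entry carrying its version (not TopLink API).
final class CachedItem {
    String description;
    int version;

    CachedItem(String description, int version) {
        this.description = description;
        this.version = version;
    }
}

final class VersionAwareRefresh {
    private VersionAwareRefresh() {}

    // Copies the database row into the cached object only when the row's version is newer.
    // Returns true if the cached object was updated.
    static boolean refresh(CachedItem cached, String dbDescription, int dbVersion) {
        if (dbVersion <= cached.version) {
            return false; // cache is already current; skip rebuilding the object
        }
        cached.description = dbDescription;
        cached.version = dbVersion;
        return true;
    }
}
```

The saving comes from skipping the copy of every mapped attribute when the versions match, which is the common case for rarely changing data.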
Cascading Refresh
Refreshing can also be cascaded to
associated objects. There are various cascade options. This more advanced topic
is not covered here but is included in the TopLink documentation.
Summary
Configuring an object-relational cache
to work optimally with your application requires good knowledge of your application's
domain model and usage. As these become better understood, the cache configuration
should be adjusted. The proper configuration of locking, caching, and refreshing
queries discussed here is essential to building efficient J2EE applications.
Beyond these capabilities, TopLink
also offers cache coordination, in which changes made in one node can be synchronized,
replicated, or invalidated across multiple nodes of the same application forming
a cluster or grid. This capability allows developers to extend caching benefits
beyond a single node. This subject is the topic of another paper.
Author Bios
Gordon Yorke is a Principal Software Developer on the TopLink
team. Darren Melanson is a Technical Solutions Architect at Oracle
Corporation.