One common use case for Oracle Coherence clustering is managing user sessions (conversational state) in the cluster. This capability is provided by the Coherence*Web module, a built-in feature of Oracle Coherence. Coherence*Web provides linear scalability for HTTP session management in clusters of hundreds of production servers, which it can achieve because at its core it is built on Oracle Coherence's dynamic partitioning.
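Because Coherence*Web plugs in beneath the standard servlet API, the application typically needs no Coherence-specific code. A minimal sketch (the servlet class and attribute names here are hypothetical) shows that session access looks exactly like ordinary servlet code:

    import java.io.IOException;
    import javax.servlet.ServletException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;
    import javax.servlet.http.HttpSession;

    // Hypothetical servlet: with Coherence*Web installed, the HttpSession
    // below is stored and managed in the Coherence cluster rather than in
    // the local server's memory; the application code does not change.
    public class VisitCounterServlet extends HttpServlet {
        @Override
        protected void doGet(HttpServletRequest request, HttpServletResponse response)
                throws ServletException, IOException {
            HttpSession session = request.getSession(true);
            Integer visits = (Integer) session.getAttribute("visits");
            visits = (visits == null) ? 1 : visits + 1;
            // This write is what Coherence*Web persists across the cluster,
            // so the count survives failover to another server.
            session.setAttribute("visits", visits);
            response.getWriter().println("Visits this session: " + visits);
        }
    }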
Oracle Coherence*Web provides dramatic performance improvement, particularly for deployments that utilize large sessions and aggressive session expiry. In addition, Coherence*Web includes a new optimistic session locking model that significantly increases the throughput of web applications. Coherence*Web also includes new support for Coherence*Extend deployments, which isolates the application server tier from the session management tier and results in improved availability for the system as a whole.
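The details of Coherence*Web's locking model are internal to the module, but the general optimistic pattern it refers to can be sketched with plain Java concurrency primitives. The following illustrative example (all names hypothetical; this is not Coherence*Web's actual implementation) commits an update only if no concurrent change intervened, retrying on conflict instead of holding a lock:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;

    // Illustrative only: a versioned, lock-free update in the optimistic style.
    public final class OptimisticUpdateSketch {
        record Session(int version, String payload) {}

        private final ConcurrentMap<String, Session> store = new ConcurrentHashMap<>();

        public void updatePayload(String sessionId, String newPayload) {
            while (true) {
                Session current = store.get(sessionId);
                if (current == null) {
                    // First writer wins when the session does not exist yet.
                    if (store.putIfAbsent(sessionId, new Session(1, newPayload)) == null) {
                        return;
                    }
                    continue; // lost the creation race; re-read and retry
                }
                Session updated = new Session(current.version() + 1, newPayload);
                // Commit succeeds only if no concurrent update replaced the
                // entry; no lock is held while the new value is computed.
                if (store.replace(sessionId, current, updated)) {
                    return;
                }
                // Conflict detected: another writer won; retry with fresh state.
            }
        }
    }

Because no thread ever blocks waiting on a lock, throughput stays high even when many requests touch the same session concurrently; the cost is an occasional retry.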
Oracle Coherence includes native integration of the Oracle Coherence*Web Session Management Module for Oracle WebLogic Server, Oracle WebLogic Portal, and GlassFish Server, enabling distributed HTTP session management across multiple applications and heterogeneous environments while also allowing more data to be stored within sessions. The Oracle Coherence Session Provider for the Microsoft .NET Framework supports session management for Microsoft .NET Framework-based applications.
Session management highlights the scalability problem that typifies shared data sources: if an application could not share data across its servers, it would have to delegate that data management entirely to the shared store, which is typically the application's database. If the HTTP session were stored in the database, each HTTP request (in the absence of sticky load balancing) would require a read from the database, causing the required reads-per-second from the database to increase linearly with the size of the server cluster. Further, each HTTP request causes an update of its corresponding HTTP session, so regardless of sticky load balancing, ensuring that HTTP session data is not lost when a server fails means the required writes-per-second to the database will also increase linearly with the size of the server cluster. In both cases, the actual reads and writes per second that a database can sustain do not scale with the number of servers requesting them, and the database quickly becomes a bottleneck, forcing compromises in availability, reliability (e.g. asynchronous writes), and performance. Additionally, each read from the database carries an associated latency, and that latency increases dramatically as the database experiences increasing load.
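As a rough illustration of that growth, consider a toy model of the required database load; the 100-requests-per-second figure and the class name are assumptions chosen purely for illustration:

    // A toy model of the shared-database approach: with no sticky load
    // balancing every request reads the session from the database, and
    // every request writes it back, so both rates grow with cluster size.
    public final class DatabaseBottleneckModel {
        public static void main(String[] args) {
            int requestsPerServerPerSec = 100; // assumed request rate per server
            for (int servers : new int[] {2, 10, 100}) {
                long dbReadsPerSec  = (long) servers * requestsPerServerPerSec;
                long dbWritesPerSec = (long) servers * requestsPerServerPerSec;
                System.out.printf("%3d servers -> %,d reads/s and %,d writes/s against a single database%n",
                        servers, dbReadsPerSec, dbWritesPerSec);
            }
        }
    }

Whatever the assumed per-server rate, the load on the single database scales linearly with the number of servers, while the database's own capacity does not.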
Coherence*Web, on the other hand, has the same latency in a 2-server cluster as it has in a 200-server cluster: all HTTP session read operations that cannot be handled locally (locality typically being the result of sticky load balancing) are spread out evenly across the rest of the cluster, and all update operations (which must be handled remotely to ensure survival of the HTTP sessions) are likewise spread out evenly across the rest of the cluster. The result is linear scalability with constant latency, regardless of the size of the cluster.
Lastly, partitioning supports linear scalability of both data capacity and throughput. It achieves scalability of data capacity by evenly balancing the data across all servers, so four servers can naturally manage twice as much data as two servers. Scalability of throughput is likewise a direct result of load-balancing the data across all servers: as servers are added, each server applies its full processing power to a smaller and smaller percentage of the overall data set. For example, in a ten-server cluster each server has to manage 10% of the data operations, and, since Oracle Coherence uses a peer-to-peer architecture, 10% of those operations come from each server. With ten times that many servers (i.e. 100 servers), each server manages only 1% of the data operations, and only 1% of those operations come from each server; but there are ten times as many servers, so the cluster accomplishes ten times the total number of operations!

To make the arithmetic concrete: in the 10-server example, if each of the ten servers issues 100 operations per second, each sends 10 of those operations to every server in the cluster (counting the 10 it retains locally), and the result is that each server receives 100 operations (10 x 10) that it is responsible for processing. In the 100-server example, each server still issues 100 operations per second, but sends only one operation to each server, so each server again receives 100 operations (100 x 1) that it is responsible for processing.

This linear scalability is made possible by modern switched network architectures, whose backplanes scale linearly with the number of ports on the switch, providing each port with dedicated full-duplex (upstream and downstream) bandwidth. Since each server sends and receives only 100 operations per second in both the 10-server and 100-server examples, network bandwidth utilization per port remains roughly constant regardless of the number of servers in the cluster.
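The worked example above can be checked with a small model (the class and variable names are hypothetical; the 100-operations-per-second rate is the one assumed in the text):

    // Models the worked example: n peers each issue opsPerServer operations
    // per second against data partitioned evenly across all n servers, so
    // each peer sends opsPerServer/n operations to every server and receives
    // the same amount from every peer.
    public final class PartitionedLoadModel {
        public static void main(String[] args) {
            int opsPerServer = 100; // operations issued per server per second
            for (int n : new int[] {10, 100}) {
                int sentToEachServer  = opsPerServer / n;       // 10 or 1
                int receivedPerServer = n * sentToEachServer;   // 100 in both cases
                long clusterTotal     = (long) n * opsPerServer; // grows linearly
                System.out.printf("%3d servers: %d ops to each peer, %d received per server, %,d total per second%n",
                        n, sentToEachServer, receivedPerServer, clusterTotal);
            }
        }
    }

The per-server receive rate staying fixed at 100 is exactly the constant-latency, constant-bandwidth claim: total cluster throughput grows linearly while the load on any individual server, and on any individual switch port, does not.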