Understanding and Monitoring TSNBLK and SEQBLK Stalls
Many longtime Rdb users who upgrade to Rdb7 are soon seeing new and unfamiliar stall messages. Working as a senior technical analyst in Rdb support, I've gotten many calls from DBAs who wonder whether they have a problem and who want to understand why they are seeing these new messages.
Usually, these stalls do not indicate a problem, but rather offer you information that can be useful. The purpose of this article is to explain what's going on, why you are seeing these new TSNBLK and SEQBLK stalls appearing and disappearing, and how you can monitor them.
1.0 The TSNBLK and SEQBLK Stalls
The SEQBLK data structure (RWROOT in versions prior to Oracle Rdb7) is used to allocate TSNs (transaction serial number) and CSNs (commit TSN's), and to serialize access to the TSNBLKs.
Before Oracle Rdb7, the TSN was a 32-bit integer, and so allowed up to 4,294,967,294 transactions over the life of the database. As hardware and software became faster and applications demanded more from the database server, it became apparent that some customer databases would soon exceed this limit. Therefore, in Oracle Rdb7, the TSN was enlarged to a 64-bit integer, allowing virtually an unlimited number of transactions over the life of an active database.
Due to this size increase, the number of TSN entries in a TSNBLK decreased from 50 to 28. This effectively means that Rdb7 databases have almost doubled the number of TSNBLKs.
The TSNBLK data structure is accessed by every read-only transaction that is started so that the proper runtime TSN and CSN can be determined. In a cluster database, this means lots of I/O to the rootfile.
Solution 1: To reduce the I/O and the resulting stall messages, you can reduce the "number of database users" to their actual runtime number. For example, a database configured for 2000 users requires 72 TSNBLKs (2000/28). In the worst case, every read-only transaction needs to do 72 I/Os to start. The read-only transaction takes out a SEQBLK lock while it traverses the TSNBLKs, determining its next TSN number. If you usually have only 500 users accessing the database at one time, reducing the maximum number of users to around 512 would save you 53 I/Os (72-19) per read-only transaction start. Read-write transactions get their next TSN numbers from a lock in memory. The read-write transaction will only incur a disk I/O if the value in memory has been exhausted. The lock value block of the TSN lock is incremented by 32 when this disk I/O occurs.
Sometimes, simply reducing the maximum number of users may not shorten SEQBLK stalls.
Keep in mind that a maximum user number of 30 with three cluster nodes will not allow each node to have ten users attached to a database. Each TSNBLK structure holds 28 user slots. If one user attaches from node A and one user attaches from node B, then nobody can attach from node C. The two TSNBLKs contain slots for 56 users, of which only 30 can be used. Rdb will not create an additional TSNBLK, because there are already enough user slots created to hold the maximum number of users.
Solution 2: Another workaround is to implement Rdb's Commit To Journal configuration option. This option requires that AIJ is enabled, Fast Commit is enabled and snapshots be either disabled or deferred. Commit To Journal eliminates a lot of root I/O traffic. A clump of TSN's gets assigned to each database user when CTJ is enabled.
An example of the SQL syntax to enable COMMIT TO JOURNAL:
SQL> ALTER DATABASE FILENAME test1
The value specified for 'n' equals the number of blocks the AIJ file can increase by before checkpointing. The value specified for 's' equals the number of seconds elapsed before checkpointing. Both 'n' and 's' are FAST COMMIT parameters. The value for 'm' equals the number of TSNs you want to preassign to each user when COMMIT TO JOURNAL is enabled. You can specify either or both checkpoint parameters. The value for TRANSACTION INTERVAL IS must be between 8 and 1024. The default value is 256.
Enabling COMMIT TO JOURNAL is an offline operation. Consult the Oracle Rdb Guide to Database Performance and Tuning for more information regarding FAST COMMIT and COMMIT TO JOURNAL operations.
Stall times should be of short duration for SEQBLK locks. If you see long stalls waiting for SEQBLK then this could be a sign that: