Technical Corner
 

Understanding and Monitoring TSNBLK and SEQBLK Stalls

Don Green
Senior Technical Analyst, Rdb Support
Oracle Corporation

Many longtime Rdb users who upgrade to Rdb7 are soon seeing new and unfamiliar stall messages. Working as a senior technical analyst in Rdb support, I've gotten many calls from DBAs who wonder whether they have a problem and who want to understand why they are seeing these new messages.

Usually, these stalls do not indicate a problem, but rather offer you information that can be useful. The purpose of this article is to explain what's going on, why you are seeing these new TSNBLK and SEQBLK stalls appearing and disappearing, and how you can monitor them.

1.0 The TSNBLK and SEQBLK Stalls

The SEQBLK data structure (RWROOT in versions prior to Oracle Rdb7) is used to allocate TSNs (transaction serial number) and CSNs (commit TSN's), and to serialize access to the TSNBLKs.

Before Oracle Rdb7, the TSN was a 32-bit integer, and so allowed up to 4,294,967,294 transactions over the life of the database. As hardware and software became faster and applications demanded more from the database server, it became apparent that some customer databases would soon exceed this limit. Therefore, in Oracle Rdb7, the TSN was enlarged to a 64-bit integer, allowing virtually an unlimited number of transactions over the life of an active database.

Due to this size increase, the number of TSN entries in a TSNBLK decreased from 50 to 28. This effectively means that Rdb7 databases have almost doubled the number of TSNBLKs.

The TSNBLK data structure is accessed by every read-only transaction that is started so that the proper runtime TSN and CSN can be determined. In a cluster database, this means lots of I/O to the rootfile.

Solution 1: To reduce the I/O and the resulting stall messages, you can reduce the "number of database users" to their actual runtime number. For example, a database configured for 2000 users requires 72 TSNBLKs (2000/28). In the worst case, every read-only transaction needs to do 72 I/Os to start. The read-only transaction takes out a SEQBLK lock while it traverses the TSNBLKs, determining its next TSN number. If you usually have only 500 users accessing the database at one time, reducing the maximum number of users to around 512 would save you 53 I/Os (72-19) per read-only transaction start. Read-write transactions get their next TSN numbers from a lock in memory. The read-write transaction will only incur a disk I/O if the value in memory has been exhausted. The lock value block of the TSN lock is incremented by 32 when this disk I/O occurs.

Sometimes, simply reducing the maximum number of users may not shorten SEQBLK stalls.

Example. Consider a database that allows access from four nodes in a cluster and has a max user number of 200. The first 75 users attach from node A. There are three TSNBLKs created for node A (28 + 28 + 19). The next 75 users attach from node B. Three more TSNBLKs are created for a total of 6. The next 35 users attach from node C, so two additional TSNBLKs are created (28 + 7). Two users attach from node D so another TSNBLK is created. There are now nine TSNBLKs for 187 users across four nodes. For every RO transaction that is started there are nine I/O's to the TSNBLKs. As the business day winds down and users start detaching from the database, the TSNBLKs do not get deleted until every user in a specific TSNBLK detaches. So if only one of the two users from node D detaches that TSNBLK still needs to be processed for every new RO transaction. Likewise, until everybody detaches from a TSNBLK, that TSNBLK must continue to exist and be processed. So, using the four node example, in the worst possible case you could have just nine users left attached, each defined in a separate TSNBLKs. That would still leave nine I/O's necessary for every new RO TSN.

Tip: For a better utilization of TSNBLK structures, try to maintain an even balance of users among cluster nodes.

Keep in mind that a maximum user number of 30 with three cluster nodes will not allow each node to have ten users attached to a database. Each TSNBLK structure holds 28 user slots. If one user attaches from node A and one user attaches from node B, then nobody can attach from node C. The two TSNBLKs contain slots for 56 users, of which only 30 can be used. Rdb will not create an additional TSNBLK, because there are already enough user slots created to hold the maximum number of users.

Solution 2: Another workaround is to implement Rdb's Commit To Journal configuration option. This option requires that AIJ is enabled, Fast Commit is enabled and snapshots be either disabled or deferred. Commit To Journal eliminates a lot of root I/O traffic. A clump of TSN's gets assigned to each database user when CTJ is enabled.

An example of the SQL syntax to enable COMMIT TO JOURNAL:

SQL> ALTER DATABASE FILENAME test1
                                    
cont> JOURNAL FAST COMMIT ENABLED
cont> (CHECKPOINT INTERVAL IS n BLCOKS,
cont> CHECKPOINT TIMED EVERY s SECONDS
cont> COMMIT TO JOURNAL OPTIMIZATION,
cont> TRANSACTION INTERVAL IS m);

The value specified for 'n' equals the number of blocks the AIJ file can increase by before checkpointing. The value specified for 's' equals the number of seconds elapsed before checkpointing. Both 'n' and 's' are FAST COMMIT parameters. The value for 'm' equals the number of TSNs you want to preassign to each user when COMMIT TO JOURNAL is enabled. You can specify either or both checkpoint parameters. The value for TRANSACTION INTERVAL IS must be between 8 and 1024. The default value is 256.

Enabling COMMIT TO JOURNAL is an offline operation. Consult the Oracle Rdb Guide to Database Performance and Tuning for more information regarding FAST COMMIT and COMMIT TO JOURNAL operations.

Stall times

Stall times should be of short duration for SEQBLK locks. If you see long stalls waiting for SEQBLK then this could be a sign that:

  • A lot of short RO transactions are occurring.
  • There is a disk I/O bottleneck on the drive where the root file resides.
  • Some application/process is blocking the processes queued for SEQBLK locks and needs to be investigated using the normal techniques like RMU/SHOW STATISTICS Active User Stall Screen or RMU/SHOW LOCK/MODE=BLOCKING.

Next Page

Top