How Much Longer?: Rollback Monitoring
Give users an accurate estimate of the duration of a rollback operation
Are we there yet? How much longer?
Sound familiar? These questions may come from the back seat on your way to the kids' favorite theme park, often incessantly and with increasing frequency. Wouldn't you want to tell them exactly how much longer it will takeor better yet, know the answer yourself?Similarly, when a long, running transaction has been rolled back, there are often several users breathing down your neck asking the same questions. The questions are justified, because the transaction holds the locks and normal processing often suffers as the rollback progresses. In Oracle 9i Database and below, you can issue the query
SELECT USED_UREC FROM V$TRANSACTION;which returns the number of undo records used by the current transaction, and if executed repeatedly, will show continuously reduced values because the rollback process will release the undo records as it progresses. You can then calculate the rate by taking snapshots for an interval and then extrapolate the result to estimate the finishing time. Although there is a column called START_TIME in the view V$TRANSACTION, the column shows only the starting time of the entire transaction (that is, before the rollback was issued). Therefore, extrapolation aside, there is no way for you to know when the rollback was actually issued. Extended Statistics for Transaction Rollback In Oracle Database 10g , this exercise is trivial. When a transaction rolls back, the event is recorded in the view V$SESSION_LONGOPS, which shows long running transactions. For rollback purpose, if the process takes more than six seconds, the record appears in the view. After the rollback is issued, you would probably conceal your monitor screen from prying eyes and issue the following query:
select time_remaining from v$session_longops where sid = < sid of the session doing the rollback>;Now that you realize how important this view V$SESSION_LONGOPS is, let's see what else it has to offer. This view was available pre-Oracle Database 10g, but the information on rollback transactions was not captured. To show all the columns in a readable manner, we will use the PRINT_TABLE function described by Tom Kyte at AskTom.com. This procedure simply displays the columns in a tabular manner rather than the usual line manner.
SQL> set serveroutput on size 999999 SQL> exec print_table('select * from v$session_longops where sid = 9') SID : 9 SERIAL# : 68 OPNAME : Transaction Rollback TARGET : TARGET_DESC : xid:0x000e.01c.00000067 SOFAR : 10234 TOTALWORK : 20554 UNITS : Blocks START_TIME : 07-dec-2003 21:20:07 LAST_UPDATE_TIME : 07-dec-2003 21:21:24 TIME_REMAINING : 77 ELAPSED_SECONDS : 77 CONTEXT : 0 MESSAGE : Transaction Rollback: xid:0x000e.01c.00000067 : 10234 out of 20554 Blocks done USERNAME : SYS SQL_ADDRESS : 00000003B719ED08 SQL_HASH_VALUE : 1430203031 SQL_ID : 306w9c5amyanr QCSID : 0Let's examine each of these columns carefully. There may be more than one long running operation in the sessionespecially because the view contains the history of all long running operations in previous sessions. The column OPNAME shows that this record is for "Transaction Rollback," which points us in the right direction. The column TIME_REMAINING shows the estimated remaining time in seconds, described earlier and the column ELAPSED_SECONDS shows the time consumed so far. So how does this table offer an estimate of the remaining time? Clues can be found in the columns TOTALWORK, which shows the total amount of "work" to do, and SOFAR, which shows how much has been done so far. The unit of work is shown in column UNITS. In this case, it's in blocks; therefore, a total of 10,234 blocks have been rolled back so far, out of 20,554. The operation so far has taken 77 seconds. Hence the remaining blocks will take: 77 * ( (20554-10234)/10234 ) ≈ 77 seconds But you don't have to take that route to get the number; it's shown clearly for you. Finally, the column LAST_UPDATE_TIME shows the time as of which the view contents are current, which will serve to reinforce your interpretation of the results. SQL Statement Another important new piece of information is the identifier of the SQL statement that is being rolled back. Earlier, the SQL_ADDRESS and SQL_HASH_VALUE were used to get the SQL statement that was being rolled back. The new column SQL_ID corresponds to the SQL_ID of the view V$SQL as shown below:
SELECT SQL_TEXT FROM V$SQL WHERE SQL_ID = < value of SQL_ID from V$SESSION_LONGOPS>;This query returns the statement that was rolled back, thereby providing an additional check along with the address and hash value of the SQL statement. Parallel Instance Recovery If the DML operation was a parallel operation, the column QCSID shows the SID of the parallel query server sessions. In the event of a parallel rollback, such as during instance recovery and subsequent recovery of a failed transaction, this information often comes in handy. For example, suppose that during a large update the instance shuts down abnormally. When the instance comes up, the failed transaction is rolled back. If the value of the initialization parameter for parallel recovery is enabled, the rollback occurs in parallel instead of serially, as it occurs in regular transaction rollback. The next task is to estimate the completion time of the rollback process. The view V$FAST_START_TRANSACTIONS shows the transaction(s) occurring to roll-back the failed ones. A similar view, V$FAST_START_SERVERS, shows the number of parallel query servers working on the rollback. These two views were available in previous versions, but the new column XID, which indicates transaction identifier, makes the joining easier. In Oracle9i Database and below, you would have had to join the views on three columns (USN - Undo Segment Number, SLT - the Slot Number within the Undo Segment, and SEQ - the sequence number). The parent sets were shown in PARENTUSN, PARENTSLT, and PARENTSEQ. In Oracle Database 10g, you only need to join it on the XID column and the parent XID is indicated by an intuitive name: PXID. The most useful piece of information comes from the column RCVSERVERS in V$FAST_START_TRANSACTIONS view. If parallel rollback is going on, the number of parallel query servers is indicated in this column. You could check it to see how many parallel query processes started:
select rcvservers from v$fast_start_transactions;If the output shows just 1, then the transaction is being rolled back serially by SMON process--obviously an inefficient way to do that. You can modify the initialization parameter RECOVERY_PARALLELISM to value other than 0 and 1 and restart the instance for a parallel rollback. You can then issue ALTER SYSTEM SET FAST_START_PARALLEL_ROLLBACK = HIGH to create parallel servers as much as 4 times the number of CPUs. If the output of the above query shows anything other than 1, then parallel rollback is occurring. You can query the same view (V$FAST_START_TRANSACTIONS) to get the parent and child transactions (parent transaction id - PXID, and child - XID). The XID can also be used to join this view with V$FAST_START_SERVERS to get additional details. Conclusion In summary, when a long-running transaction is rolling back in Oracle Database 10gbe it the parallel instance recovery sessions or a user issued rollback statementall you have to do is to look at the view V$SESSION_LONGOPS and estimate to a resolution of a second how much longer it will take. Now if only it could predict the arrival time at the theme park!