Beginning Performance Tuning: Active Session HistoryBy Arup Nanda
Identify the root cause of performance issues at a precise point in the past—even when the sessions have disconnected.
In my previous two articles on the topic of performance tuning (bit.ly/SEzR6t and bit.ly/U7U5IJ), you learned how to identify the cause of performance bottlenecks occurring in real time and view the summary of what happened in the past. Although these processes provide information to help resolve performance issues in many cases, there are times when you would like to know precisely what happened at a specific point in the past. In this article, you will learn how to identify the precise sequence of events in the past in a historical—not current—session.
To run the examples in this article, execute the setup.sql script. The script creates a user called ARUP and creates three tables: TEST1, TEST2, and TEST3. In the very rare event that a user called ARUP already exists, modify the script to create another user with a different name and replace all occurrences of ARUP with that name. The sample database created by setup.sql will take up less than 200 MB of space in your database.
After running the setup.sql script, open four SQL*Plus sessions, connected as ARUP. In three of these sessions, execute the test1.sql, test2.sql, and upd1.sql scripts, respectively. Here is an example of one session running upd1.sql on a UNIX-based system:
# sqlplus arup/arup SQL> @upd1.sql
The test1.sql and test2.sql scripts are designed to consume significant amounts of CPU, so the first two sessions will be slow due to CPU contention. These sessions will run for a very long time.
After all the sessions have waited for about 10 minutes, stop the execution in sessions 1 and 2 by pressing Control-C to exit each SQL*Plus session. In session 3, enter commit and press Enter. This will release the lock on the row, and you will see session 4 show “1 row updated.” Exit SQL*Plus sessions 3 and 4. Now all the sessions connected as ARUP are closed—and history.
In the setup, I showed you how to simulate performance issues in three of the four sessions. If these were typical business applications, the applications would have showed signs of slowdown, inviting the ire of the respective application owners and users and bringing unwanted attention to you, the DBA. Now that you have that attention, what should you do next?
In my previous two performance tuning articles, you learned how to find the clues to the sources of performance issues in sessions. Unfortunately, those techniques will be of no help in this case. The V$SESSION view shows the reason for a performance issue in a session, but the session that caused the issue in this article is no longer present. The other important view, V$SESSION_EVENT, shows all the wait events waited for by the session, but, as with the V$SESSION view, it will show the data only if the session is still active in the database. The view V$SESSTAT shows the resource usage for a session and can offer clues to which sessions have consumed what amount of a resource such as redo or memory. However, because all the sessions that experienced performance issues are historical, looking into these views will not help resolve the issues. You need to determine the specific events that occurred at a point in time in historical sessions. If you had enabled tracing, the trace files would have shown the events, but you did not expect these issues in advance and didn’t enable tracing, because tracing would have degraded performance. So how do you now identify the cause of the performance issues suffered by these old sessions?
Active Session History
Fortunately, looking at performance issues in old sessions is easy with an Oracle Database feature called Active Session History. Note that the use of Active Session History requires Oracle Diagnostics Pack, a licensed option of Oracle Database available since Oracle Database 10g Release 1.
Every second, Active Session History polls the database to identify the active sessions and dumps relevant information about each of them—such as the user ID, state, the machine it is connected from, and the SQL it is executing—into a special area in the system global area (SGA) of the database instance called the ASH buffer. So even though a session is no longer present in the database instance, the ASH buffer has captured its information. In addition, because Active Session History records activities every second, it can show a second-by-second snapshot of the activities in a session. In other words, Active Session History can show a movie of the activities of the session instead of a single picture. (Note that when the ASH buffer is filled, the data is written to disk and the snapshots are taken every 10 seconds rather than every second.)
You can examine the contents of the ASH buffer in a view named V$ACTIVE_SESSION_HISTORY. Here are a few of the important columns of the V$ACTIVE_SESSION_HISTORY view:
SAMPLE_ID. The unique identifier of the Active Session History record.
SAMPLE_TIME. When Active Session History captured this data on all active sessions.
USER_ID. The numerical user ID (not the username) of the database user who created this session.
SESSION_ID. The session ID (SID) of the session.
SESSION_STATE. The state the session was in when Active Session History took the sample. It shows WAITING if the session was waiting for something; otherwise, it shows ON CPU to indicate that the session was doing productive work.
EVENT. If the session was in a WAITING state (in the SESSION_STATE column), this column will show the wait event the session was waiting for.
TIME_WAITED. If the session was in a WAITING state, this column will show how long it had been waiting when Active Session History took the sample.
WAIT_TIME. If the session is doing productive work—not in a WAITING state—this column will show how long the session waited for the last wait event.
SQL_ID. The ID of the SQL statement the session was executing at the time the sample was taken.
SQL_CHILD_NUMBER. The child number of the cursor. If this is the only version of the cursor, the child number will be 0.
Knowing the meaning of the V$ACTIVE_SESSION_HISTORY columns, you can identify what it was that historical sessions were waiting for. To begin the identification, you need to pose two questions to the application owners or users executing the SQL statements that experienced slow performance:
Because you ran the setup script as the user ARUP, the answer to the first question is ARUP. Next, you need to find out the USER_ID of the ARUP user by issuing the following SQL:
select user_id from dba_users where username = 'ARUP'; USER_ID ————————————— 92
Let’s examine the first row of the output. It shows that the session identified by SESSION_ID 39 was waiting for the “enq: TX - row lock contention” event on 29-SEP-12 at 04.55.02.379 PM. Because the session was in a WAITING state, the value of the WAIT_TIME column is irrelevant, so it shows up as 0. Because the session was still in a WAITING state when Active Session History captured the state, the TIME_WAITED column shows 0. When the session finally got the lock, it could do what it had to do and stopped waiting. At that point, the total time the session had waited was updated in Active Session History, shown in the first boldface line in the Listing 1 output (sample time 29-SEP-12 05.16.52.078): 1,310,761,160 microseconds (shown in the TIME_WAITED column), or about 22 minutes. This is such an important property of Active Session History that I repeat: Seeing 0 in the WAIT_TIME column does not mean that the session didn’t wait at all. It simply means that the session was waiting for some event for more than one second because the previous WAIT_TIME and TIME_WAITED column values showed 0. You should look at the last occurrence of the wait event (the EVENT column value) for that session in Active Session History to determine what the total wait time really was.
When you explain to your user that the cause of the delay was an unavailable lock during the period 04.55.02.379 PM to 05.16.52.078 PM, that person might ask you what SQL statement the session was executing at that time. That’s extremely easy to get from the Listing 1 output: the session with SESSION_ID 39 was executing the SQL statement with SQL_ID fx60htyzmz6wv and child number (CH#) 0.
You can get that SQL statement text with this query:
select SQL_TEXT from v$sql where sql_id = 'fx60htyzmz6wv'; SQL_TEXT ———————————————————————————————————————————————— update test1 set status = 'D' where object_id = :b1
Code Listing 3: Getting specific row information
select owner||'.'||object_name||':'||nvl(subobject_name,'-') obj_name, dbms_rowid.rowid_create ( 1, o.data_object_id, row_wait_file#, row_wait_block#, row_wait_row# ) row_id from v$session s, dba_objects o where sid = &sid and o.data_object_id = s.row_wait_obj# OBJ_NAME ROW_ID ————————————— ——————————————————— ARUP.TEST1:- AAAdvSAAHAAABGPAAw
After uncovering the root cause of the slowness in session 39, you now turn your attention to session 44. Revisiting the output in Listing 1, you can see that session 44 switched between waiting (shown under SESSION_STATE as WAITING) and doing productive work (shown as ON CPU). Note the very first line in the output for the session with SESSION_ID 44 and SAMPLE_TIME 29-SEP-12 04.55.34.419 PM. The SESSION_STATE column shows WAITING, which means that the session was waiting at that time. The EVENT and TIME_WAITED columns show “resmgr:cpu quantum” and “109984,” respectively. This means that the session had already waited for 109,984 microseconds, or about 0.11 seconds, at that time for the “resmgr:cpu quantum” event. The next line in Listing 1, sampled about a second later, shows the SESSION_STATE column value as ON CPU, which means that the session was doing productive work at that time—not waiting. You need to know why the session was intermittently waiting for this wait event and therefore slowing down.
The “resmgr:cpu quantum” event is due to Oracle Database’s Database Resource Management feature. Database Resource Management acts as a resource governor: it limits CPU consumption of individual sessions when the total CPU demand from all sessions goes up to more than 100 percent, and it enables more-important sessions to get the CPU they need. Because the output shows the session waiting, you can conclude that the CPU consumption by the session with SESSION_ID 44 was high enough at that time for Database Resource Management to limit its CPU usage. Well, the application owner counters, this application is very important and the resource usage should not have been constrained.
In that case, you may suspect that the session was under a consumer group that has a more restrictive CPU allocation than expected. So your next stop is to find out what consumer group was active for the session at that time—not now. Fortunately, the process for finding this information is straightforward. Active Session History records the consumer group that was active for a session at the time of sampling and displays that information in the CONSUMER_GROUP_ID column in the V$ACTIVE_SESSION_HISTORY view. You can see that information by using the query shown in Listing 4.
Code Listing 4: Listing consumer groups
select sample_time, session_state, event, consumer_group_id from v$active_session_history where user_id = 92 and sample_time between to_date('29-SEP-12 04.55.02 PM','dd-MON-yy hh:mi:ss PM') and to_date('29-SEP-12 05.05.02 PM','dd-MON-yy hh:mi:ss PM') and session_id = 44 order by 1; SESSION SAMPLE_TIME _STATE EVENT CONSUMER_GROUP_ID ————————————————————————— ——————— —————————————————— ————————————————— 29-SEP-12 04.55.34.419 PM WAITING resmgr:cpu quantum 12166 29-SEP-12 04.55.35.419 PM ON CPU 12166 29-SEP-12 04.55.36.419 PM WAITING resmgr:cpu quantum 12166 29-SEP-12 04.55.37.419 PM ON CPU 12166 29-SEP-12 04.55.38.419 PM WAITING resmgr:cpu quantum 12166 29-SEP-12 04.55.39.419 PM WAITING resmgr:cpu quantum 12166 29-SEP-12 04.55.40.419 PM ON CPU 12166 … output truncated … 29-SEP-12 04.55.37.419 PM ON CPU 12162 29-SEP-12 04.55.38.419 PM ON CPU 12166 29-SEP-12 04.55.39.419 PM ON CPU 12162 29-SEP-12 04.55.40.419 PM ON CPU 12162
Because the session may have been under different consumer groups, it’s wise to select the consumer groups for all of the sampled data in Active Session History for that session, rather than just one sample. In this case, the session was under consumer group 12166 until it changed to 12162 on 29-SEP-12 at 04.55.37.419 PM. To find out the names of the consumer groups, use the following query:
select name from v$rsrc_consumer_group where id in (12166,12162); ID NAME —————— ———————————— 12166 OTHER_GROUPS 12162 APP_GROUP
The next obvious question is why session 44 consumed so much CPU that it had to be constrained by Database Resource Management. The answer lies in the SQL statement that session 44 was executing at that time (not now). The SQL_ID column in the output of Listing 1 was fngb4y81xr57x. You can get the text of that SQL statement with the following query:
SQL> select SQL_TEXT from v$sql where sql_id = 'fngb4y81xr57x'; SQL_TEXT ——————————————————————————————————— SELECT MAX(TEST1.OWNER) FROM TEST1, TEST2, TEST2, TEST2, TEST2, TEST2, TEST2, TEST2, TEST2, TEST2, TEST2, TEST2, TEST2, TEST2, TEST2, TEST2, TEST2, TEST2, TEST2, TEST2, TEST2, TEST2, TEST2, TEST2, TEST2
In the previous sections, you saw how to find issues that occurred at a specific point in the past in Active Session History. Hopefully, the content gave you an idea of how powerful Active Session History is and how it can be used in many circumstances. Here is another example of the power and usage of Active Session History: suppose a user complains that things seemed to have been slow from a specific client machine—prolaps01—between 4:55 p.m. and 5:05 p.m. on September 29. Because Active Session History also records the machine name, you can use the query shown in Listing 5 to display all the different events experienced by all sessions from the prolaps01 machine and how often each event occurred. You can see from the Listing 5 output that during that time, the sessions connected from the prolaps01 client machine experienced locking and Resource Manager–related waits many times. With this information, you can dig further inside the V$ACTIVE_SESSION_HISTORY view to identify specific sessions and what they were doing earlier to have experienced these waits.
Code Listing 5: Checking all events from a machine
select event, count(1) from v$active_session_history where machine = 'prolaps01' and sample_time between to_date('29-SEP-12 04.55.00 PM','dd-MON-yy hh:mi:ss PM') and to_date('29-SEP-12 05.05.00 PM','dd-MON-yy hh:mi:ss PM') group by event order by event; EVENT COUNT(1) ———————————————————————————— ———————— … output truncated … db file scattered read 93 db file parallel write 127 log file parallel write 134 db file sequential read 293 control file parallel write 612 control file sequential read 948 enq: TX - row lock contention 1309 resmgr:cpu quantum 1371
You can find more information on this procedure in “NEXT STEPS.”
Active Session History Archive
Active Session History collects information on active sessions from the database instance every second. Depending on the database activity, that will lead to a lot of data collected inside the ASH buffer, but because the ASH buffer is a memory-resident structure, it has only a finite amount of space. In addition, when the instance goes down, the instance’s memory vanishes with it. Therefore, Oracle Database archives the information from the ASH buffer to a database table to make it persistent. This archived table data is visible in a view called DBA_HIST_ACTIVE_SESS_HISTORY. If you don’t find the data in the V$ACTIVE_SESSION_HISTORY view, check for it in the DBA_HIST_ACTIVE_SESS_HISTORY view, as shown in Listing 6. The output shows that the session was experiencing row lock waits—you can get the lock and row information from the DBA_HIST_ACTIVE_SESS_HISTORY view by using the query shown in Listing 7.
Code Listing 7: Getting row lock information from the Active Session History archive
select sample_time, session_state, blocking_session, owner||'.'||object_name||':'||nvl(subobject_name,'-') obj_name, dbms_ROWID.ROWID_create ( 1, o.data_object_id, current_file#, current_block#, current_row# ) row_id from dba_hist_active_sess_history s, dba_objects o where user_id = 92 and sample_time between to_date('29-SEP-12 04.55.02 PM','dd-MON-yy hh:mi:ss PM') and to_date('29-SEP-12 05.05.02 PM','dd-MON-yy hh:mi:ss PM') and event = 'enq: TX - row lock contention' and o.data_object_id = s.current_obj# order by 1,2;
Active Session History is a very powerful facility inside Oracle Database that records the information on all active sessions in the database instance in a buffer every second and exposes that data to you in a view called V$ACTIVE_SESSION_HISTORY and subsequently to a persistent table visible in the DBA_HIST_ACTIVE_SESS_HISTORY view. Active Session History offers a second-by-second record of the activities inside the session—even when that session has disconnected or ceased that activity—enabling you to go back in time and identify what was ailing a specific session at a certain point in the past.
Arup Nanda (email@example.com) has been an Oracle DBA since 1993, handling all aspects of database administration, from performance tuning to security and disaster recovery. He was Oracle Magazine’s DBA of the Year in 2003 and received an Oracle Excellence Award for Technologist of the Year in 2012.
Send us your comments