Use the Consolidated Database Replay feature of Oracle Real Application Testing in Oracle Database 12c to gauge the impact of consolidating workloads.
The lead DBA at Acme Bank, John, is pondering project challenges and hosting several visitors—from the CIO to the manager of applications—in his office. Acme has embarked on an ambitious and expensive project to consolidate databases to save on processing power and administration expenses. However, after two of its applications were started on its new consolidated database, Acme encountered a unique issue no one had expected. The applications each have their own distinct workload pattern, and prior to consolidation, the usage of resources such as I/O and CPU stayed under the maximum available in their respective servers. However, when the applications were moved onto the same database—as Jill, the manager of the applications, demonstrates by pointing to the chart shown in Figure 1—the timing of peak usage of one application coincided with that of the other and pushed the combined resource usage of both applications above the maximum available.
Figure 1: Load patterns of two applications
Additionally, before the consolidation, each database had its own set of tables to hold common data such as employee information and currency exchange rates. For the consolidation, however, the DBAs created just one set of common tables. With these applications running on the same database, Jill observed that rows of the common tables were locked by one application and requested by the other at the same time. These lock waits—not seen when the applications were running in their respective databases—were significantly pronounced in the consolidated database and negatively affected the performance of both applications.
These bad experiences cast serious doubts on the feasibility of the consolidation exercise, forcing Acme management to put a stop to any further efforts until the DBAs proved that the database consolidation would not be detrimental. The DBAs have captured the application workloads via the Database Replay feature of Oracle Real Application Testing from the various databases and replayed them individually against the consolidated database to see the viability of their consolidation approach, but the replays haven’t uncovered the issues that were later caused by the concurrent execution of the applications. What everyone in John’s office wants to know is if there is a way to replay the application workloads from many databases concurrently on the consolidated database.
A related question is whether there is an easy and definitive way to identify the specific databases that can successfully be consolidated into groups. For example, applications A and B may cause serious issues when colocated, but A and C may not.
There is, assures John—with the Consolidated Database Replay feature of Oracle Real Application Testing in Oracle Database 12c.
Note: To create the example environment—including the CONTEMP and CLASSIC databases and the workloads to be captured for testing—and follow along with the steps in this article, execute the setup.sql script.
connect sys/*** as sysdba create user classic identified by classic / grant create session, create table, unlimited tablespace to classic / create user contemp identified by contemp / grant create session, create table, unlimited tablespace to contemp / conn contemp/contemp create table sales ( sale_id number, product_id number, sale_dt date, amount number ) / conn classic/classic create table sales ( sale_id number, product_id number, sale_dt date, amount number ) / -- Actual “application code” is here. Store as appl.sql declare l_max_sale_id sales.sale_id%type; l_sale_id sales.sale_id%type; l_product_id sales.product_id%type; l_sale_dt sales.sale_dt%type; l_amount sales.amount%type; l_rand number; l_ctr pls_integer := 0; begin while (true) loop l_ctr := l_ctr + 1; exit when l_ctr > 100000; -- find out the action to be taken l_rand := dbms_random.value (1,100); begin case -- we need the following distribution of activities -- about 2% new sales (inserts) -- about 1% cancellation (deletes) -- about 30% order changes (updates) -- the rest 67% are order lookup (selects) when l_rand < 2 then -- new sale l_product_id := trunc(dbms_random.value(100,200)); l_sale_id := l_ctr; l_sale_dt := sysdate; l_amount := dbms_random.value(1,10000); insert into sales values ( l_sale_id, l_product_id, l_sale_dt, l_amount ); commit; l_max_sale_id := l_sale_id; when l_rand >= 2 and l_rand < 3 then -- cancellation delete sales where sale_id = l_max_sale_id; commit; when l_rand >= 3 and l_rand < 33 then -- update l_amount := dbms_random.value(1,10000); update sales set amount = l_amount where sale_id = l_max_sale_id; else -- simple lookup select amount into l_amount from sales where sale_id = l_max_sale_id; end case; exception when no_data_found then null; when OTHERS then raise; end; end loop; end; /
Acme runs two sales applications—an older one called Classic and a newer one called Contemporary (or Contemp, for short)—on two databases that bear the same names as the applications. Acme management believes that, considering the similar nature of the applications, these two databases should be merged into one. Jill chooses a new database called TGTDB to hold both applications. The issues now are (1) whether these two databases can be consolidated successfully and (2) whether any changes to TGTDB are needed for this consolidation to be successful.Jill mentions that she has already captured the workload from the CONTEMP and CLASSIC databases. (She points everyone to the Oracle Magazine January/February 2008 article “Better Workload, Better Testing” for information on how to capture the workload.) John takes over and shows everyone the steps for replaying Jill’s captured workloads from both databases in the new TGTDB database running on the tgtsvr server.
First John copies the captured workload files from the source systems to the tgtsvr server, using the scp tool, which is for remote file copying. He uses the following command to transfer the files from the CLASSIC database server:
$ scp *.wrc tgtsvr:/home/oradb/ ratclassic
For the CONTEMP database, John uses the same command but replaces
ratclassic with ratcontemp.
He creates a directory to hold all the captured workload files:
$ mkdir /home/oradb/ratcons
He copies the individual capture directories to the newly created directory:
$ cp -r /home/oradb/ratclassic/home/oradb/ratcons $ cp -r /home/oradb/ratcontemp/home/oradb/ratcons
The rest of the activities occur on the TGTDB database as the SYS user. John connects to TGTDB as SYS and creates directory objects so that the directories are recognized by the database:
SQL> create or replace directory ratcons as '/home/oradb/ratcons'; SQL> create or replace directory ratclassic as '/home/oradb/ratcons/ratclassic'; SQL> create or replace directory ratcontemp as '/home/oradb/ratcons/ratcontemp';
He preprocesses the individual workloads captured in the respective directories by issuing the following SQL statement:
begin dbms_workload_replay.process_capture('RATCLASSIC'); dbms_workload_replay.process_capture('RATCONTEMP'); end;
The consolidated replay will read the workload captured in the RATCONS directory, but John first needs to identify the repository of consolidated workload files to the database. He uses the following SQL statement:
begin dbms_workload_replay.set_replay_directory ('RATCONS'); end;
John reminds everyone that there are two subdirectories with captured workloads: RATCLASSIC, from the CLASSIC database, and RATCONTEMP, from the CONTEMP database, in the RATCONS parent directory. He sets up a consolidated workload replay schedule and marks the subdirectories that contain the workload to be used in the consolidated replay, using the following SQL statement:
var ret number begin dbms_workload_replay.begin_replay_schedule('RAT_CONS_SCHED'); :ret := dbms_workload_replay.add_capture('RATCONTEMP'); :ret := dbms_workload_replay.add_capture('RATCLASSIC'); dbms_workload_replay.end_replay_schedule; end;
John initializes the consolidated replay schedule to replay the consolidated workload, using the following SQL statement:
begin dbms_workload_replay.initialize_ consolidated_replay ('SALES_RAT_CONS_REPLAY','RAT_CONS_SCHED'); end;
John points out the connect strings that the captured workloads used in their source systems and adds that what they will use in the system where they are being replayed may not be the same, because the databases are different. The connect strings need to be updated before replay starts. First he checks the connections used in the replay by issuing the following SQL statement:
select conn_id, schedule_cap_id, capture_conn from dba_workload_connection_map;
The output is shown in Listing 1.
Code Listing 1: Checking connections of captures
CONN_ID SCHEDULE_CAP_ID CAPTURE_CONN ——————— ——————————————— ————————————————————————————————————————————— 1 1 (DESCRIPTION=(CONNECT_DATA=(SERVER=DEDICATED) (SERVICE_NAME=CLASSIC)(CID=(PROGRAM=C:\oracle\ product\12.1.0\dbhome_2\bin\sqlplus.exe) (HOST=server1)(USER=classic)))(ADDRESS= (PROTOCOL=TCP)(HOST=192.168.1.100)(PORT=1522))) 2 2 (DESCRIPTION=(CONNECT_DATA=(SERVER=DEDICATED) (SERVICE_NAME=CONTEMP)(CID=(PROGRAM=C:\oracle\ product\12.1.0\dbhome_2\bin\sqlplus.exe) (HOST=server2)(USER=contemp)))(ADDRESS= (PROTOCOL=TCP)(HOST=192.168.1.101)(PORT=1521)))
Directing everyone’s attention to the output, John explains that the connections in the workload capture used different servers (shown as values for the HOST parameter)—server1 and server2 and service names classic and contemp—and will not work in the new TGTDB database during replay. The consolidated replay sessions will need to use a new connection called TGTDB in the new system. John uses the following SQL to remap the connections to TGTDB:
begin dbms_workload_replay.remap_ connection ( connection_id => 1, schedule_cap_id=>1, replay_connection =>'TGTDB' ); dbms_workload_replay.remap_ connection ( connection_id => 2, schedule_cap_id=>2, replay_connection =>'TGTDB' ); end;
With these preparations completed, John is ready to start the consolidated replay. First he prepares the database for the workload replay by issuing the following SQL statement:
begin dbms_workload_replay.prepare_ consolidated_replay; end;
This puts the database into a state in which it expects to replay a captured workload (instead of processing regular user transactions).
From a different UNIX shell prompt, he starts the workload replay clients by using the following commands:
$ wrc system/oracle replaydir=/home/oradb/ratcons & $ wrc system/oracle replaydir=/home/oradb/ratcons &
John starts as many workload replay client (wrc) processes as the number of workloads captured earlier. In this case, he starts two processes, because he is about to replay two captured workloads. The purpose of using the & symbol at the end of the commands, John explains, is to make the processes run in the background.
While the workload replay clients are running, he starts the replay process in the database by issuing the following SQL statement:
begin dbms_workload_replay.start_ consolidated_replay; end;
That’s it, he declares. The new TGTDB database is now replaying both captured workloads—CLASSIC and CONTEMP—simultaneously. When the workloads are running, everyone turns to focus on monitoring the performance of the TGTDB database in Oracle Enterprise Manager Cloud Control.
The best way to assess the effectiveness of the target database for handling the consolidated workload, John explains, is to examine the performance of the database while the consolidated workload is replaying. Jill arranges for the DBA team to watch the target system and returns to John’s office to continue the replay activity. When the replay clients finish processing the captured workloads, the wrc processes he kicked off earlier exit gracefully. With the consolidated workload replay complete, John wants to get a comprehensive report of the replay process, and the first step to getting that report is to identify it. He gets the unique identifier of the replay activity by issuing the following SQL statement:
select max(id) from dba_workload_replays; MAX(ID) ————————— 11
He uses the ID from the result (11) in the following SQL to output the workload report:
set long 999999 spool rat_cons_report.html select dbms_workload_replay.report(11, 'HTML') from dual; spool off
John stores the output in an HTML file by using the spool command.
John opens the rat_cons_report.html HTML file in a web browser and points out a small section—shown in Figure 2—that displays some critical statistics for the captured and replayed workloads. The most important statistic is “DB Time,” which is—as the name suggests—the database time consumed by the sessions captured in the workload. Referring to the output, John points out that the source workloads consumed a combined total of 43.180 seconds in their respective databases. However, when they were both replayed on the TGTDB database simultaneously, they consumed 94.440 seconds—more than twice the original length of time.
Figure 2: Consolidated database replay report
What could be the reason for the increased consumption shown in “DB Time,” Jill wonders. Could it be that the captured workload did some extra work? To answer that, John points to the next statistic in the HTML file—“Average Active Sessions.” The 0.35 value of the captured workload is almost the same as the 0.34 value in the replay. The variance of 0.01 between capture and replay can be ignored, John explains. The third statistic (“User calls”), he continues, shows the number of SQL statements issued and is the same for capture and replay.
These statistics indicate that the workload is essentially the same during both capture and replay but that the “DB Time” consumption more than doubled during the consolidated replay. From the report, John draws the initial conclusion that the TGTDB database—as tested—will not be able to handle the consolidated workload. (Later the DBA team watching the performance of the TGTDB database while the consolidated workload was being replayed reports that the performance was terrible during the replay—buttressing John’s original assessment.)
The consolidated replay test is a landmark achievement for the Acme DBAs, especially Jill. She realizes that had she consolidated the CLASSIC and CONTEMP systems into the TGTDB database, the new system would have been sluggish and the effort would have been deemed a failure. The consolidated replay test prevented a negative outcome. Regardless of the test outcome, however, the CIO reminds everyone that these two databases must be consolidated and a way to avoid sluggish performance must be found.
To alleviate the performance problem, Jill sees two options: (1) increase the CPU capacity of the server so that the database will get enough CPU cycles to meet the consolidated database demand or (2) improve the I/O performance so that the database will not have to do as much work. The CIO warns that before she adds any system upgrades to the project tasks, she should be 100 percent confident that the system upgrade will resolve the problem and enable the two databases to be consolidated. Jill has a gut feeling that the storage upgrade alone will solve the problem, but a gut feeling is not a definitive test result, so she is reluctant to make that call. She looks to John for some reassurance.
There is no need to guess, John helpfully responds. The effect of a storage enhancement can be easily predicted. The storage team changes the underlying storage of the TGTDB database to a new system with much better response time, and John repeats the consolidated replay by executing steps 8 through 13 again. Now the same workload has been replayed twice—once each on the old and the new storage systems. John executes the following SQL statement to get the unique IDs of these two replays:
select id, status, to_char(start_time,'mm/dd/yy hh24:mi:ss') start_time from dba_workload_replays; ID STATUS START_TIME —— ————————— ————————————————— 11 COMPLETED 11/26/13 16:36:40 21 COMPLETED 11/28/13 04:54:13
The replays with IDs 11 and 21 are from before and after the storage changes, respectively. To see the impact of the storage changes, John creates a compare period report to compare the two replays:
var ret clob begin dbms_workload_replay.compare_period_report ( replay_id1 => 11, replay_id2 => 21, format => dbms_workload_capture.type_html, result => :ret); end;
John puts the results from the ret variable into an HTML file by executing the following:
set long 999999 set lines 3200 set pages 0 set trimspool on spool comp_period_rep.html print ret spool off
DOWNLOAD Oracle Database 12c
John opens the comp_period_rep.html HTML file in a web browser and points Jill to the “DB Time Distribution” section (shown in Figure 3). “Total DB Time” has been reduced from 94.440 to 30.010—a threefold improvement. The other parts of the report (not shown) confirm the improvements seen in the second replay, after the storage changes were made. This proves beyond any reasonable doubt that the storage enhancements will indeed lead to better, acceptable results.
Figure 3: The “DB Time Distribution” section of the compare period report
The CIO is happy to hear that. From his perspective, consolidating two databases shrinks the server footprint (which, in turn, decreases power, cooling, and insurance requirements) and reduces demands on sys admins and DBAs. Even taking the storage cost into account, this is a less expensive option. But the biggest benefit, the CIO proclaims, comes from stepping into the consolidation project with certainty and reduced risk, thanks to the consolidated replay.
By the way, that is just one way to solve the problem, John continues. Not all databases consume the same amount of resources. A less expensive alternative may be to consolidate a demanding database with a less demanding one. For example, the initial consolidated workload replay demonstrated that the CONTEMP and CLASSIC databases should not coexist in a database such as TGTDB running on its original storage system. But it may be possible to consolidate CLASSIC and a less demanding database into TGTDB without changing the storage system. To determine which combinations will work, John explains, all Jill needs to do is repeat the earlier steps with different source database workloads and analyze the results in another compare period report. A source workload needs to be captured only once; it can be replayed several times. And, John reminds everyone, this feature can replay any captured workload; there is no special capture for consolidated replays. Jill could not be happier.
In the course of database consolidation, one of the big challenges for Jill and the management at Acme was to predict the impact of workloads from several databases running simultaneously on one database. Using the Consolidated Database Replay feature of Oracle Real Application Testing, John showed how Jill can accurately measure that impact. The consolidated replay testing techniques John demonstrated can be used for all types of consolidation scenarios: relocating various databases into a single database as different schemas, creating various databases as pluggable databases in a multitenant container database under Oracle Database 12c’s multitenant architecture, creating separate databases in different virtual machines, or any combination of these techniques. The goal in any case is the same: accurate prediction of the impact of consolidation. And the Consolidated Database Replay feature of Oracle Real Application Testing enables exactly that. (Consolidated replay is also available in the Oracle Enterprise Manager Cloud Control 12c interface.)
Arup Nanda (firstname.lastname@example.org) has been an Oracle DBA since 1993, handling all aspects of database administration, from performance tuning to security and disaster recovery. He was Oracle Magazine’s DBA of the Year in 2003 and received an Oracle Excellence Award for Technologist of the Year in 2012.