TECHNOLOGY: Total Recall
Managing HistoryBy Jonathan Gennick
Oracle Total Recall enables users to find, track, and use past and present information.
What did you know, and when? U.S. government regulations such as the Sarbanes-Oxley Act and the Health Insurance Portability and Accountability Act (HIPAA) or international agreements such as Basel II increasingly drive the need to track changes to data over long periods of time. All business and service organizations are increasingly held accountable for maintaining a secure history of data on which key decisionssometimes life-or-death decisionsare based. Oracle Total Recall, an option to Oracle Database 11g Enterprise Edition, makes creating and maintaining a historical record of change over time as simple as flipping on the light switch when you enter a room.
The Need for Reliable Data
Imagine for a moment that you're the medical director for an ambulance service. Part of your job is to review and sign off on drug test results for new hires. The drug test results for a new paramedic come back, and everything looks good. You clear him to work.
On his first day on the job, he runs the ambulance through a red light at 90 miles per hour and collides with two other vehicles. The heart attack patient in the ambulance now has several broken bones as a consequence. The other two drivers are hurt. The new paramedic is tested at the hospital for drugs, and the results are positive.
A variety of legal action including lawsuits is coming.
You check the initial drug test results of your new hire again. This time they are positive for several common street drugs. You know that the results for these tests were negative when you cleared the paramedic, but the current results don'tand can'tshow what you saw when you cleared the paramedic to work.
You review your malpractice coverage and start updating your résumé.
The Need for History
Let's rewrite the information technology ending of this story: After the accident, you check the initial drug test results of your new hire. They are positive for several common street drugs. You appear to have cleared a drug user to drive the ambulance, but you remember that all of the tests were negative when you cleared this paramedic to work. What went wrong?
You click the Audit Changes button on the bottom of your screen. You see a history indicating that the drug test results were changed after you made your report on the new hire. Further investigation shows that the drug lab had reported incorrect results and later corrected those results in one of its regular electronic feeds rather than notifying you by phone and registered letter, as your service contract with the lab requires.
Oracle Total Recall: Totally Easy
These fictional events are dramatic, but they illustrate the importance of having a reliable history of change to critical data. Of course, you could develop and implement a homegrown solution for maintaining historical records, but you could instead use Oracle Total Recall and focus on the business problem, not the technical implementation.
Using Oracle Total Recall, you can create a reliable history of change in a few easy steps:
1. Identify related tables to track.
Listing 1 shows the beginning of the code that will create and populate a historical record for a simplified medical test schema. To create and populate this schema, run the med_test_schema.sql script from the sample code for this article.
Code Listing 1: Creating a simplified test result schema
CONNECT / AS SYSDBA CREATE TABLESPACE med_archive_1 DATAFILE '/u01/app/oracle/oradata/orcl/med_archive_1_1.dbf' SIZE 4M SEGMENT SPACE MANAGEMENT AUTO; CREATE TABLESPACE med_archive_2 DATAFILE '/u01/app/oracle/oradata/orcl/med_archive_2_1.dbf' SIZE 4M SEGMENT SPACE MANAGEMENT AUTO; CREATE USER med_test IDENTIFIED BY secret DEFAULT TABLESPACE users QUOTA UNLIMITED ON users QUOTA UNLIMITED ON med_archive_1 QUOTA UNLIMITED ON med_archive_2; GRANT CONNECT, RESOURCE TO med_test; CONNECT med_test/secret CREATE TABLE test_name ( test_code VARCHAR2(4), test_name VARCHAR2(30), CONSTRAINT test_name_pk PRIMARY KEY (test_code) ); CREATE TABLE test_result ( patient_id NUMBER, test_time TIMESTAMP, test_code VARCHAR2(4), test_result VARCHAR2(10), CONSTRAINT test_result_pk PRIMARY KEY (patient_id, test_time, test_code), CONSTRAINT test_code_fk FOREIGN KEY (test_code) REFERENCES test_name );
Your next step in implementing Oracle Total Recall is to create a flashback data archive a logical storage container for historical data that you can spread over one or more tablespaces.
Connect to your instance as a user holding the FLASHBACK ARCHIVE ADMINISTER privilege (the SYSTEM user holds that privilege by default). Next, issue the following command:
CREATE FLASHBACK ARCHIVE medical_tests TABLESPACE med_archive_1 QUOTA 3M RETENTION 11 MONTH;
When you first create a flashback data archive, specify a retention time. You can specify only one tablespace in the CREATE command, but you can issue subsequent ALTER commands to spread an archive across as many tablespaces as you need. (Use only tablespaces configured for automatic segment-space management.) The same retention time applies across all tablespaces in an archive. For example, to spread your archive over a second tablespace, issue the following ALTER command:
ALTER FLASHBACK ARCHIVE medical_tests ADD TABLESPACE med_archive_2 QUOTA 3M;
You now have a flashback data archive named MEDICAL_TESTS. The archive will use up to 3MB from each of the two designated tablespaces (MED_ARCHIVE_1 and MED_ARCHIVE_2). Based on the value you provided for the RETENTION parameter, the archive will automatically purge historical data as it ages beyond 11 months.
Best practice is to create a separate flashback data archive for each application module you archive. Don't combine unrelated tables from, say, Payroll and Accounts Payable into one archive, just because their retention times happen to be the same today. Requirements might diverge in the future.
A last bit of housekeeping: grant your schema owner (MED_TEST) access to the archive:
GRANT FLASHBACK ARCHIVE ON medical_tests TO MED_TEST;
User MED_TEST now has permission to write historical data into the MEDICAL_TESTS flashback data archive.
Now that you have a container to hold the historical information, your next step is to begin recording changes to the data in the tables of interest. You can archive changes to one table or to a related group of tablesone simple command per table is all you need. For example, connect as the schema owner and issue these commands to begin tracking changes to data in the TEST_NAME and TEST_RESULT tables:
ALTER TABLE test_name FLASHBACK ARCHIVE medical_tests; ALTER TABLE test_result FLASHBACK ARCHIVE medical_tests;
From this point forward, all changes to rows in the two tables in the MED_TEST schema are tracked. That historical record of change over time is maintained in the MEDICAL_TESTS flashback data archive, which is spread across the MED_ARCHIVE_1 and MED_ARCHIVE_2 tablespaces.
If you're following along with this article's example, you can make the same changes that I make by executing the SQL statements in Listing 2. Note, however, that the specific dates and times in the record of changes you see in this article won't match what you create and see on your own system. Additionally, because the queries I execute in subsequent listings are date- and time-specific, they will not work as written on your system.
Code Listing 2: Changing the test results after the fact
UPDATE test_result SET test_result = 'Positive' WHERE patient_id = 101 AND test_code = 'coca' AND test_time = TIMESTAMP '2008-02-01 13:17:00'; UPDATE test_result SET test_result = 'Positive' WHERE patient_id = 101 AND test_code = 'ampa' AND test_time = TIMESTAMP '2008-02-01 13:17:00'; UPDATE test_result SET test_result = 'Positive' WHERE patient_id = 101 AND test_code = 'phen' AND test_time = TIMESTAMP '2008-02-01 13:17:00';
Oracle Database maintains the history in a constant, read-only state. Users, including the schema owner and the database administrator, can query the history but cannot alter it. In addition, Oracle Database prevents any modifications to the tables that would invalidate the history. For example, you cannot drop a column. However, you can add columns as needed.
The schema in Listing 1 highlights an interesting issue with time-based data. A physician might ask, What are the patient's test results on or before February 3, 2008?
A similar, but not identical, question is What are the patient's test results that I could have seen as of February 3, 2008?
The difference between the two questions is subtle but critical. Question 1 requests an answer about past information, based on the best information currently available . Question 2 requests an answer based on information that was available at some point in the past . The difference between questions 1 and 2 is the difference between "what was" and "what was known" as of a given date.
Listing 3 shows one solution to the first question. The WHERE clause within the subquery ensures that only results from before the beginning of the February 3 workday are considered. The WHERE clause in the outer query ensures that only the most recent of those results is reported. Several of the reported results are positive.
Code Listing 3: Querying for the results for a given time
SELECT patient_id, test_time, test_name, test_result FROM ( SELECT patient_id, test_time, test_name, test_result, MAX(test_time) OVER (PARTITION BY patient_id) test_max_time FROM test_result tr INNER JOIN test_name tn ON tr.test_code = tn.test_code WHERE test_time <= TIMESTAMP '2008-02-03 08:00:00' ) WHERE patient_id = 101 AND test_time = test_max_time; PATIENT_ID TEST_TIME TEST_NAME TEST_RESULT ------ ------------------- -------- ------- 101 01-FEB-08 01.17.00.000000 PM Cocaine Positive 101 01-FEB-08 01.17.00.000000 PM Marijuana Negative 101 01-FEB-08 01.17.00.000000 PM Amphetamines Positive 101 01-FEB-08 01.17.00.000000 PM Phencyclidine Positive
Listing 4 adds AS OF clauses to the query from Listing 3 to answer the second question. Those clauses direct the query to reach into the flashback data archive and give the result that the physician would have received at 8:00 a.m. on the day in question (February 3), when all results are negative. Question 2 is exactly the sort of question Oracle Total Recall was designed to answer. Oracle Total Recall is all about being able to prove what data you had and when you had it.
Code Listing 4: Querying for the results as of a given time
SELECT patient_id, test_time, test_name, test_result FROM ( SELECT patient_id, test_time, test_name, test_result, MAX(test_time) OVER (PARTITION BY patient_id) test_max_time FROM test_result AS OF TIMESTAMP (TIMESTAMP '2008-02-03 08:00:00') tr INNER JOIN test_name AS OF TIMESTAMP (TIMESTAMP '2008-02-03 08:00:00') tn ON tr.test_code = tn.test_code WHERE test_time <= TIMESTAMP '2008-02-03 08:00:00' ) WHERE patient_id = 101 AND test_time = test_max_time; PATIENT_ID TEST_TIME TEST_NAME TEST_RESULT ------- ----------------- -------- --------- 101 01-FEB-08 01.17.00.000000 PM Cocaine Negative 101 01-FEB-08 01.17.00.000000 PM Amphetamines Negative 101 01-FEB-08 01.17.00.000000 PM Phencyclidine Negative 101 01-FEB-08 01.17.00.000000 PM Marijuana Negative
When you go back in time with a flashback query, be certain that the time stamp or system change number (SCN) you specify is valid for the table and the archive in question. You'll receive an error message if you attempt to reach farther back into the past than can be resolved by either the archive or the current undo tablespace. You cannot query AS OF about a point in time before archiving was enabled on the table you are querying.
Under the Hood
Oracle Total Recall implements data collection and storage efficiently. A new flashback data archiver process mines undo data that the instance is already generating. There's no extra overhead affecting transactions against tables being archived. If you have the CPU capacity to run the archiver process and the input/output capacity to write the archival data to disk, your transaction throughput should be unaffected by flashback data archiving.
The archiver process collects the data, compresses it, deduplicates it, writes it into the flashback data archives, and exposes it in a standard SQL table. That table is protected from tamperingonly the archiver process can write to it.
Query the USER_FLASHBACK_ARCHIVE_TABLES view to find the underlying tables used to store archival data. For example, here's how to find the archive table in which TEST_RESULT archive data is stored:
SELECT archive_table_name FROM user_flashback_archive_tables WHERE table_name = 'TEST_RESULT'; ARCHIVE_TABLE_NAME ------------------- SYS_FBA_HIST_71101
You can create indexes on the table to improve the performance of frequently executed queries against archival data. For example, if you frequently query the historical record by patient and test time, you can create an index on the archive table such as this:
CREATE INDEX patient_and_time ON med_test.sys_fba_hist_71101 (patient_id, test_time);
You can also query the underlying archive table directly. Listing 5 queries the archive table associated with the TEST_RESULT table. The archive table records SCNs, not time stamps, so the query in Listing 5 puts the SCN_TO_TIMESTAMP function to good use by converting ENDSCN to an ending time stamp. The results show that three records for the specified PATIENT_ID and TEST_TIME showed negative results through 1:55:32 a.m. on February 5. Someone then changed those three records, causing their original values to be written into the archive. (The operation values are all null, because the original records were already present when the flashback data archive was created). Comparing the three archived records with their current counterparts in the TEST_RESULT table enables you to see what changes were made that morning.
Code Listing 5: Querying the underlying archive table
SELECT test_code, test_result, operation, SCN_TO_TIMESTAMP(endscn) end_timestamp FROM SYS_FBA_HIST_71101 WHERE patient_id = 101 AND test_time = TIMESTAMP '2008-02-01 13:17:00'; TEST_CODE TEST_RESULT OPERATION END_TIMESTAMP ------------ --------- ------ --------- coca Negative 05-FEB-08 01.55.32.000000000 AM ampa Negative 05-FEB-08 01.55.32.000000000 AM phen Negative 05-FEB-08 01.55.32.000000000 AM
A cautionary note: As a best practice, you're generally better off issuing flashback version queries rather than querying archive tables directly. However, those tables are there, and they are accessible, should you need them.
Advantages of Oracle Total Recall
Oracle Total Recall yields many benefits over traditional application- and trigger-based archival methods. First, archival data is tamper-proof. No database users, not even privileged users, can change the historical record. This level of security is increasingly demanded by government regulation.
No programming effort is required. There are no archive tables to design, no code to write, and no triggers to debug. The result has far less risk that anyone will inadvertently damage the archive.
Performance impact is minimal. There is no direct impact on transaction throughput, because the processes involved in archiving run asynchronously.
Oracle Total Recall is implemented natively in Oracle Database and is application-transparent. You can implement it underneath a running application without making changes to existing code.
Oracle Total Recall lends itself to many applications. It's worth considering anytime you need to access past data, either routinely or spontaneously. Oracle Total Recall can be used to collect historical data for routine data mining and business intelligence applications. It can be used as an extended form of Oracle Flashback for data entry error correction. Another creative use is as an archival solution in which older data is routinely deleted from operational tables, with that data still being accessible from a flashback data archive.
The bottom line: Oracle Total Recall provides a single, secure approach to creating a historical record of important business data that can be applied across many applications and toward many goals. It's history at the flip of a switch.
Jonathan Gennick (www.gennick.com) is an experienced Oracle professional and a member of the Oak Table Network. He wrote the best-selling SQL Pocket Guide and the Oracle SQL*Plus Pocket Reference , both from O'Reilly Media.