As Published In
Oracle Magazine
July/August 2008

TECHNOLOGY: Total Recall


Managing History

By Jonathan Gennick

Oracle Total Recall enables users to find, track, and use past and present information.

What did you know, and when? U.S. government regulations such as the Sarbanes-Oxley Act and the Health Insurance Portability and Accountability Act (HIPAA) or international agreements such as Basel II increasingly drive the need to track changes to data over long periods of time. All business and service organizations are increasingly held accountable for maintaining a secure history of data on which key decisions—sometimes life-or-death decisions—are based. Oracle Total Recall, an option to Oracle Database 11g Enterprise Edition, makes creating and maintaining a historical record of change over time as simple as flipping on the light switch when you enter a room.

The Need for Reliable Data

Imagine for a moment that you're the medical director for an ambulance service. Part of your job is to review and sign off on drug test results for new hires. The drug test results for a new paramedic come back, and everything looks good. You clear him to work.

On his first day on the job, he runs the ambulance through a red light at 90 miles per hour and collides with two other vehicles. The heart attack patient in the ambulance now has several broken bones as a consequence. The other two drivers are hurt. The new paramedic is tested at the hospital for drugs, and the results are positive.

A variety of legal action —including lawsuits —is coming.

You check the initial drug test results of your new hire again. This time they are positive for several common street drugs. You know that the results for these tests were negative when you cleared the paramedic, but the current results don't—and can't—show what you saw when you cleared the paramedic to work.

You review your malpractice coverage and start updating your résumé.

The Need for History

Let's rewrite the information technology ending of this story: After the accident, you check the initial drug test results of your new hire. They are positive for several common street drugs. You appear to have cleared a drug user to drive the ambulance, but you remember that all of the tests were negative when you cleared this paramedic to work. What went wrong?

You click the Audit Changes button on the bottom of your screen. You see a history indicating that the drug test results were changed after you made your report on the new hire. Further investigation shows that the drug lab had reported incorrect results and later corrected those results in one of its regular electronic feeds rather than notifying you by phone and registered letter, as your service contract with the lab requires.

Oracle Total Recall: Totally Easy

These fictional events are dramatic, but they illustrate the importance of having a reliable history of change to critical data. Of course, you could develop and implement a homegrown solution for maintaining historical records, but you could instead use Oracle Total Recall and focus on the business problem, not the technical implementation.

Using Oracle Total Recall, you can create a reliable history of change in a few easy steps:

1. Identify related tables to track.
2. Identify and size tablespaces to hold historical data.
3. Create an archive with a defined retention policy.
4. Enable change history tracking for the tables.

Listing 1 shows the beginning of the code that will create and populate a historical record for a simplified medical test schema. To create and populate this schema, run the med_test_schema.sql script from the sample code for this article.

Code Listing 1: Creating a simplified test result schema

 

CONNECT / AS SYSDBA

CREATE TABLESPACE med_archive_1
  DATAFILE '/u01/app/oracle/oradata/orcl/med_archive_1_1.dbf'
  SIZE 4M SEGMENT SPACE MANAGEMENT AUTO;

CREATE TABLESPACE med_archive_2
  DATAFILE '/u01/app/oracle/oradata/orcl/med_archive_2_1.dbf'
  SIZE 4M SEGMENT SPACE MANAGEMENT AUTO;

CREATE USER med_test IDENTIFIED BY secret
  DEFAULT TABLESPACE users
  QUOTA UNLIMITED ON users
  QUOTA UNLIMITED ON med_archive_1
  QUOTA UNLIMITED ON med_archive_2;

GRANT CONNECT, RESOURCE TO med_test;

CONNECT med_test/secret

CREATE TABLE test_name (
  test_code VARCHAR2(4),
  test_name VARCHAR2(30),
  CONSTRAINT test_name_pk
  PRIMARY KEY (test_code)
);

CREATE TABLE test_result (
  patient_id NUMBER,
  test_time TIMESTAMP,
  test_code VARCHAR2(4),
  test_result VARCHAR2(10),

CONSTRAINT test_result_pk
  PRIMARY KEY (patient_id, test_time, test_code),
CONSTRAINT test_code_fk
  FOREIGN KEY (test_code)
  REFERENCES test_name
);


Your next step in implementing Oracle Total Recall is to create a flashback data archive —a logical storage container for historical data that you can spread over one or more tablespaces.

Connect to your instance as a user holding the FLASHBACK ARCHIVE ADMINISTER privilege (the SYSTEM user holds that privilege by default). Next, issue the following command:

 

CREATE FLASHBACK ARCHIVE medical_tests
TABLESPACE med_archive_1 QUOTA 3M
RETENTION 11 MONTH;


When you first create a flashback data archive, specify a retention time. You can specify only one tablespace in the CREATE command, but you can issue subsequent ALTER commands to spread an archive across as many tablespaces as you need. (Use only tablespaces configured for automatic segment-space management.) The same retention time applies across all tablespaces in an archive. For example, to spread your archive over a second tablespace, issue the following ALTER command:

 

ALTER FLASHBACK ARCHIVE medical_tests
ADD TABLESPACE med_archive_2
QUOTA 3M;


You now have a flashback data archive named MEDICAL_TESTS. The archive will use up to 3MB from each of the two designated tablespaces (MED_ARCHIVE_1 and MED_ARCHIVE_2). Based on the value you provided for the RETENTION parameter, the archive will automatically purge historical data as it ages beyond 11 months.

Best practice is to create a separate flashback data archive for each application module you archive. Don't combine unrelated tables from, say, Payroll and Accounts Payable into one archive, just because their retention times happen to be the same today. Requirements might diverge in the future.

A last bit of housekeeping: grant your schema owner (MED_TEST) access to the archive:

 

GRANT FLASHBACK ARCHIVE
ON medical_tests
TO MED_TEST;


User MED_TEST now has permission to write historical data into the MEDICAL_TESTS flashback data archive.

Tracking Changes

Now that you have a container to hold the historical information, your next step is to begin recording changes to the data in the tables of interest. You can archive changes to one table or to a related group of tables—one simple command per table is all you need. For example, connect as the schema owner and issue these commands to begin tracking changes to data in the TEST_NAME and TEST_RESULT tables:

 

ALTER TABLE test_name
   FLASHBACK ARCHIVE medical_tests;

ALTER TABLE test_result
   FLASHBACK ARCHIVE medical_tests;


From this point forward, all changes to rows in the two tables in the MED_TEST schema are tracked. That historical record of change over time is maintained in the MEDICAL_TESTS flashback data archive, which is spread across the MED_ARCHIVE_1 and MED_ARCHIVE_2 tablespaces.

If you're following along with this article's example, you can make the same changes that I make by executing the SQL statements in Listing 2. Note, however, that the specific dates and times in the record of changes you see in this article won't match what you create and see on your own system. Additionally, because the queries I execute in subsequent listings are date- and time-specific, they will not work as written on your system.

Code Listing 2: Changing the test results after the fact

 

UPDATE test_result 
   SET test_result = 'Positive'
WHERE patient_id = 101 AND test_code = 'coca'
AND test_time = TIMESTAMP '2008-02-01 13:17:00';

UPDATE test_result 
   SET test_result = 'Positive'
WHERE patient_id = 101 AND test_code = 'ampa'
AND test_time = TIMESTAMP '2008-02-01 13:17:00';

UPDATE test_result 
   SET test_result = 'Positive'
WHERE patient_id = 101 AND test_code = 'phen'
AND test_time = TIMESTAMP '2008-02-01 13:17:00';


Oracle Database maintains the history in a constant, read-only state. Users, including the schema owner and the database administrator, can query the history but cannot alter it. In addition, Oracle Database prevents any modifications to the tables that would invalidate the history. For example, you cannot drop a column. However, you can add columns as needed.

Point-in-Time Queries

The schema in Listing 1 highlights an interesting issue with time-based data. A physician might ask, What are the patient's test results on or before February 3, 2008?

A similar, but not identical, question is What are the patient's test results that I could have seen as of February 3, 2008?

The difference between the two questions is subtle but critical. Question 1 requests an answer about past information, based on the best information currently available . Question 2 requests an answer based on information that was available at some point in the past . The difference between questions 1 and 2 is the difference between "what was" and "what was known" as of a given date.

Listing 3 shows one solution to the first question. The WHERE clause within the subquery ensures that only results from before the beginning of the February 3 workday are considered. The WHERE clause in the outer query ensures that only the most recent of those results is reported. Several of the reported results are positive.

Code Listing 3: Querying for the results for a given time

 

SELECT patient_id, test_time, test_name, test_result
FROM (
   SELECT patient_id, test_time, test_name, test_result,
          MAX(test_time) OVER 
             (PARTITION BY patient_id) test_max_time
   FROM test_result tr INNER JOIN test_name tn
     ON tr.test_code = tn.test_code
   WHERE test_time <= TIMESTAMP '2008-02-03 08:00:00'
   )
WHERE patient_id = 101
   AND test_time = test_max_time;


PATIENT_ID      TEST_TIME                       TEST_NAME         TEST_RESULT
------          -------------------             --------          -------
101             01-FEB-08 01.17.00.000000 PM    Cocaine           Positive
101             01-FEB-08 01.17.00.000000 PM    Marijuana         Negative
101             01-FEB-08 01.17.00.000000 PM    Amphetamines      Positive
101             01-FEB-08 01.17.00.000000 PM    Phencyclidine     Positive


Listing 4 adds AS OF clauses to the query from Listing 3 to answer the second question. Those clauses direct the query to reach into the flashback data archive and give the result that the physician would have received at 8:00 a.m. on the day in question (February 3), when all results are negative. Question 2 is exactly the sort of question Oracle Total Recall was designed to answer. Oracle Total Recall is all about being able to prove what data you had and when you had it.

Code Listing 4: Querying for the results as of a given time

 

SELECT patient_id, test_time, test_name, test_result
FROM (
   SELECT patient_id, test_time, test_name, test_result,
          MAX(test_time) OVER 
            (PARTITION BY patient_id) test_max_time
   FROM 
      test_result AS OF TIMESTAMP (TIMESTAMP '2008-02-03 08:00:00') tr 
      INNER JOIN 
      test_name AS OF TIMESTAMP (TIMESTAMP '2008-02-03 08:00:00') tn
      ON tr.test_code = tn.test_code
   WHERE test_time <= TIMESTAMP '2008-02-03 08:00:00'
   )
WHERE patient_id = 101
  AND test_time = test_max_time;


PATIENT_ID      TEST_TIME                       TEST_NAME           TEST_RESULT
-------         -----------------               --------            ---------
101             01-FEB-08 01.17.00.000000 PM    Cocaine             Negative
101             01-FEB-08 01.17.00.000000 PM    Amphetamines        Negative
101             01-FEB-08 01.17.00.000000 PM    Phencyclidine       Negative
101             01-FEB-08 01.17.00.000000 PM    Marijuana           Negative


When you go back in time with a flashback query, be certain that the time stamp or system change number (SCN) you specify is valid for the table and the archive in question. You'll receive an error message if you attempt to reach farther back into the past than can be resolved by either the archive or the current undo tablespace. You cannot query AS OF about a point in time before archiving was enabled on the table you are querying.

Under the Hood

Oracle Total Recall implements data collection and storage efficiently. A new flashback data archiver process mines undo data that the instance is already generating. There's no extra overhead affecting transactions against tables being archived. If you have the CPU capacity to run the archiver process and the input/output capacity to write the archival data to disk, your transaction throughput should be unaffected by flashback data archiving.

The archiver process collects the data, compresses it, deduplicates it, writes it into the flashback data archives, and exposes it in a standard SQL table. That table is protected from tampering—only the archiver process can write to it.

Query the USER_FLASHBACK_ARCHIVE_TABLES view to find the underlying tables used to store archival data. For example, here's how to find the archive table in which TEST_RESULT archive data is stored:

 

SELECT archive_table_name
FROM user_flashback_archive_tables
WHERE table_name = 'TEST_RESULT';

ARCHIVE_TABLE_NAME
-------------------
SYS_FBA_HIST_71101


You can create indexes on the table to improve the performance of frequently executed queries against archival data. For example, if you frequently query the historical record by patient and test time, you can create an index on the archive table such as this:

 

CREATE INDEX patient_and_time
ON med_test.sys_fba_hist_71101 
(patient_id, test_time);


You can also query the underlying archive table directly. Listing 5 queries the archive table associated with the TEST_RESULT table. The archive table records SCNs, not time stamps, so the query in Listing 5 puts the SCN_TO_TIMESTAMP function to good use by converting ENDSCN to an ending time stamp. The results show that three records for the specified PATIENT_ID and TEST_TIME showed negative results through 1:55:32 a.m. on February 5. Someone then changed those three records, causing their original values to be written into the archive. (The operation values are all null, because the original records were already present when the flashback data archive was created). Comparing the three archived records with their current counterparts in the TEST_RESULT table enables you to see what changes were made that morning.

Code Listing 5: Querying the underlying archive table

 

SELECT test_code, test_result, operation, 
       SCN_TO_TIMESTAMP(endscn) end_timestamp
FROM SYS_FBA_HIST_71101
WHERE patient_id = 101
AND test_time = TIMESTAMP '2008-02-01 13:17:00';

TEST_CODE         TEST_RESULT           OPERATION      END_TIMESTAMP
------------      ---------             ------         ---------
coca              Negative              05-FEB-08      01.55.32.000000000 AM
                                                                    
ampa              Negative              05-FEB-08      01.55.32.000000000 AM
                                                                    
phen              Negative              05-FEB-08      01.55.32.000000000 AM
                                                                    


A cautionary note: As a best practice, you're generally better off issuing flashback version queries rather than querying archive tables directly. However, those tables are there, and they are accessible, should you need them.

Advantages of Oracle Total Recall

Next Steps



 COMPLETE the "Using Total Recall" (Oracle by Example) tutorial

READ more about
Oracle Total Recall/ flashback data archive
Oracle Database Advanced Application Developer's Guide

DOWNLOAD the sample code for this article

Oracle Total Recall yields many benefits over traditional application- and trigger-based archival methods. First, archival data is tamper-proof. No database users, not even privileged users, can change the historical record. This level of security is increasingly demanded by government regulation.

No programming effort is required. There are no archive tables to design, no code to write, and no triggers to debug. The result has far less risk that anyone will inadvertently damage the archive.

Performance impact is minimal. There is no direct impact on transaction throughput, because the processes involved in archiving run asynchronously.

Oracle Total Recall is implemented natively in Oracle Database and is application-transparent. You can implement it underneath a running application without making changes to existing code.

Oracle Total Recall lends itself to many applications. It's worth considering anytime you need to access past data, either routinely or spontaneously. Oracle Total Recall can be used to collect historical data for routine data mining and business intelligence applications. It can be used as an extended form of Oracle Flashback for data entry error correction. Another creative use is as an archival solution in which older data is routinely deleted from operational tables, with that data still being accessible from a flashback data archive.

The bottom line: Oracle Total Recall provides a single, secure approach to creating a historical record of important business data that can be applied across many applications and toward many goals. It's history at the flip of a switch. 



Jonathan Gennick
(www.gennick.com) is an experienced Oracle professional and a member of the Oak Table Network. He wrote the best-selling SQL Pocket Guide and the Oracle SQL*Plus Pocket Reference , both from O'Reilly Media.

Send us your comments