|
TECHNOLOGY: SQL
Filter Data Before You Get It
By Jonathan Gennick
Oracle Expression Filter lets you describe the data that will be important.
Oracle Database 10g introduces an intriguing new feature for continuously monitoring incoming data. Called the Oracle Expression Filter, it allows you to write what is essentially a WHERE clause to describe data of interest. You can store one or more such WHERE clauses in a text column and then have the database watch a table and alert you when rows matching any of those WHERE clause expressions become available.
Traditional SELECT processing initiates when you issue a SELECT statement and finds data that matches a WHERE clause. Oracle Expression Filter processing can initiate whenever a new row is stored in the database and goes on to find WHERE clauses that match the new data in that row. You create the WHERE clause expressions ahead of time to describe data you're watching for.
Creating an Expression Set
Imagine that you run a hospital with an on-site hematology lab. You might have some database tables such as those in Listing 1 to hold patient data (PATIENT) and blood test results (CBC_PANEL). Your doctors might review test results of noncritical patients once a day. However, some test values are so criticaleven for noncritical patientsthat a doctor must be alerted at once. Your database must let you easily change the criteria identifying such values. (Note that the blood test names in this article are fictional.)
Code Listing 1: Create tables for blood test and patient data.
CREATE TABLE cbc_panel (
/* cbc = complete blood count */
patient_id NUMBER,
test_time DATE,
hematology_analyzer_id VARCHAR2(10),
green_blood_count NUMBER,
hemotoroil NUMBER,
hematorust NUMBER,
mitichlorian_count NUMBER
);
CREATE TABLE patient (
patient_id NUMBER,
name VARCHAR2(10),
age NUMBER,
gender VARCHAR2(1),
heart_risk VARCHAR2(1),
smoker VARCHAR2(1)
);
PATIENT_ID NAME AGE GENDER HEART_RISK SMOKER
------------ ----------- ------- -------- ------------- --------
1 Jeff 102 M N Y
2 Eric 25 M N N
3 Ryan 57 M Y Y
4 Katelyn 67 F Y N
5 Karl 42 M Y N
6 Jessie 34 F N Y
7 Faith 9 F N N
8 Stephanie 18 F Y N
To use Oracle Expression Filter in this scenario, your first step is to decide which elements of a given panel of test results should elevate the results to critical status. Besides the actual blood results, you'll likely want to consider patient attributes such as age or known heart risks.
Each element, or type of value, that you want to consider is an attribute, and the entire collection of attributes used by an Oracle Expression Filter application is an attribute set. Before you can use Oracle Expression Filter, you must name each attribute, specify a datatype for each attribute, and then name the set. Listing 2 shows an easy way to do this. First, create an object type named CBC. Then, issue a call to DBMS_EXPFIL.CREATE_ATTRIBUTE_SET to create an attribute set based on that object type, and of the same name. You can now use any of the CBC attributes (HMO, RUS, AGE, and so forth) in expressions that you write for evaluation by the expression filter.
Code Listing 2: Define a set of attributes to use in expressions.
CREATE OR REPLACE TYPE cbc AS OBJECT (
machine_id VARCHAR2(10),
gbc NUMBER,
hmo NUMBER,
rus NUMBER,
mit NUMBER,
age NUMBER,
gender VARCHAR2(1),
heart_risk VARCHAR2(1),
smoker VARCHAR2(1));
/
BEGIN
DBMS_EXPFIL.CREATE_ATTRIBUTE_SET('cbc','yes');
END;
/
Next, you must create a repository for the expressions that you and your doctors write to identify test results of interest. Create a table with a text column, and link the attribute set to that column. Listing 3 creates a table named CBC_WATCH and then invokes DBMS_EXPFIL.ASSIGN_ATTRIBUTE_SET to link the CBC expression set to the column named WATCH. This creates a trigger (in the EXFSYS schema) on the CBC_WATCH table that ensures that any values inserted into the WATCH column are valid expressions involving the attributes in the CBC attribute set.
Code Listing 3: Create and populate a watch table.
CREATE TABLE cbc_watch (
watch_id NUMBER,
watch VARCHAR2(160)
);
BEGIN
DBMS_EXPFIL.ASSIGN_ATTRIBUTE_SET(
'cbc','cbc_watch','watch');
END;
/
Listing 4 inserts some watch expressions. You use the same syntax for these expressions as you would for expressions in a standard WHERE clause: Each expression consists of one or more predicates linked together with AND, OR, and NOT. Think of an expression in this context as a WHERE clause without the keyword WHERE. For example, expression No. 6 includes two predicates:
- mit > 350000
- smoker = 'N'
Code Listing 4: Populate the expression set.
INSERT INTO cbc_watch VALUES (
1, 'gbc > 11000 or gbc < 4300');
INSERT INTO cbc_watch VALUES (
2, 'NOT (hmo BETWEEN 13 AND 18) AND gender = ''M''');
INSERT INTO cbc_watch VALUES (
3, 'NOT (hmo BETWEEN 12 AND 16) AND gender = ''F''');
INSERT INTO cbc_watch VALUES (
4, '(NOT (rus BETWEEN 45 AND 62) AND gender = ''M'')
OR (NOT (rus BETWEEN 37 AND 48) AND gender = ''F'')');
INSERT INTO cbc_watch VALUES (
5, 'mit < 150000');
INSERT INTO cbc_watch VALUES (
6, 'mit > 350000 and smoker = ''N''');
INSERT INTO cbc_watch VALUES (
7, 'mit > 500000 and smoker = ''Y''');
These two predicates are linked by AND to specify that a mitichlorian count in excess of 350,000 is out of the normal range for a nonsmoker. Similarly, expression No. 7 specifies 500,000 as the high end of the normal range for smokers. Taken together, the set of values in a column such as WATCH is known as an expression set.
The WATCH_ID column in the CBC_WATCH table is a simple numeric primary key and can be used by applications to identify expressions being updated or deleted. It has no bearing on expression filter functionality. You can use whatever key structure you want for a table holding an expression set.
Evaluating Expressions
With Listing 4's expression set in place, you can issue a query such as that in Listing 5 to retrieve any abnormal test results generated on the current day. The query makes use of Oracle Database 10g's new EVALUATE function. The results show, for example, that the patient named Faith has test results corresponding to expression No. 1, indicating that her green blood cell count is abnormal. And indeed, in my example data, her green blood cell count is a mere 2,000, well under the current 4,300 acceptable minimum.
The first step in writing the query in Listing 5 is to join the PATIENT and CBC_PANEL tables. Their combined columns provide the values corresponding to the attribute set. Next, use the EVALUATE function to join to the CBC_WATCH table. EVALUATE takes two arguments:
- An expression, in this case from the WATCH column
- A set of attribute values, some or all
of which may be referenced by the expression
Passing the first argument is easy. It's simply the text column containing the expressions to be evaluated. To generate the second argument, use the constructor function CBC to create an instance of an object type corresponding to the CBC attribute set. Then invoke the
getVarchar() method to return a properly formatted string of attribute values. EVALUATE references the supplied attribute values when evaluating the expressions from the WATCH column, and it returns a 1 when a given set of attribute values causes an expression to be true. Otherwise, EVALUATE returns zero. The join condition in Listing 5 retains only those rows for which the attribute values make an expression true.
Code Listing 5: Query for abnormal test results.
SELECT cp.patient_id, p.name, cw.watch_id
FROM cbc_watch cw, cbc_panel cp, patient p
WHERE cp.patient_id = p.patient_id
AND EVALUATE(
cw.watch,
cbc(cp.hematology_analyzer_id,
cp.green_blood_count, cp.hemotoroil,
cp.hematorust, cp.mitichlorian_count,
p.age, p.gender, p.heart_risk,
p.smoker).getVarchar()
) = 1
AND cp.test_time > TRUNC(SYSDATE);
PATIENT_ID NAME WATCH_ID
------------ --------- -----------
7 Faith 1
3 Ryan 2
5 Karl 4
2 Eric 6
What Happens
Let's back up a bit and look at what's going on when you use Oracle Expression Filter. First of all, an attribute set is always linked to an object type of the same name. Listing 2 creates the type first, and then derives the attribute set from the type. It's also possible, but it involves more code, to create the attribute set directly, in which case Oracle Database creates the underlying object type for you. This underlying object type plays a key role in the validation of expressions that you enter into an expression set.
Listing 6 shows the part of the trigger on the CBC_WATCH table that validates expressions inserted into the WATCH column. Remember, the trigger in Listing 6 was created as a result of assigning the CBC attribute set to the WATCH column (in Listing 3). The SELECT in Listing 6 casts null as an instance of a type named EXF$NTT_71427. That type was defined automatically as a table of type CBC when the attribute set was created (in Listing 2). The WHERE clause for the SELECT consists of whatever text is entered into the WATCH column. When you insert text into the WATCH column, the trigger fires and attempts to parse the SELECT statement. If that text is a valid expression, the parse operation succeeds. Otherwise, the trigger rejects the expression as invalid.
Code Listing 6: Query from trigger on CBC_WATCH to validate expressions.
isvalid := exfsys.exf$expisvalid (64,'GENNICK','CBC',
'SELECT /*+ EXPR_CORR_CHECK USE_WEAK_NAME_RESL */ 1
FROM TABLE(CAST(null AS EXF$NTT_71427)) exf$_eqast
WHERE '||:new."WATCH");
Don't fret about the extra parsing that the expression validation trigger implies. Expressions must be parsed and validated somewhere. Because the database evaluates the expressions, it only makes sense for the database to do the parsing and validation. It also makes sense to leverage the already debugged parsing code in the database kernel. Finally, at least in our scenario, expressions are not likely to change often enough for parsing to be an issue.
What about when expressions are evaluated? In our scenario so far, because the expression set is not indexed, the EVALUATE function issues the recursive SELECT against DUAL, shown in Listing 7, to determine whether an expression is true for each given set of attribute values. The SELECT in Listing 7 happens to be the one issued to evaluate our expression No. 1. Such a SELECT is issued each time EVALUATE is called. Because I have seven expressions and eight sets of results in my example data, the Listing 5 query generates 56 (7 _ 8) recursive SELECT statementsa different SELECT for each expressionin order to join to the CBC_WATCH table.
Code Listing 7: Query to determine whether an expression is true.
SELECT /*+ USE_WEAK_NAME_RESL */ 1
FROM (SELECT
:1 MACHINE_ID, :2 GBC, :3 HMO, :4 RUS,
:5 MIT, :6 AGE, :7 GENDER, :8 HEART_RISK,
:9 SMOKER
from DUAL) exf$dumalias
WHERE gbc > 11000 or gbc < 4300;
Indexing Expression Sets
Your calls to EVALUATE will execute without indexes, but if you have anything more than a very small number of expressions in a set, create an expression set index. Such an index greatly reduces the need for recursive SQL.
An interesting aspect of expression set indexing is that the approach is perhaps backward from what you'd expect. The idea isn't to optimize the path from an expression to data that makes that expression true. Rather, you begin with a set of attribute values and optimize the path to find expressions that are true for that set. Figure 1 illustrates this approach. Remember, the goal is to make it feasible for the database to notify you whenever data of interest arrives. You're not looking for data. Rather, your data is looking for you.
|
| Figure 1: Oracle Expression Filter receives and checks data against expressions in an expression set.
|
Listing 8, shows one approach to creating an expression set index on the CBC_WATCH table's WATCH column. It's an approach that works well if you have a representative set of expressions already in the set, which we do. The call to DBMS_EXPFIL.GET_EXPRSET_STATS gathers and stores statistics about the expressions in the set. The query following the PL/SQL block lists attributes in the order of how frequently they occur in expressions, which is a reasonable proxy for the selectivity of predicates in the expressions involving those attributes. For example:
ATTRIBUTE_EXP PCT_OCCURRENCE
----------------- -----------------
MACHINE_ID 0
HMO 0
RUS 0
HEART_RISK 0
AGE 0
GBC 22.2222222
SMOKER 22.2222222
MIT 33.3333333
GENDER 44.4444444
Now, you may look at these results and wonder why attributes HMO and RUS are each listed as occurring zero percent of the time when, in fact, they are used in several expressions. The reason you see zeros in these results is that expression set indexes currently cannot index predicates involving combinations of NOT and BETWEEN such as you see in expressions Nos. 2 to 4. Rewriting those expressions would make it possible to index their predicates. For example, you might rewrite expression No. 2 from:
'NOT (hmo BETWEEN 13 AND 18)
AND gender = "M"'
to:
('hmo < 13 or hmo > 18) and gender = "M"'
The "index" that you get as a result from Listing 8 is actually a table of data enabling the database engine to find expressions for which an attribute set is true, while issuing as few recursive SQL queries as possible. The index table, named EXF$PTAB_71452 in my case, looks like this:
Name Type
------------------- -------------------
EXF$EXPROWID ROWID
EXF$SPARSEPRED VARCHAR2(4000)
EXF$PTATTR_1_OP NUMBER
EXF$PTATTR_1_CT VARCHAR2(1)
EXF$PTATTR_2_OP NUMBER
EXF$PTATTR_2_CT NUMBER
EXF$PTATTR_3_OP NUMBER
EXF$PTATTR_3_CT VARCHAR2(1)
EXF$PTATTR_4_OP NUMBER
EXF$PTATTR_4_CT NUMBER
Because Listing 8 specifies STOREATTRS TOP 4 when creating the index, the resulting table contains a pair of EXF$PTATTR columns for each of the four most selective attributes used in expression predicates. For example, here is the data pertaining to predicates involving the GBC attribute:
SELECT EXF$PTATTR_4_OP, EXF$PTATTR_4_CT
FROM EXF$PTAB_71452
WHERE EXF$PTATTR_4_OP IS NOT NULL;
EXF$PTATTR_4_OP EXF$PTATTR_4_CT
------------------ -----------------
2 4300
3 11000
The numbers in the OP column correspond to the less-than (<) and greater-than (>) operators. The values in the CT column represent the constants on the right-hand side of predicates such as GBC < 4300. Recall from Listing 4 that the only expression involving GBC is expression No. 1: GBC > 11000 or GBC < 4300. This expression consists of two predicates joined by OR. You can see that in the index table, each predicate joined by OR gets its own row.
There's more to the expression index created by Listing 8 than just the table. Because Listing 8 specifies INDEXATTRS TOP 3, bitmap indexes are created on the combinations of the OP and CT columns for the three most selective attributes. Predicates involving these attributes are called indexed predicates. For example, there is a bitmap index on the combination of the EXF$PTATTR_4_OP and EXF$PTATTR_4_CT columns. Indexed predicates can be evaluated entirely from their bitmap indexes. Combinations of indexed predicates are evaluated by applying AND and OR to their respective bitmap indexes.
Code Listing 8: Index an expression set
--Gather statistics
BEGIN
DBMS_EXPFIL.GET_EXPRSET_STATS (
'cbc_watch', 'watch');
END;
/
--Review the statistics
COLUMN attribute_exp format a20
SELECT attribute_exp, pct_occurrence
FROM user_expfil_exprset_stats
WHERE expr_table = 'CBC_WATCH'
AND expr_column = 'WATCH'
ORDER BY pct_occurrence, attribute_exp DESC;
--Create the index
CREATE INDEX watch_index ON cbc_watch (watch)
INDEXTYPE IS EXFSYS.EXPFILTER PARAMETERS (
'STOREATTRS TOP 4 INDEXATTRS TOP 3');
Gender is the fourth most selective attribute in our scenario, and I've chosen not to create a bitmap index for predicates involving gender. Predicates without bitmap indexes are termed stored predicates. They are still represented by pairs of OP and CT columns, and they can be evaluated solely by querying the index table.
Finally, some predicates can't be indexed at all. The combinations of NOT and BETWEEN, for example, used in expressions #2-4, can't be indexed. Instead, they're termed sparse predicates, and they're stored in the index table's EXF$SPARSEPRED column. The database engine evaluates sparse predicates by issuing SELECT statements against DUAL, such as you saw in Listing 7.
Querying with an Index
For optimum performance in Oracle Database 10g Release 1, use the ORDERED hint when you want an expression filter query to use an expression set index. You use the ORDERED hint to cause the optimizer to read the expression set table after first reading the tables containing all the attributes. In our scenario, this means that the PATIENT and CBC_PANEL tables must be joined first, followed by CBC_WATCH. Listing 9 shows the query from Listing 4, but modified so that CBC_WATCH is not a driving table in the join.
Code Listing 9: Query using an index
SELECT /*+ ORDERED */
cp.patient_id, p.name, cw.watch_id
FROM patient p, cbc_panel cp, cbc_watch cw
WHERE cp.patient_id = p.patient_id
AND EVALUATE(
cw.watch,
cbc(cp.hematology_analyzer_id,
cp.green_blood_count, cp.hemotoroil,
cp.hematorust, cp.mitichlorian_count,
p.age, p.gender, p.heart_risk,
p.smoker).getVarchar()
) = 1
AND cp.test_time > TRUNC(SYSDATE);
Here's what happens when data from CBC_PANEL and PATIENT are evaluated against the expression set in CBC_WATCH: First, the bitmap indexes are used to eliminate as many expressions as possible. Next, the stored predicates in the index table further eliminate predicates from consideration. Evaluation of sparse predicates, using recursive SELECT statements from dual, occurs last. When you choose your indexed and stored predicates well, most predicates are eliminated during the bitmap phase.
Letting Your Data Find You
You could issue the query shown in Listing 9 periodically throughout the day, or perhaps once a day. Indeed, such a batch scenario may be perfectly suitable for many applications. However, by using a trigger together with Oracle Streams Advanced Queueing, you can easily arrange for an application to be notified automatically the very moment that an aberrant test result arrives in the database. The trigger in Listing 10 does exactly that. Each time a new row of test result arrives, the trigger in Listing 10 does the following:
- Queries the patient table for the patient attributes needed to complete the CBC attribute set.
- Uses CBC.getVarchar() to generate a string with attribute values for use by EVALUATE.
- Opens a cursor on the CBC_WATCH table to find all expressions that are true for the current set of attribute values.
- Queues a message to the CBC_WARNING queue for each expression that is true.
Code Listing 10: Queue messages about aberrant results
CREATE OR REPLACE TRIGGER cbc_watch
AFTER INSERT OR UPDATE ON cbc_panel
FOR EACH ROW
DECLARE
cbc_message cbc_warning;
cbc_data VARCHAR2(300);
enqueue_options dbms_aq.enqueue_options_t;
message_properties dbms_aq.message_properties_t;
message_handle RAW(16);
patient_id NUMBER;
age NUMBER;
gender VARCHAR2(1);
heart_risk VARCHAR2(1);
smoker VARCHAR2(1);
CURSOR watch_cursor (cbc_attrs VARCHAR2) IS
SELECT cw.watch_id
FROM cbc_watch cw
WHERE EVALUATE(cw.watch, cbc_attrs) = 1;
BEGIN
--Grab patient attributes to go along
--with cbc values
SELECT p.patient_id, p.age, p.gender,
p.heart_risk, p.smoker
INTO patient_id, age, gender, heart_risk, smoker
FROM patient p
WHERE p.patient_id = :new.patient_id;
--Create the string of attribute values
cbc_data := cbc.getVarchar(:new.hematology_analyzer_id,
:new.green_blood_count, :new.hemotoroil,
:new.hematorust, :new.mitichlorian_count,
age, gender, heart_risk, smoker);
--Open a cursor, find expressions evaluating to true
FOR w IN watch_cursor (cbc_data) LOOP
cbc_message := cbc_warning(
patient_id, :new.test_time, w.watch_id);
dbms_aq.enqueue('cbc_warning_queue', enqueue_options,
message_properties, cbc_message,
message_handle);
END LOOP;
END;
/
Once the trigger is created, the application can listen on the CBC_WARNING queue, and display a message on a user's screen when notification of an out-of-bounds test result comes through.
Expression Filter Advantages
Using Oracle Expression Filter in this article's scenario has advantages over other methods that you might use:
- Flexibility. Expressions in text columns are far, far easier to change than logic embedded within triggers and stored procedures. Users can write their own expressions.
- Accuracy. The Oracle Expression Filter feature set leverages well-tested code that's already in the database engine.
- Performance. The optimizer recognizes Oracle Expression Filter functionality and can optimize expression evaluation via bitmap indexes and expression set index tables.
Oracle Expression Filter brings new possibilities to the database, making it more feasible than ever to write applications that notify users of incoming data based on criteria that they can specify and change. It's a feature well worth a look.
Jonathan Gennick (Jonathan@Gennick.com) is an experienced Oracle DBA and Oracle Certified Professional living on Michigan's Upper Peninsula. He runs the Oracle-article e-mail list, which you can learn about by visiting http://gennick.com. Gennick wrote the SQL Pocket Guide and collaborated on the Oracle Regular Expression Pocket Reference, both published by O'Reilly Media.
|