As Published In
Oracle Magazine
July/August 2013

TECHNOLOGY: Ask Tom

  

On Randomness, Syntax, and Mutation

By Tom Kyte

 

Our technologist gets deterministic, delimited, and the opposite of trigger-happy.

I have an UPDATE statement that returns two different results for the same data. I use a subquery in the UPDATE to identify the rows to be modified, and that SELECT statement first returns the row where ID = 14152967. But when I run the UPDATE, the wrong row is updated. Can you explain what is happening here?

[Editor’s note: Listing 1 includes the questioner’s table setup, a subquery for the UPDATE, the UPDATE, and the UPDATE result.]

Code Listing 1: The first question’s sample table, subquery, UPDATE, and result information 

SQL> CREATE TABLE TEST
  2  ( ID varchar2(10),
  3    PROD_RID varchar2(10) NULL,
  4    VERSIONS_RID varchar2(10) NULL,
  5    HOSTID varchar2(10) NULL,
  6    HIDDEN varchar2(10) NULL
  7  );
Table created.

SQL> insert into TEST
  2  (id,prod_rid,versions_rid,hostid,hidden)
  3  values(14152967, 10013252, 29300796, 44026, 0 );
1 row created.

SQL> insert into TEST
  2  (id,prod_rid,versions_rid,hostid,hidden)
  3  values(14152966, 10013252, 29300796, 44026, 0);
1 row created.

SQL> create table PRI
  2  (PROD_RID NUMBER,
  3   VERSIONS_RID NUMBER,
  4   PRI NUMBER
  5  );
Table created.

SQL> INSERT INTO PRI VALUES (10013252,29300796,13);
1 row created.

SQL> SELECT id
  2    FROM (SELECT row_number() over
  3                     (PARTITION BY hostid, prod_rid
  4                              order by pri DESC) row_n,
  5                 ID
  6            FROM (SELECT prod.ID,
  7                         prod.HOSTID,
  8                         prod.PROD_RID,
  9                         mv.PRI
 10                    FROM TEST prod
 11                    JOIN PRI mv
 12                      ON (mv.VERSIONS_RID = prod.VERSIONS_RID
 13                          AND mv.PROD_RID     = prod.PROD_RID)
 14                   WHERE HIDDEN = 0
 15                 ) x
 16          ) y
 17    WHERE y.row_n > 1;

ID
--------
14152967

SQL> UPDATE TEST
  2     SET HIDDEN = 4
  3   WHERE ID IN
  4  (
  5  <<the 17 lines of the above SELECT id… query…>> 
 22  );

1 row updated.

SQL> select id,hidden from test;

ID         HIDDEN
--------   ------
14152967   0
14152966   4

 
The problem is that your subquery is nondeterministic. I can actually run your example and update either of the two rows in the TEST table, depending on which row I inserted first! I took your query, added a few columns so I could see the data values you are partitioning on and ordering by, and removed the WHERE ROW_N > 1 clause so that every row would be returned. Listing 2 shows the results.

Code Listing 2: A close look at a nondeterministic result set 

SQL> SELECT *
  2    FROM (SELECT row_number() over
  3                     (PARTITION BY hostid, prod_rid
  4                              order by pri DESC) row_n,
  5                 ID,
  6                             hostid, prod_rid, pri
  7            FROM (SELECT prod.ID,
  8                         prod.HOSTID,
  9                         prod.PROD_RID,
 10                         mv.PRI
 11                    FROM TEST prod
 12                    JOIN PRI mv
 13                      ON (mv.VERSIONS_RID = prod.VERSIONS_RID
 14                          AND mv.PROD_RID     = prod.PROD_RID)
 15                   WHERE HIDDEN = 0
 16                 ) x
 17          ) y
 18  /

     ROW_N ID         HOSTID     PROD_RID          PRI
---------- --------   ------     --------          ---
         1 14152966   44026      10013252           13
         2 14152967   44026      10013252           13

 
Looking at the Listing 2 data, the reason for this anomaly should be clear: the result set is nondeterministic. It can and will return different answers “randomly.”

You are assigning ROW_NUMBER after partitioning by HOSTID and PROD_RID and then sorting by PRI. Because your HOSTID and PROD_RID values are the same—and the PRI value is the same—there is no deterministic way to sort this data. Either of the two rows could be first and the other second. The ROW_NUMBER assignment is not guaranteed to be the same from run to run of that query. In fact, if I truncate that table and insert the data in reverse order, I might see the following result set: 

 ROW_N ID       HOSTID PROD_RID  PRI
------ -------- ------ --------  ---
     1 14152967 44026  10013252   13
     2 14152966 44026  10013252   13

 
As you can see, the ROW_N values have been switched, simply due to a switch in the order in which the rows were inserted into the table. Do not be concerned if you cannot reproduce this exactly. It will depend on the sort work area size assigned, possibly your block size, the type of segment space allocation you are using, and many other factors. In other words, it will appear random in practice.

Whenever you partition on or order by a set of attributes that are not unique when taken together, your results will be nondeterministic. The ordering of the data in a result set and the values of certain analytic windowing functions can and will vary from query execution to query execution.

In this particular case, I cannot offer the final solution to your query dilemma. You will have to decide what you need to order by to make this query deterministic. For example, if the larger values of ID are the ones you want to update, you’ll want to add ID to the ORDER BY list, as shown in Listing 3.

Code Listing 3: ID added to ORDER BY as possible deterministic result solution 

SQL> SELECT *
  2    FROM (SELECT row_number() over
  3                     (PARTITION BY hostid, prod_rid
  4                              order by pri DESC, ID ASC ) row_n,
  5                 ID,
  6                             hostid, prod_rid, pri
  7            FROM (SELECT prod.ID,
  8                         prod.HOSTID,
  9                         prod.PROD_RID,
 10                         mv.PRI
 11                    FROM TEST prod
 12                    JOIN PRI mv
 13                      ON (mv.VERSIONS_RID = prod.VERSIONS_RID
 14                          AND mv.PROD_RID     = prod.PROD_RID)
 15                   WHERE HIDDEN = 0
 16                 ) x
 17          ) y
 18  /

     ROW_N ID         HOSTID     PROD_RID          PRI
---------- --------   ------     --------          ---
         1 14152966   44026      10013252           13
         2 14152967   44026      10013252           13

 
Assuming that ID is unique in the Listing 3 result set (and I cannot say for sure, because there are no constraints at all on this example), ordering by PRI DESC, ID ASC will assure you of identical results from query execution to query execution.
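Folding Listing 3’s ordering back into the original statement gives a sketch of the now-deterministic UPDATE (assuming, as above, that ID is unique within each partition):

```sql
update TEST
   set HIDDEN = 4
 where ID in
(select id
   from (select row_number() over
                    (PARTITION BY hostid, prod_rid
                     order by pri DESC, id ASC) row_n,
                ID
           from (select prod.ID,
                        prod.HOSTID,
                        prod.PROD_RID,
                        mv.PRI
                   from TEST prod
                   join PRI mv
                     on (mv.VERSIONS_RID = prod.VERSIONS_RID
                         and mv.PROD_RID = prod.PROD_RID)
                  where HIDDEN = 0
                ) x
         ) y
  where y.row_n > 1
);
```

With the tie broken by ID, the same row—14152967—will be updated on every execution, regardless of insertion order.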

I touched on this topic briefly a few years ago in “On Top-n and Pagination Queries”. This deterministic property is important whenever you are using analytic windowing functions or even just a simple ORDER BY clause.

Comma-Delimited Data from Oracle SQL Developer

I have to create a comma-delimited file with Oracle SQL Developer. I’m using this syntax to generate the output:

select spriden_id||','||
spriden_first_name||','||
spriden_last_name
from spriden
where spriden_pidm = 1012;

The company I’m sending the file to wants a header record at the top of the file that has all the field names I’m selecting. What’s the syntax for creating the header record so it looks like this:

ID,First_Name,Last_Name

You are going to be surprised at how easy this is. Oracle SQL Developer supports a few “hints” of its own design in SQL queries. (They’re not hints in the manner that INDEX, PARALLEL, FULL, USE_NL, and the like are hints.) For this result set, a simple 

select  /* csv */  spriden_id ID, 
   spriden_first_name FIRST_NAME, 
   spriden_last_name LAST_NAME
  from spriden
 where spriden_pidm = 1012;

 
will do the trick. In addition to that “hint,” there are others:

XML – tagged XML format
HTML – marked-up HTML table format
DELIMITED – same as csv
INSERT – a SQL insert format
LOADER – a pipe-delimited format suitable for SQL*Loader (sqlldr)
FIXED – fixed-width fields with trailing blanks to pad them out
TEXT – plain-text output

You can see examples of each, with more information on using them in Oracle SQL Developer, at bit.ly/10AuN46 and bit.ly/19dt2Or. The blog providing this information—ThatJeffSmith—is authored by Jeff Smith, the product manager for Oracle SQL Developer.
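These comment-style hints all work the same way as the csv example. For instance, the insert hint—applied to the same SPRIDEN query from the question—would render each row of the result set as an INSERT statement (the exact output shape depends on your Oracle SQL Developer version):

```sql
select  /*insert*/  spriden_id ID,
   spriden_first_name FIRST_NAME,
   spriden_last_name LAST_NAME
  from spriden
 where spriden_pidm = 1012;
```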

Mutating Tables

I have a trigger somewhat like this:

create or replace trigger TR_TABLE_X_AU
after update on TABLE_X
for each row
declare
    cursor cursor_select_fk is
        select FK_FOR_ANOTHER_TABLE
          from TABLE_Y Y, TABLE_Z Z
         where :NEW.JOINING_COL = Y.JOINING_COL
           and Y.JOINING_COL = Z.JOINING_COL
           and :NEW.FILTER_CONDITION_1 = Y.FILTER_CONDITION_1
           and :NEW.FILTER_CONDITION_2 = Y.FILTER_CONDITION_2
           and :NEW.SOME_DATE_COL =
                   (select max(SOME_DATE_COL)
                      from TABLE_X
                     where FILTER_CONDITION_1 = :NEW.FILTER_CONDITION_1
                       and FILTER_CONDITION_2 = :NEW.FILTER_CONDITION_2);
begin
    for rec in cursor_select_fk loop
        PCK_SOME_PACKAGE.SOME_PROC
            (rec.FK_FOR_ANOTHER_TABLE);
    end loop;
end TR_TABLE_X_AU;

I am using triggers because it is an enhancement and I don’t want to modify the existing software. The nested query selecting the MAX date seems to be the cause of the problem. Changing the query to use SYSDATE results in no errors or exceptions. Any idea on how I can get the MAX date during the execution of the trigger for TABLE_X?

A row trigger cannot read from the table it is firing on if the triggering statement might affect more than one row (and an UPDATE, in general, can affect more than one row). Attempting to do so raises the mutating-table error, ORA-04091. The only type of row trigger that might be able to read the table it is firing on would be one fired by an INSERT using the VALUES clause, because that statement affects zero rows or one row only.
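Here is a minimal sketch of the restriction in action (the table and trigger names are invented for illustration):

```sql
create table t ( x number );

insert into t values ( 1 );
insert into t values ( 2 );

create or replace trigger t_au
after update on t
for each row
declare
    l_max number;
begin
    -- this reads T while T is in the middle of being modified
    select max(x) into l_max from t;
end;
/

update t set x = x + 1;
-- fails with ORA-04091: table T is mutating,
-- trigger/function may not see it
```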

That said, this error message exists to save you from the fires of “data integrity does not exist in my database anymore.” And therefore it exists to save you from a grievous error in logic. If this trigger were able to work—if the trigger were permitted to read TABLE_X while firing—it would lead to nondeterministic outcomes from exactly the same SQL executed against exactly the same data (which is very reminiscent of the first question in this column). I’ll explore why that is in a moment.

But first I’d like to comment on “. . . I am using triggers because it is an enhancement. . .” I am not a fan of that reasoning. Software exists to be changed—that is why it is called software and not hardware. Hardware is the stuff we have to live with as is. Software is malleable, changeable. Triggers are horrible for trying to “automagically” slide a change into production. They come with so many side effects and long-term downsides. I’ve written about this before in “The Trouble with Triggers”.

Your attempt at a trigger is a classic example of why triggers can be so bad. If the trigger were allowed to work, it would leave your data in an inconsistent state—or even worse, in a different state in two different databases that contained exactly the same data to start with! For example, let’s assume that you do an UPDATE to SOME_DATE_COL. This UPDATE affects more than one row. If this trigger worked—if this trigger could read TABLE_X while the UPDATE was taking place—think about what would or could happen.

In this hypothetical situation, every time a row is updated, the trigger tries to read the “max” value, but remember: you are updating SOME_DATE_COL, so you are reading the table in the middle of the UPDATE, while the data is being updated (hypothetically, because Oracle Database doesn’t actually allow such a bad thing to happen). You will possibly read a max value that was never a max value at any point in time. You will read and process data that should never have been seen.

For example, first suppose the data in the table is 

......   01-JAN-2012 ......
.......  02-JAN-2012 ........

 
and also suppose that your UPDATE just happens to hit the rows in that order. Now assume the UPDATE you executed is 

update table_x set some_date_col = 
some_date_col + 5 .......; 

 
That will change 01-jan to 06-jan and then 02-jan into 07-jan. Now, in truth, the only max(SOME_DATE_COL) values ever in this table were 02-jan (before the UPDATE) and 07-jan (after the UPDATE). However, what will your trigger hypothetically see? It will see 01-jan get turned into 06-jan, and it will read that result and process it as if that had been the max date at some point. But it never was. Never. And then your trigger will process 07-jan as the max date for the second row.

But it will do that only if the UPDATE hits those rows in that specific order. What if the rows are processed in this order instead? 

.......  02-JAN-2012 ........
......   01-JAN-2012 ......

 

Next Steps

ASK Tom
Tom Kyte answers your technology questions. Highlights from that forum appear in this column.

FOLLOW Tom on Twitter

READ more Tom
Expert Oracle Database Architecture: Oracle Database 9i, 10g, and 11g Programming Techniques and Solutions, Second Edition

READ more about
 deterministic queries
 trigger troubles: “The Trouble with Triggers”, bit.ly/19hg5Dp
 definer’s and invoker’s rights: bit.ly/13SzdXR
 “On Becoming Others, Limits, and Restoration”
 Oracle Data Pump

DOWNLOAD Oracle Database 11g Release 2

FOLLOW Oracle Database on Twitter and on Facebook

In this case, given exactly the same set of data, just processed in a different order because of the way the rows happened to be stored on disk, you will get an entirely different answer! This is eerily similar to the first question in this column, about the nondeterministic analytic function: the actual order of the rows on disk affects the answer!

So, Oracle Database does not permit such behavior in a trigger. Oracle Database does not allow you to program such inconsistencies into your application. Oracle Database will not allow you to query a table in the middle of modifying that table. The bugs—the strange order-dependent bugs—this would introduce into applications would be unbelievable.

I strongly (as strongly as I can) encourage you to abandon the trigger and implement the code logic you need as a straightforward code change. Remember: software exists to be modified, and that is why we call it software (and not hardware).

I hesitate to post this next link, but you could find it easily enough by searching. bit.ly/19hg5Dp shows how to postpone the processing of the modified data until an AFTER trigger, when the table is “stable.”

In Oracle Database 11g and above, you could use a compound trigger in place of the three triggers I demonstrated in this linked text, but the logic would be the same. But please give serious consideration to not using triggers for what should be a code change.
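For reference, a minimal sketch of that compound-trigger shape, with the actual processing left schematic—the choice to collect JOINING_COL values and the AFTER STATEMENT logic are my assumptions based on the question’s trigger, not a finished implementation:

```sql
create or replace trigger TR_TABLE_X_CAU
for update on TABLE_X
compound trigger

    type col_list is table of TABLE_X.JOINING_COL%type
        index by pls_integer;
    g_cols col_list;

    after each row is
    begin
        -- only record what changed; never read TABLE_X here
        g_cols(g_cols.count + 1) := :new.JOINING_COL;
    end after each row;

    after statement is
    begin
        -- TABLE_X is no longer mutating, so the MAX(SOME_DATE_COL)
        -- lookup and the call to PCK_SOME_PACKAGE are safe here
        for i in 1 .. g_cols.count loop
            null;  -- run the original cursor logic for g_cols(i)
        end loop;
        g_cols.delete;
    end after statement;

end TR_TABLE_X_CAU;
/
```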

Query Works in an Anonymous Block but Not in a Procedure

My cursor query works fine in an anonymous block but fails to find any data when it runs inside a stored procedure. The code is

declare
    cursor c1 is
        select object_name
          from all_objects
         where owner = 'IRIS_DATA'
           and object_type = 'SEQUENCE';

    v_string varchar2(2000);
begin
    for c2 in c1 loop
        v_string :=
            'DROP SEQUENCE IRIS_DATA.' || c2.object_name;
        execute immediate v_string;
    end loop;
end;

When I execute it as an anonymous block, it works fine. When I make it a stored procedure, however, it executes but doesn’t find any sequences to process. I coded this procedure in a CLONEDEV schema and granted the privileges below as SYS to the CLONEDEV user:

GRANT DBA TO CLONEDEV;
GRANT SELECT ANY TABLE TO CLONEDEV;
GRANT ANALYZE ANY TO CLONEDEV;
GRANT DELETE ANY TABLE TO CLONEDEV;
GRANT INSERT ANY TABLE TO CLONEDEV;
GRANT CREATE ANY TABLE TO CLONEDEV;
GRANT UPDATE ANY TABLE TO CLONEDEV;
GRANT DROP ANY TABLE TO CLONEDEV;
GRANT ALTER ANY TABLE TO CLONEDEV;
GRANT ALTER ANY TRIGGER TO CLONEDEV;
GRANT DROP ANY INDEX TO CLONEDEV;
GRANT CREATE ANY INDEX TO CLONEDEV;
GRANT ALTER ANY INDEX TO CLONEDEV;
GRANT CREATE ANY SEQUENCE TO CLONEDEV;
GRANT DROP ANY SEQUENCE TO CLONEDEV;
GRANT SELECT ON ALL_TAB_COLUMNS TO CLONEDEV;
GRANT SELECT ON ALL_CONSTRAINTS TO CLONEDEV;
GRANT SELECT ON ALL_CONS_COLUMNS TO CLONEDEV;
GRANT SELECT ON ALL_OBJECTS TO CLONEDEV;
GRANT EXECUTE ON DBMS_STATS TO CLONEDEV;
GRANT EXECUTE ON DBMS_FLASHBACK TO CLONEDEV;
GRANT FLASHBACK ANY TABLE TO CLONEDEV;

I’ll refer you to bit.ly/13SzdXR, where you can see that roles are never enabled during the execution of a definer’s rights routine. By design, the ALL_OBJECTS view shows you things you can see with your current set of privileges. So if you disable roles, you’ll be able to “see” far fewer things in ALL_OBJECTS. For example   

SQL> select count(*) from all_objects;

  COUNT(*)
----------
     72946

SQL> set role none;
Role set.

SQL> select count(*) from all_objects;

  COUNT(*)
----------
     56414

 
So that is the root cause. You are not able to see all sequences, even if you can create or drop any sequence. Now, that said, I don’t like your approach, because I don’t like these ANY privileges. This is, in fact, a good case for an invoker’s rights routine. Allow CLONEDEV to “own” this code, but have IRIS_DATA run the code. Then you need no special privileges at all. Consider the code in Listing 4.

Code Listing 4: Invoker’s rights and no special privileges 

SQL> create user iris_data identified by iris_data;
User created.

SQL> grant create session, create sequence to iris_data;
Grant succeeded.

SQL> create user clonedev identified by clonedev;
User created.

SQL> grant create session, create procedure to clonedev;
Grant succeeded.

SQL> connect iris_data/iris_data;
Connected.

iris_data> create sequence iris_data.seq;
Sequence created.

iris_data> connect clonedev/clonedev;
Connected.

clonedev> create or replace procedure drop_sequences
  2  AUTHID CURRENT_USER
  3  is
  4  begin
  5      dbms_output.put_line('drop sequence started.');
  6      for x in (select 'drop sequence ' || sequence_name sql_stmt
  7                  from user_sequences )
  8      loop
  9          dbms_output.put_line( x.sql_stmt );
 10          execute immediate x.sql_stmt;
 11          dbms_output.put_line( 'success: ' || x.sql_stmt );
 12      end loop;
 13      dbms_output.put_line('drop_sequence: Dropping sequence completed.');
 14  end drop_sequences;
 15  /
Procedure created.

clonedev> grant execute on drop_sequences to iris_data;
Grant succeeded.

clonedev> connect iris_data/iris_data
Connected.

iris_data> exec clonedev.drop_sequences;
drop sequence started.
drop sequence SEQ
success: drop sequence SEQ
drop_sequence: Dropping sequence completed.

PL/SQL procedure successfully completed.

 
Do you see how very, very, very few privileges are necessary? And none of those extremely nasty ANY privileges is involved. If you are concerned about having to connect as IRIS_DATA because of passwords, that too can be solved: see the recent column “On Becoming Others, Limits, and Restoration” for that solution. IRIS_DATA can enable CLONEDEV to “become” IRIS_DATA by using CLONEDEV’s own password, so you can accomplish your goal with very few privileges.

Committing During an Import

During import it seems that the entire table is imported and then committed, or in the event of an error, the entire table is rolled back. If I have a large table with millions of rows, can I do something in the import so that some rows are imported? Can the import commit the rows so that I have the data as well as free up the undo?

Import (imp) does have a COMMIT parameter. It commits not after a fixed number of records, but rather after each array insert it performs, which you can control with the BUFFER parameter. I should point out, though, that if your table has LONG or LOB columns, the array size will be 1 in all cases.

But because an INSERT generates very little undo (just a delete+rowid entry in the undo), it would be uncommon for you to need to use the COMMIT parameter. Very uncommon.

Additionally, you should be using Oracle Data Pump import (impdp)—not Import (imp)—which will permit you to do a direct path insert. Direct path operations bypass undo generation, so there will be no undo to be concerned with. See Oracle Database Utilities 11g Release 2 (11.2), “Overview of Oracle Data Pump”, for details on that.

In general, using COMMIT = Y for your import would be a really bad idea. If you started importing a table and committed the first batch of rows and then it failed for any reason, you would have no way to restart your import at the point of failure. You would have to truncate the table or drop it and start over.

But in any case, use Oracle Data Pump import, and you won’t have to worry about undo generation.




Tom Kyte is a database evangelist in Oracle’s Server Technologies division and has worked for Oracle since 1993. He is the author of Expert Oracle Database Architecture (Apress, 2005, 2010) and Effective Oracle by Design (Oracle Press, 2003), among other books.

 

 

