TECHNOLOGY: Ask Tom
On Deduping and Ordering
By Tom Kyte
Our technologist deletes duplicates, gives trigger orders, and explains how SELECT locks.
Please help me write a SQL query that will get rid of the duplicates, which are marked as XXX below. Duplicate records to me are records that have the same SA and SB values, regardless of their position. (The primary key is the combined columns, SA + SB.)
SA SB --- --- A B A C A D B A XXX (DUPLICATE ) B C C A XXX (DUPLICATE) C B XXX (DUPLICATE) C D
There are a couple of ways to do this—and I'll demonstrate my approach—but for the future, I suggest that you implement an integrity constraint in the database to disallow further duplicates. For now I can use some built-in functions and analytics to dedupe the table.
Using analytics, what I'll do is partition—break up—the data by LEAST(SA,SB), GREATEST(SA,SB). That is, I'll put the lesser of SA and SB first and the greater of SA and SB second, making a row that has the values A,B be in the same partition as a row with the values B,A. Once I break up the data by the LEAST(SA,SB), GREATEST(SA,SB), I'll sort it by A,B and assign the row_number() function to each row. All I have to do then is find any row with row_number() <> 1 (not the first row in the partition—I want only one row from each partition) and delete it.
That SQL would look like the SQL in Listing 1.
Code Listing 1: Removing duplicates from t
SQL> delete from t 2 where rowid in 3 ( select rid 4 from ( 5 select rowid rid, sa, sb, 6 row_number() over 7 (partition by least(sa,sb), 8 greatest(sa,sb) order by sa, sb) rn 9 from t 10 ) 11 where rn <> 1 12 ) 13 / 3 rows deleted.
For the future, I recommend that you prevent duplicates from entering the system in the first place. One way to do this would be to enforce a table-level check constraint:
SQL> alter table t 2 add constraint sa_less_than_equal_sb 3 check (sa <= sb); Table altered.
Then simply place a primary key on SA,SB. In that fashion, you'll never have A,B and B,A—because B,A would not be inserted in the first place. If you wish to permit B,A as a primary key value, however, hope is not lost; you can use a function-based index:
SQL> create unique index t_idx 2 on t(least(sa,sb), 3 greatest(sa,sb)); Index created.
That is, uniquely index the columns in sort order—taking the least of SA,SB as the first column in the index and the greatest of SA,SB as the second column.
This turned into a pretty interesting discussion online, when someone asked, "What about three columns—or N columns?" You might be interested in reading the original online discussion, at asktom.oracle.com/~tkyte/dedup.html, as well as seeing alternative DELETE statements that achieve the same goal of deleting duplicates.
Trigger Firing Order
I have a table and two or more AFTER INSERT/AFTER DELETE triggers. I have two to three AFTER INSERT triggers on the same table. Can you tell me what will be the firing order of the AFTER INSERT triggers? (I have tried finding this by using time stamps, but all have the same time stamps.)
The only thing I can say about the order of triggers firing is this:
You cannot depend on the order of triggers firing in a given trigger type. Even if you observe that the three BEFORE triggers you have fire in the order TRIGGER_A, TRIGGER_B, TRIGGER_C, you cannot count on that. The Oracle documentation (Oracle Database Application Developer's Guide—Fundamentals—) clearly states: "If you have multiple triggers of the same type on a single table, then Oracle Database chooses an arbitrary order to execute these triggers."
If your triggers depend on firing order, you must consolidate them into a single trigger. In fact, I recommend that all triggers be calls to packaged procedures, so that most of your triggers are one line long and consist entirely of a procedure call. In this fashion, you'll be able to take the two to three AFTER INSERT triggers, put their code into nice modular procedures in a package, and then call them in the right order from a single trigger.
As a side note, the online discussion raised the question, "Why does Oracle even have multiple same-type triggers?" The reason goes back to the original implementation of replication in Oracle Database. Materialized view logs (known simply as snapshot logs back in version 7.0) used a database trigger to record the data manipulation language (DML) operations that had taken place on a given table. Before the database supported multiple same-type triggers, this limited developers to either not using that trigger type themselves (Oracle had already used that trigger type) or not having a materialized view log . Neither of those options was appealing, so multiple same-type triggers were added as a feature to the database.
It is important that when using multiple same-type triggers, you ensure that there is no dependency between them—especially with regard to the order in which they fire, because you cannot control that in any way.
SELECT Allows Locks
Our developers (or at least our production support developers) need READ ONLY access to our production databases, so we have a role in every production database with SELECT on every application table, and this role is granted to developers.
A few weeks ago, one of our new developers locked up a key table in the production database and held up a batch operation. All the DBAs swore that one cannot acquire locks with just SELECT access, and we lost a lot of time figuring out how the developer had locked the table. We opened a TAR with Oracle, and the response was, "Yes, you can lock rows with just SELECT. In fact, you can lock the entire table by issuing LOCK TABLE." My reason for posting this here is to find out how many DBAs are not aware of this and to understand what is the reasoning behind allowing locking when you cannot modify data.
This locking is "known" and expected, and I agree that few people are aware of this feature (which is the primary reason I've decided to print this question—to raise awareness). The documentation for the LOCK TABLE command (Oracle Database SQL Reference), for example, includes the following: "Prerequisites: The table or view must be in your own schema or you must have the LOCK ANY TABLE system privilege, or you must have any object privilege on the table or view." (emphasis mine)
Note that these prerequisites do not necessarily convey the ability to modify data, but the ability to read data does convey the ability to lock—stabilize, freeze, disallow modification to—that data. SELECT ... FOR UPDATE works the same way. So why is this true?
Well, the following is one scenario that shows why. Suppose I have read access to the conference room table and I have read/write access to the conference schedule table. I would like to schedule conference room X . For this to happen safely (to avoid overlaps), no one else can schedule conference room X . Therefore, I need to lock conference room X (in the conference room table). The transaction that schedules rooms is a stored procedure, as it should be—all attempts to modify the conference schedule table are via this procedure. The stored procedure does a SELECT ... FOR UPDATE on the conference room (in the conference room table), locking it—and making it safe to peek at the conference schedule table to see if anyone else has it reserved.
Another scenario that shows why reading can lock a table is that I would like to delete from table P. P is the parent of C. I have read/write access to P. I have only read access to C. I attempt to delete from P, but before I can do that, I need to lock a bit of C (if indexed foreign keys exist) or all of C (if there is no index on a foreign key).
Complex Integrity Constraints
I've written a PL/SQL stored procedure, but I think it might not work correctly when multiple users are accessing the application at the same time. The stored procedure checks that a row with certain values does not exist and raises an error if it does. If the row does not exist, the procedure does some calculations and then inserts the row. I think that if another user inserted a row into that table and committed it after my stored procedure did the check but before it was finished with the insert, invalid data could be inserted in that table. This problem cannot be fixed by use of a unique key, because the fields do not always have to be different—only sometimes. The only solution I can think of is to check the condition at the start of the stored procedure and then again at the end—and if either check fails, then roll back. Is there a better way to do this?
The table has three columns—A, B, and C—and the combination of columns A and B must always be unique unless the value of column C is 'W'. If column C is 'W', then the values in columns A and B can be the same as those in another row. Also, column B can be null. Is it possible to create a constraint for this?
You are correct that you have multiuser issues to worry about. It is virtually impossible to enforce integrity constraints that cross rows in a table (as you have) or cross tables (such as foreign keys). Almost every attempt I see at doing this in application code or via triggers is implemented incorrectly for a multiuser environment . To succeed, you would need to effectively serialize modifications to this table via the LOCK TABLE command, for, as you have surmised, two users doing things at the same time cannot see each other's work.
Fortunately, there is a really nice way to do this—using function-based indexes. You can make use of two facts here:
So, if I were to have your table
SQL> create table t 2 ( a int, 3 b int, 4 c varchar2(1) not null 5 ); Table created.
and uniquely index A,B when C <> 'W', like this
SQL> create unique index t_idx 2 on t 3 ( case when c <> 'W' 4 then a 5 end, 6 case when c <> 'W' 7 then b 8 end 9 ); Index created.
I'd have exactly what you need:
SQL> set feedback off SQL> insert into t values (1,1,'W'); SQL> insert into t values (1,1,'W'); SQL> insert into t values (1,1,'x'); SQL> insert into t values (1,1,'y'); insert into t values (1,1,'y') * ERROR at line 1: ORA-00001: unique constraint (T_IDX) violated
Because I return NULL, NULL when C = 'W' and A, B when C <> 'W', I'll uniquely index A,B only when C <> 'W'—exactly what you need. It's sort of like indexing a WHERE clause.
Where to Begin
I have been a SQL Server 2000 DBA for the past four years, and I am keen to learn Oracle. Please let me know where to start.
I get this question a lot—so I've come up with a road map of where to start with the Oracle documentation.
It can be quite daunting to figure out where to begin, but I recommend Oracle Database Concepts for everyone. Then I branch out into developer and DBA tracks, as shown in Figure 1.
I also encourage you to download and install the free Oracle Database 10g Express Edition. The download is relatively small (about 150MB), and the installation is a snap on Windows or Linux. Five minutes after you download it, it'll be installed and running—it's a great way to begin learning Oracle.
And lastly, participate. Participate in forums, get your feet wet, and ask questions. Here are some forum suggestions:
The Usenet newsgroups. This is where I got started, in October 1994, with my first posting. Many people who were there then are, amazingly, still there today. I do not post there nearly as much as I used to, but I still browse it from time to time. There are many frequent contributors out there.
Oracle-l. This list is cool for two reasons. First, there are the contributions of people such as Wolfgang Breitling, Lex de Haan, Paul Drake, Mladen Gogala, Tim Gorman, Jonathan Lewis, Niall Litchfield, Connor McDonald, Cary Millsap, Tanel Poder, Mark Powell, Pete Sharman, and Jared Still, to name just a few. (This is not an exhaustive list of posters by any means.) Second, the sense of community there is large. At every conference or event of any size, there is usually a gathering of the Oracle-l'ites where everyone gets together. Many good friendships have grown out of that list.
The Oracle Users' Co-Operative FAQ. Sponsored and maintained by Jonathan Lewis. Remember to give as well as receive here. Anything that you discover is a frequently asked question where you work is probably an FAQ worldwide, so you too can contribute.
OTN discussion forums. You might even get an answer from the person who wrote the software you are asking about.
OracleMetaLink. I have to mention this one. There are interactive discussion forums in there as well.
Independent Oracle Users Group. I recommend this one not so much for the discussion forums (they do not get much traffic) as for the organization as an entity. I am a big advocate of the IOUG and try to do what I can to support it. I strongly encourage all of you who call Oracle your career to become part of it. The networking alone is worth it. Meet people in your area from other walks of life who do the same thing for different companies. Hear how they do it; share how you do it. The conferences are great—I've attended the IOUG-A (Americas), the UKOUG (and plan on being there again this year), and the Irish OUG conference, in addition to other OUG events far too numerous to even begin to list. This is a great organization with great technical content and strong leadership.
Some Oracle Database 10g Release 2 Goodies
By now, much has been written about Oracle Database 10g Release 2 new features (oracle.com/technetwork/issue-archive/2005/05-sep/o55asktom-085572.html, for example)—features such as transparent data encryption, the new LOG ERRORS DML clause, conditional compilation in PL/SQL, and so on. Now I'd like to look at some other high-impact but less-talked-about new features that have come up recently in talks or questions on asktom.oracle.com.
The first is better support for LOBs (large objects) in a distributed system. In PL/SQL, LOBs up to 32K in size (the maximum size of a string in PL/SQL) can be accessed over a database link freely—in Oracle Call Interface (OCI) and Java/JDBC, they may be fully accessed over a database link, because these languages do not have the inherent 32K limitation on string size.
Another high-impact new feature is Recovery Manager (RMAN) backup encryption. This can happen transparently with the Oracle Wallet, much like transparent data encryption, or may be specified at backup time with any password you like. Or you can use a combination of both, so that an encrypted backup can take place either transparently with the Oracle Wallet or with the supplied password (in the event that the Oracle Wallet is not available during restoration). Of course, you must take care to ensure that either the Oracle Wallet or the password is available—or else the data is encrypted and cannot be decrypted for restore.
Another RMAN new feature is the ability to skip any unused block while backing up the database. In the past, RMAN would "compress" backup sets by skipping (not backing up) any block in the database that had never been used, but after a block contained data at least once, it would forever be backed up. That is, if you created an empty 100GB tablespace and put nothing into it, RMAN would have skipped virtually all of the blocks in the tablespace (because they never had data) during backup. If you subsequently filled the tablespace with 100GB of data—using all of the blocks—and then dropped the table, RMAN would back up all of those blocks because they contained data (were formatted) at some point in time, even though they were currently empty. Starting with Oracle Database 10g Release 2, RMAN is now able to skip backing up these blocks, as long as they do not contain any information that truly needs to be backed up.
In a look at security, the CONNECT role has been seriously downgraded in the set of privileges it includes. In the past, CONNECT would allow a user to not only create a session but also create database cluster segments, create database links, create sequences, alter the session, create synonyms, create tables, and create views. (It sounds more like a development role than a simple connect role.) Those were some very serious grants. Starting in Oracle Database 10g Release 2, the CONNECT role has only the CREATE SESSION privilege.
Lastly, a long-awaited feature of Oracle Database 10g Release 2 is the ability to actually drop a datafile from a tablespace when it contains no data. This allows the DBA to remove a file that was accidentally added in the wrong location or to the wrong tablespace (the first time). In the past, once you gave a tablespace a file, that tablespace owned that file forever.
That is just a short list of five of the many new features to be found in Oracle Database 10g Release 2. If you are interested in exploring more about this new release, just go to oracle.com/pls/db102/portal.portal_db?selected=1 and select the New Features Guide . Don't forget to look at the first chapter—the "What's New in [ this document ]" chapter—of almost every piece of Oracle documentation as well. For example, if you are a DBA, you might be interested in reading Chapter 1 of the Oracle Database Administrator's Guide , entitled "What's New in Oracle Database Administrator's Guide? "
Tom Kyte (email@example.com) has worked for Oracle since 1993. He is a vice president in the Oracle Public Sector group and the author of Expert Oracle Database Architecture: 9i and 10g Programming Techniques and Solutions (Apress, 2005) and Effective Oracle by Design (Oracle Press, 2003), among others.