Our Oracle expert finds the right order, settles the packages-versus-procedures debate, and gathers information on collecting.
Should I use the most selective column first in the order of columns that comprise the index? I assumed that the most selective column should be the first column of an index, but doing this prevents us from taking advantage of compressed indexes.
What drives the order of columns in a concatenated index should be the types of predicates you use. For example, look at the following queries:
select * from t where a = :a and b = :b and c = :c; select * from t where b = :b and c = :c; select * from t where c = :c;
The most logical index for these queries might be on (C,B,A) because that index can easily be used for all three queries. Note that I don't really care about selectivity here; the optimizer takes care of that for me. So if index C is not selective enough by itself, the optimizer doesn't use the index for select * from t where c = :c;. If index C is selective enough, the optimizer uses C.
In Oracle9i, there is an even more compelling reason to put less selective columns first: a new feature called index skip scans. With a skip scan, having the least discriminating entry first can allow you to use an index even when the columns in the predicate are not on the leading edge of the index.
Skip scans work well only if the leading edge of the index (C, in the example above) has very few discrete values and the optimizer understands that. For example, an index on (GENDER,EMPNO) where gender has the values 'M', 'F', and where EMPNO is unique is great for using a skip scan. So, given a query such as the following:
select * from t where empno = 5;
the optimizer in Oracle9i might consider using that index on T to satisfy the query in a skip scan method. The query is processed conceptually as follows:
select * from t where GENDER='M' and empno = 5 UNION ALL select * from t where GENDER='F' and empno = 5;
If the optimizer knows that there are multiple values for GENDER, it skips around in the index, scanning for EMPNO=5. The script in Listing 1 demonstrates this with two cases where the optimizer uses different skip scans.
In Case 1, the optimizer understands that there are just a few subindexes by gender, so it uses a skip scan, as shown in the report in Listing 2.
In Case 2, there are 256 subindexes; the optimizer recognizes this and properly adjusts its query plan, as shown in Listing 3. As you can see in this case, the skip scan does a lot less work than a full scan would.
Is it better to have all related sub- programs in one big package or to have individual subprograms in the database? In which case is the performance better?
I wish all answers were as clear-cut as this one. Packages win every time. Use standalone procedures only for examples and bits of throwaway code. Use packages for all real production code.
The No. 1 reason packages win is that they break the dependency chain. For example, if procedure A calls procedure B and B is "changed," then A is directly affected and must be recompiled. Procedure A will have to recompile itself automatically the next time it is run.
On the other hand, if I create a package PKG_A with a procedure A and a package PKG_B with a procedure B, then PKG_A is dependent on PKG_B's specification (spec). I can compile and recompile PKG_B's body as often as I like without affecting PKG_A's state. As long as PKG_B's spec or interface does not change, dependent objects are not affected by the recompilation of the body.
This example shows the differences:
SQL> create or replace procedure B 2 as 3 begin 4 null; 5 end; 6 / Procedure created. SQL> create or replace procedure A 2 as 3 begin 4 B; 5 end; 6 / Procedure created.
So procedure A calls procedure B. One downside to procedures, in addition to the direct dependency, is that you have to create them in the right order if you want everything to be valid in the database after installation. I had to create B before A. Packages don't have this requirement either, as shown in the following:
SQL> create or replace package pkg_a 2 as 3 procedure a; 4 end; 5 / Package created. SQL> create or replace package pkg_b 2 as 3 procedure b; 4 end; 5 / Package created.
Here I create the specs for the packages. They are not dependent on each other and the specs can be created in any order, as shown in the following:
SQL> create or replace package body pkg_a 2 as 3 procedure a 4 is 5 begin 6 pkg_b.b; 7 end; 8 end; 9 / Package body created. SQL> create or replace package body pkg_b 2 as 3 procedure b 4 is 5 begin 6 null; 7 end; 8 end; 9 / Package body created.
Here I create the package bodies. Generally, I can create the bodies in any order after the specs. PKG_A is dependent on PKG_B's spec—not its body—so there's no ordering problem. Notice in the above example that PKG_A calls PKG_B, but PKG_A was in fact created first. Here's a look at the fundamental difference between packages and procedures:
SQL> select object_type, object_name, status 2 from user_objects 3 where status = 'INVALID' 4 order by object_type, object_name 5 / no rows selected
In Listing 4, standalone procedure A immediately becomes invalid. B was changed, so A needs to be recompiled. In Listing 5, PKG_A never becomes invalid; it never needs recompiling, because only PKG_B's body was modified. As long as the spec does not change, PKG_A will remain valid.
For this reason, I urge people to use only packages for "real" code. Packages can seriously improve performance in a database by reducing or eliminating the number of times things get compiled. If you do everything in standalone procedures, a simple fix to one procedure might make your entire set of routines become invalid. For example, A calls B, C calls A, D calls C, and so on. If you update B, A becomes invalid; when A recompiles, C becomes invalid—you get the idea. Packages can prevent this.
Can you explain the differences between the folowing methods in analyzing objects? And do you have a preference?
For quick-and-dirty examples, I use this command:
SQL> analyze table t compute statistics for table for all indexes for all indexed columns;
In real life, however, one would want to use DBMS_STATS instead. It supports many features that the ANALYZE command does not, such as:
The ability to gather statistics in parallel. This can greatly reduce the time needed to collect statistics.
The ability to manually set statistics to specific values when needed.
Easier automation/scheduling with DBMS_JOB, because it is a native PL/SQL API and doesn't require dynamic SQL (as ANALYZE would).
The ability to gather statistics only on those objects that need to have statistics gathered. You use the ALTER TABLE T MONITORING command to accomplish this. Now when you gather statistics, only those objects that have been modified and that require new statistics are analyzed; the remaining objects in your schema (the ones that have not changed) are skipped. You don't have to figure out what to analyze—the server does it for you.
DBMS_UTILITY.ANALYZE_SCHEMA is left over from Oracle7 Release 7.x. This was before the DBMS_STATS package and its functionality was introduced. I recommend that you use DBMS_STATS and stop using DBMS_ UTILITY. DBMS_UTILITY exists for backward compatibility, but DBMS_STATS is the proper package to use now.
Oracle Magazine columnist Tom Kyte (firstname.lastname@example.org) has worked with the Oracle Services Industry Technology Group, in Reston, Virginia, since 1993. Kyte is the author of Expert One on One: Oracle (Wrox Press, 2001).