Oracle Magazine Issue Archive
2011
September 2011
TECHNOLOGY: Ask Tom
On Caching and Evangelizing SQLBy Tom Kyte
Our technologist caches scalar subqueries, votes for SQL, and recommends technology and community. One of the talks I gave recently at the Oracle Benelux User Group (OBUG) conference in Belgium was regarding techniques—some SQL tricks, if you will—you can use when writing SQL. One of the techniques I discussed at length was scalar subqueries and how they can be used to minimize the number of times a PL/SQL function is called from SQL. This is important because the overhead of going from SQL to PL/SQL over and over again can be quite expensive, and the scalar subquery caching database feature can dramatically decrease this overhead. What exactly is a scalar subquery? It is a subquery in a SQL statement that returns exactly one column and zero rows or one row. That single column can be a complex object type, so it can consist of many attributes, but the subquery returns a single scalar value (or NULL if the subquery returns zero records). A scalar subquery can be used anywhere a literal could have been used. For example, you can obviously create the query select deptno, dname, 'Hello world' from dept; select deptno, dname, (select count(*) select dept.deptno, dept.dname, count(emp.empno) from dept left outer join emp on (dept.deptno = emp.deptno ) group by dept.deptno, dept.dname; So, now that you know what a scalar subquery is, you need to understand the scalar subquery caching feature of Oracle Database. In the above query with a scalar subquery, the database would actually extract the scalar subquery and rewrite it to use bind variables. In effect, the database would be executing (select count(*) from emp where emp.deptno = ?) select project_name, deptno, (select count(*) from emp where emp.deptno = projects.deptno) from projects; When you’re using a scalar subquery, Oracle Database will set up a small in-memory hash table for the subquery and its results each time it runs the query. So, when you run the previous query, Oracle Database sets up in memory a hash table that looks like this:
Now suppose the query returns a third row and it again includes a DEPTNO value of 10. This time, Oracle Database will see DEPTNO = 10, find that it already has that value in the hash table cache, and will simply return 42 from the cache instead of executing the scalar subquery. In fact, it will never have to run that scalar subquery for the DEPTNO values of 10 or 20 again for that query—it will already have the answer. What happens if the number of unique DEPTNO values exceeds the size of the hash table? What if there are more than 255 values? Or, more generally, if more than one DEPTNO value is assigned to the same slot in the hash table, what happens in a hash collision? The answer is the same for all these questions and is rather simple: Oracle Database will not be able to cache the second or nth value to that slot in the hash table. For example, what if the third row returned by the query contains the DEPTNO = 30 value? Further, suppose that DEPTNO = 30 is to be assigned to exactly the same hash table slot as DEPTNO = 10. The database won’t be able to effectively cache DEPTNO = 30 in this case—the value will never make it into the hash table. It will, however, be “partially cached.” Oracle Database still has the hash table with all the previous executions, but it also keeps the last scalar subquery result it had “next to” the hash table. That is, if the fourth row also includes a DEPTNO = 30 value, Oracle Database will discover that the result is not in the hash table but is “next to” the hash table, because the last time it ran the scalar subquery, it was run with an input of 30. On the other hand, if the fourth row includes a DEPTNO = 40 value, Oracle Database will run the scalar subquery with the DEPTNO = 40 value (because it hasn’t seen that value yet during this query execution) and overwrite the DEPTNO = 30 result. The next time Oracle Database sees DEPTNO = 30 in the result set, it’ll have to run that scalar subquery again. So, all this discussion so far was a setup—a prelude, if you will—for what I really wanted to write about: how to reduce the number of times a PL/SQL function invoked from SQL is called. Suppose you have a PL/SQL function like the one in Listing 1. This function will simply increment a counter each time it is called. The counter value will be stored in the CLIENT_INFO column of V$SESSION, and the function will then return the length of its input. Code Listing 1: PL/SQL function f
SQL> create or replace function f( x in varchar2 )
return number
2 as
3 begin
4 dbms_application_info.set_client_info
(userenv('client_info')+1 );
5 return length(x);
6 end;
7 /
Function created.
If you execute a simple query like the one in Listing 2 (note that the STAGE table is simply a copy of ALL_OBJECTS), you can see that the function was called once per row even though the inputs to the function were repeated over and over again. If you use a scalar subquery—if you replace “f(owner)” with “(select f(owner) from dual)”—you will notice a massive reduction in calls to the function, as shown in Listing 3. Code Listing 2: Demonstrating once-per-row repetition in function f
SQL> begin
2 :cpu := dbms_utility.get_cpu_time;
3 dbms_application_info.set_client_info(0);
4 end;
5 /
PL/SQL procedure successfully completed.
SQL> select owner, f(owner) from stage;
...
72841 rows selected.
SQL> select
2 dbms_utility.get_cpu_time-:cpu cpu_hsecs,
3 userenv('client_info')
4 from dual;
CPU_HSECS USERENV('CLIENT_INFO')
--------- -----------------------
118 72841
Code Listing 3: Reducing function calls with scalar subquery
SQL> begin
2 :cpu := dbms_utility.get_cpu_time;
3 dbms_application_info.set_client_info(0);
4 end;
5 /
PL/SQL procedure successfully completed.
SQL> select owner, (select f(owner) from dual) f from stage;
...
72841 rows selected.
SQL> select
2 dbms_utility.get_cpu_time-:cpu cpu_hsecs,
3 userenv('client_info')
4 from dual;
CPU_HSECS USERENV('CLIENT_INFO')
--------- -----------------------
29 66
Now, some of you might be thinking about the possibility of marking the function deterministic, because it is, in fact, deterministic. Wouldn’t that reduce the number of function calls as well? The short answer is “yes,” but the longer answer is “yes, but not as well as scalar subquery caching can.” As you can see in the deterministic function in Listing 4, the number of calls is reduced, but only down to 8,316. The scalar subquery cache in this case was better than just marking the function as deterministic. (Note: Marking a function deterministic affects caching only in Oracle Database 10g and above; you will see no change in the number of calls in Oracle9i Database with deterministic functions.) Code Listing 4: Deterministic function variation
SQL> create or replace function f( x in varchar2 )
return number
2 DETERMINISTIC
3 as
4 begin
5 dbms_application_info.set_client_info
(userenv('client_info')+1 );
6 return length(x);
7 end;
8 /
Function created.
SQL> begin
2 :cpu := dbms_utility.get_cpu_time;
3 dbms_application_info.set_client_info(0);
4 end;
5 /
PL/SQL procedure successfully completed.
SQL> select owner, f(owner) from stage;
...
72841 rows selected.
SQL> select
2 dbms_utility.get_cpu_time-:cpu cpu_hsecs,
3 userenv('client_info')
4 from dual;
CPU_HSECS USERENV('CLIENT_INFO')
--------- -----------------------
69 8316
Listing 5 shows a function result cache solution. The number of function calls is 32 (because I happen to have 32 schemas in my database), and the CPU time is 0.73 seconds—about the same as that of the DETERMINISTIC function and far above the CPU time of 0.29 seconds used by the 66 calls made by the scalar subquery. Code Listing 5: Function result cache solution
SQL> create or replace function f( x in varchar2 )
return number
2 RESULT_CACHE
3 as
4 begin
5 dbms_application_info.set_client_info
(userenv('client_info')+1 );
6 return length(x);
7 end;
8 /
Function created.
SQL> begin
2 :cpu := dbms_utility.get_cpu_time;
3 dbms_application_info.set_client_info(0);
4 end;
5 /
PL/SQL procedure successfully completed.
SQL> select owner, f(owner) from stage;
...
72841 rows selected.
SQL> select
2 dbms_utility.get_cpu_time-:cpu cpu_hsecs,
3 userenv('client_info')
4 from dual;
CPU_HSECS USERENV('CLIENT_INFO')
--------- -----------------------
73 32
Code Listing 6: Rerunning the function result cache solution
SQL> begin
2 :cpu := dbms_utility.get_cpu_time;
3 dbms_application_info.set_client_info(0);
4 end;
5 /
PL/SQL procedure successfully completed.
SQL> select owner, f(owner) from stage;
...
72841 rows selected.
SQL> select
2 dbms_utility.get_cpu_time-:cpu cpu_hsecs,
3 userenv('client_info')
4 from dual;
CPU_HSECS USERENV('CLIENT_INFO')
--------- -----------------------
63 0
select * from t where column = plsql_function(..); select * from t where column = (select plsql_function(...) from dual); to get the performance benefit of the scalar subquery cache. Is the Cursor Dying?With subquery factoring, MODEL, PIVOT, MERGE with DELETE, “Materialize” and other hints, global temporary tables, DML (data manipulation language) error logging, and so many other powerful new SQL features, when is it necessary to use cursors (apart from using ref cursors for reports). Is the cursor dying? Well, the simple UPDATE statement update t set x = 5; However, I think you may mean, “Is the need to procedurally process the output of a cursor dying?” Many times (many, many, many times), a ton of procedural code can be replaced with a single SQL statement. That has been true for a long time and is truer today than ever with features such as DML error logging. So, yes, we should all be striving to erase as much procedural code as possible and go set-based. Does it mean the death of cursors? No, not at all. It means the reduction of procedural code—more code = more bugs; less code = fewer bugs. But it also means that people have to read the documentation, understand the benefits, learn the full SQL language, and start using it. I was once asked in a question-and-answer session what I thought was the most underutilized Oracle Database feature. I responded almost immediately with the answer “SQL.” The questioner then asked me, “But what do you mean—everyone uses SQL all the time.” My response was that everyone uses very, very simple SQL and avoids 99 percent of its actual capabilities. Sequence CachingI read about sequence caching, but I still do not understand the CACHE and NOCACHE settings. Could you explain CACHE and NOCACHE with an example? Sure. In short, if you use NOCACHE, every time you call sequence.nextval, you will cause a table named SYS.SEQ$ to be updated and committed. Every single time. If, on the other hand, you use the default value, CACHE 20, you will cause SYS.SEQ$ to be updated and committed every 20th time you call sequence.nextval. It is a performance thing. If you don’t cache the next couple of sequence values in the system global area (SGA), you will be doing an UPDATE and COMMIT every single time you call nextval. You can evaluate the performance impact this might have on your application—especially with large data loads—if the cache value is set too low. For example, let’s create three sequences SQL> create sequence nocache_seq NOCACHE; Sequence created. SQL> create sequence default_cache_seq; Sequence created. SQL> create sequence big_cache_seq cache 100000; Sequence created. SQL> create table t ( x int ); Table created.
SQL> truncate table t; Table truncated. SQL> insert into t (x) 2 with data(x) 3 as 4 (select 1 x 5 from dual 6 union all 7 select x+1 8 from data 9 where x <= 100000) 10 select &1.nextval 11 from data 12 / 100001 rows created. update seq$ set ... where obj#=:1 call count cpu elapsed --- ----- ---- ------- Parse 1 0.00 0.00 Execute 100001 33.22 40.23 Fetch 0 0.00 0.00 --- ----- ---- ------- total 100002 33.22 40.23 Using the default cache sequence (default_cache_seq), you see very different numbers: call count cpu elapsed --- ----- ---- ------- Parse 1 0.00 0.00 Execute 5001 1.60 1.89 Fetch 0 0.00 0.00 --- ----- ---- ------- total 5002 1.60 1.89 Using the big cache sequence (big_cache_seq), you see call count cpu elapsed --- ----- ---- ------- Parse 1 0.00 0.00 Execute 2 0.00 0.00 Fetch 0 0.00 0.00 --- ----- ---- ------- total 3 0.00 0.00 Something I Don’t Usually DoIf you’ve been looking for an excuse to learn Oracle Application Express and you wanted a book for working through it, Expert Oracle Application Express (Apress, 2011) is for you. I know most of the authors personally and can attest to their technical knowledge of Oracle Application Express. The technical content of the book is good, and I also really appreciate what the authors are doing with the royalties. They are contributing 100 percent of the royalties received to the families of two Oracle Application Express developers who have passed away. You can read about the book and the royalty donations at bit.ly/jSLd4q.
Tom Kyte is a database evangelist in Oracle’s Server Technologies division and has worked for Oracle since 1993. He is the author of Expert Oracle Database Architecture (Apress, 2005, 2010) and Effective Oracle by Design (Oracle Press, 2003), among other books.
Send us your comments | |||||||||||||||||||