By Tom Kyte
Our Oracle expert looks at block-level information, sending HTML messages, estimating rows, and AQ.
When I periodically delete records to reclaim space, how do I know how much space is free after deletion? Is there any way I can find out the total number of blocks allocated and used before and after deletion?
You can see how many more blocks you have on the freelist. You can use DBMS_SPACE (an Oracle-supplied PL/SQL package) before running the delete and then again after. If the delete causes any block to fall within the range specified by pctused (a setting you specified when you created the table), the block goes back onto the freelist for the table.
When you delete, however, you may end up deleting only one row per block, and no blocks get added to the freelist (NUM_FREELIST_BLOCKS). So you've freed up space for subsequent updates but not for inserts—and you won't be able to tell that by using DBMS_SPACE.
In that case, an ANALYZE before and after the delete would be helpful because you could look at the AVG_ SPACE, BLOCKS, and EMPTY_ BLOCKS before and after the delete. For a look at a contrived example, see Listing 1.
In the table, rows 1, 2, and 3 are on the first block (2058); rows 4, 5, and 6 are on the second block (2059); and row 7 is on the third block (2060). Now I analyze the table, as shown in Listing 2.
Listing 2 shows me that before the deletes, the average free space per block is 2,241 bytes. I have 35 blocks in use by the table. Therefore, I have 35 * 2,241 bytes "free" for updates and such. Now I remove certain rows, as shown in Listing 3.
Listing 3 shows I now have 2,643 bytes free per block on average, or 402 bytes more per block than before. So I have 35 * 402 more bytes "free" for updates and—since I added to the free-list (NUM_FREELIST_BLOCKS)—for inserts as well.
Applications running on some non-Oracle databases can show estimated rows returned without executing the query. Is there a mechanism in Oracle to find row estimates without executing the query?
The execution plan provides this, as long as you are using the CBO (cost-based optimizer). The last part of the plan shows the estimated output.
Consider the example in Listing 4 . In the SELECT statement of the execution plan, CARD=196. CARD stands for cardinality and reports the estimated number of rows parsed in a given step in the query plan. So here the optimizer estimates that the step will return 196 rows. 196 = 14 * 14, and since SCOTT.EMP has 14 rows, that is exactly right.
This execution plan estimate can be very accurate if it has the proper information. If you can look at the data yourself and figure out how many rows the query returns, the optimizer should be able to do the same. Consider the execution plan in Listing 5 that uses "stats" (ANALYZE TABLE... STATISTICS;) .
Using just table stats, the plan has no idea about the skew of the columns. It does not understand that there are 196 Smiths in this table. So I try it again with more details, as shown in Listing 6 .
The optimizer is now estimating that there will be 207 rows returned, which is much closer to the actual number of 196. As you can see, given the proper amount of information, the execution plan makes better estimations.
Assume Table T with Columns C1, C2, etc., and various processes entering data into T. Similarly, say processes P1 and P2 are reading the data entered in T and processing it. What is the best and easiest way to achieve this, and how can this work without P1 and P2 stepping on each other as they wake up randomly and run?
This is the classic "multiple consumers of a queue of information in a database" scenario, and it is a perfect example of when to use message queues in the database.
Instead of writing your own queuing routines, I suggest that you use the built-in queuing mechanism afforded by Advanced Queuing (AQ) in Oracle.
AQ gives you queuing in the database, with multiple, highly concurrent queue consumer abilities. You get many other things as well, such as message routing and prioritization.
Your clients will put messages in the queue (instead of inserting a row in a table), and your consumers will receive the messages, process them, and put data into some other table (typically).
Although AQ is the correct implementation, there is an alternative. You might use this alternative if you have to implement something in an existing environment and cannot introduce new technologies. An alternative technique that works well is shown in Listing 7.
Listing 7 returns either a record that is yours (and it will be locked) or NULL. Only one session at a time will get a record, and if that session terminates abnormally the locks will be cleaned up automatically, making the record available for other sessions to process.
I'd like to send an HTML-formatted e-mail from the database using the UTL_SMTP package. I don't see any way of setting the MIME type; is this beyond the scope of UTL_SMTP?
I asked Tyler Muth firstname.lastname@example.org) to answer this one. Tyler provided this answer:
It is correct that there is no MIME type parameter in UTL_SMTP, but this does not limit the types of e-mail you can send. UTL_SMTP allows you to construct and pass in the whole e-mail message. This means that you can construct an HTML message using UTL_SMTP, but you're going to have to do some work to create it.
Listing 8 presents the basic structure of the message you need to construct.
Granted, this looks difficult, but if you use the procedure I wrote, it's really quite easy. The procedure does all of the work for you. Basically, you set up the body of the e-mail to be formatted as shown in Listing 8 and use UTL_SMTP.WRITE_DATA to send it.
Why are analytical functions so powerful? Relational theory tells us that joins are the best thing since sliced bread and that referencing the "previous" row is bad. How would you compare an analytical query with this:
select empno, (select sum(sal) from (select rownum rn, sal from (select sal from emp order by -sal) ) where center-1<=rn and rn<=center+1 ) from (select rownum center, empno from (select empno from emp order by -sal) )
In your query, you violate relational theory by using the pseudocolumn ROWNUM, which assigns order to rows in a table. You are relying on the ordering of rows in a set—another "nonpure" SQL extension ( ORDER BY in a subquery).
After looking at your query, I deduced that it shows the sum of the salary of
The row preceding the current row
The current row
The row following the current row
To show the basic advantages of analytic functions, I loaded up 1,000 employees into a very simple EMP table with two columns—EMPNO and SAL. On this table I placed four indexes—on EMPNO, SAL, (SAL,EMPNO), and (EMPNO,SAL), so that any index that could exist did exist. (The test queries would not run slowly due to the lack of an index.) I then analyzed the table, index, and indexed columns and ran your query and the following equivalent query that uses analytic functions:
select empno, sum(sal) over (order by sal desc rows between 1 preceding and 1 following ) x from emp;
The net result on my machine was that your query (as reported by SQL_TRACE and TKPROF) took 14.23 CPU seconds; my query took 0.03 CPU seconds. The query that did not use analytic functions performed 1,000 times the number of logical input/output operations to do the job, and it only performs worse as the volume of data goes up.
I find the second query using analytic functions not only faster but more intuitive—easier to read, comprehend, and code. No fancy "order and assign a row number, find the row numbers around my row number, and add them up." Just a simple "sum the salary of the preceding and following rows," period.
That's why I think analytic functions are the best thing to happen to the SQL language since SELECT.
Tom Kyte has worked with the Oracle Services Industry Technology Group, in Reston, Virginia, since 1993. Kyte is the author of Expert One on One: Oracle (Wrox Press, 2001).