As Published In
Oracle Magazine
July/August 2002

TECHNOLOGY: Ask Tom


Parsing Densities, ANYDATA, and XML

By Tom Kyte Oracle Employee ACE

Our Oracle technologist looks at the hard parse, the soft parse, and the softer parse.

Can you explain the difference between a soft parse and a hard parse?

Here is a long-winded answer, excerpted from a new book that I collaborated on with four other Oracle employees. The book, Beginning Oracle Programming (Wrox Press, 2002,www.wrox.com), is aimed at the newly indoctrinated Oracle developer.

Parsing

Parsing is the first step in the processing of any statement in an Oracle database. Parsing is the act of breaking the submitted statement down into its component parts, determining what type of statement it is (query, DML, or DDL), and performing various checks on it.

Parsing performs two main functions:

Syntax Check. Is the statement a valid one? Does it make sense given the SQL grammar documented in the SQL Reference Manual? Does it follow all of the rules for SQL?

Semantic Analysis. Going beyond the syntax, is the statement valid in light of the objects in the database? Do the tables and columns referenced exist? Do you have access to the objects, and are the proper privileges in place? Are there ambiguities in the statement? For example if there are two tables t1 and t2 and both have a column x , the query "select X from T1, T2 where ..." is ambiguous; the query doesn't know which table to get x from.

You can think of parsing as basically a two-step process: a syntax check to check the validity of the statement and a semantic check to ensure that the statement can execute properly. The difference between the two types of checks is hard to see. Oracle Database does not come back and say "the statement failed the syntax check"; rather, it returns the statement with an error code and message. For example, the following statement fails with a syntax error: 

SQL> select from where 2;
select from where 2
      *
ERROR at line 1:
ORA-00936: missing expression

The following statement fails with a semantic error: 

SQL> select * from not_a_table;
select * from not_a_table
              *
ERROR at line 1:
ORA-00942: table or view does not exist

If the table not_a_table existed and I had permission to access it, this statement would have succeeded. Looking at these errors is the only way to really tell the difference between a semantic and syntactic error. If the statement could have executed, given the proper objects and privileges, then you had a semantic error. If the statement could not have executed under any circumstances, you had a syntax error. Regardless, an Oracle database will not execute this error-generating statement.

The next step in the parse operation is to see if the statement being parsed has already, in fact, been processed by some other session. If the statement has been processed, the parse operation can skip the next two steps in the process: optimization and row source generation. If the parse skips these next two steps, it is called a soft parse. A soft parse is a shorter process to running your query. If your query has never been parsed by any session, the parse must do all of the parsing steps. This parse is called a hard parse. A hard parse must syntactically and semantically parse the query, optimize it, and generate the row source plan for the query. This distinction between the steps of a soft parse and a hard parse is very important. When developing your applications, you want a high percentage of your queries to be soft-parsed so that parsing can skip the optimize and generate phases, which are both very CPU intensive and a point of contention (serialization). If you have to hard-parse a large percentage of your queries, your system will function slowly and, in some cases, not at all.

This sharing of SQL in Oracle Database (that enables soft parses) works via the shared pool, a piece of memory in the System Global Area (SGA) maintained by Oracle Database. Oracle Database parses the query, and after the query passes the syntax and semantic checks, the database will look in the shared pool component of the SGA to see if that same exact query has already been processed by another session. Since the database has performed the semantic check, it has already determined: 

  • Exactly what tables are involved

  • That you have access to the tables (because the proper privileges are there).

Now the database can look at all of the queries in the shared pool that have already been parsed, optimized, and generated to see if the hard-parse work of that query has already been done.

Session-Cached Cursor Parameter

Can you explain the meaning and use of the parameter SESSION_CACHED_CURSOR?

In short, the SESSION_CACHED_CURSOR parameter allows Oracle Database to do a "softer" soft parse. In the answer to the first question in this issue's column, I discussed the differences between a soft parse and a hard parse, with the soft parse being the more desirable of the two. But truth be told, the really desirable state is to not have to parse at all! The less work, the better. That's why session-cached cursors is a useful feature in environments where the application repeatedly parses the same statement, instead of parsing a statement once per session. Using the SESSION_CACHED_CURSOR parameter, you can have Oracle Database "save" your parsed cursors, allowing you to reuse them without going through the entire soft-parse process documented above.

Web Locator


Oracle Managing Technologist Tom Kyte answers your most difficult technology questions at
 asktom.oracle.com

Highlights from that forum appear in this column.

Oracle product documentation
 oracle.com/technetwork/documentation

You can see how this parameter can affect an application's performance by using the following test case. See asktom.oracle.com for the test harness I use to test the performance implications of any two approaches in general. This location includes the tables and requirements for running this test harness. (Note that some of the details, such as table definitions and views used by the test harness, have been omitted from this column for brevity.) Listing 1 shows the test statements and the timing first for the execution without session-cached cursors ( 45 hsecs ) and then for the execution with session-cached cursors ( 35 hsecs ).

Listing 1 shows that session-cached cursors ran faster. (I ran the test in Listing 1 a couple of times, and there are no hard parses taking place.) The really good news appears in Listing 2, when I run a test to check the latch counts associated with session-cached cursors.

Listing 2 shows a significantly reduced number of latch counts on the library cache and shared pool. Since a latch is a lock, and a lock is a serialization device, and serialization implies that you must wait for something, using session-cached cursors will increase scalability and performance as you add more and more users. It's not only faster but more scalable as well.

More Parsing

I understand that to parse SQL, Oracle Database always uses two steps to check SQL code syntactically and semantically and then it checks the shared pool for previously parsed executions of that SQL. I have read that Oracle always saves unparsed representations of SQL code, and that there is a hashing algorithm to quickly locate this code. So why doesn't Oracle Database make this step (checking the shared pool for a matching statement) the very first parsing step, before making any other checks?

Even when soft parsing, Oracle Database needs to parse the statement before it goes to look in the shared pool. The hashing of the SQL statement makes it faster to find the statement in the shared pool, but Oracle Database still needs to parse it. Oracle Database needs to do two things: 

  • Check the syntax to make sure it can understand it 

  • Check the semantics to see what objects are really referenced

Only then can Oracle Database safely go to the shared pool to see if the SQL was already parsed. Think about it; assume the following query is run by the user scott

select * from t;

And now you, not scott , submit 

select * from t;

Can your query blindly reuse the plan in the shared pool? You don't know; the database doesn't know. The database must first parse the statement, figure out that t is a table, and determine the owner of the table t in question. Then, the database can go to the shared pool to see if the statement is there. When the database finds the identical select * from t , it will have to further verify that the t referenced by both queries is the same exact t .

Hence, the database needs to do the semantic part of the parse before it can find a statement in the shared pool to reuse. (There could be 1,000 select * from t statements in the shared pool, for example, none of which would work for this example).

The database needs to syntactically parse the query before it can semantically parse it. The hash is good only for finding query strings that are the same; it does not help the database figure out if the referenced statement is the same statement as in your execution context.

SYS.ANYDATA Type

Oracle9i Database seems to have a new datatype called the ANYDATA type. How and where could you use this? If someone uses date values and someone else uses number values in these columns, how can you extract and interpret the information?

There is a new datatype called ANYDATA. This type can be potentially useful in an application that stores generic attributes—attributes for which you don't know what the datatypes are until you actually run the code. In the past, you would have just stuffed everything generic into a VARCHAR2—dates, numbers, everything. Now, in ANYDATA, you can put in a date and it stays a date. The system will enforce that it is, in fact, a valid date and let you perform date operations on it. If you put the same date in a VARCHAR2, someone could put "hello world" into your "date" field.

Listing 3 shows a quick example of how to use the ANYDATA type.

The SYS.ANYDATA type has many methods, one of which is called getTypeName. You can use this method of the ANYDATA type to see what kind of data is actually stored in the ANYDATA type: 

SQL> select t.x.gettypeName() typeName
  from t t;

TYPENAME
________________

SYS.NUMBER
SYS.DATE

SYS.VARCHAR2

Unfortunately, there is no easy method for displaying the contents of ANYDATA in a query. It is assumed that the program will fetch the data, figure out what it is, and do something with it. In SQL*Plus, however, this can be an issue; it is not easy to see what is in there. You can rectify this easily enough, however, using the getData function in Listing 4.

XMLType versus Relational Schema

My input data is XML. My current requirement is to produce database reports from this data, using Oracle Reports and Oracle Discoverer. The XML structure itself is pretty complex. The database schema I have created from my XML schema has several master detail tables (> 50).

My questions are these: Would I be better off not storing this data in relational tables, and storing it instead in a few tables with XMLType columns? How hard or easy would it be to generate reports from this data if I had it all in XMLType columns?

From what I have read, XMLType enables storage of XML and searching with XPath in PL/SQL. Can this easily be translated in a new way of writing reports over large amounts of relational data? What about performance?

Well, that's a big question. In order to answer it fully, I asked Sean Dillon (sean.dillon@oracle.com), our local XML guru, to look at this. He actively uses all of the XML technologies in Oracle9i and is best positioned to give real-world advice. Here's his response:

"To convert or not to convert, that's always the question. There are as many opinions on this as there are people out there willing to talk about it. Let me tell you what I think, and you can draw your own conclusions.

There are a variety of factors you should take into consideration when determining how to store and access your data. There are a few different ways you can store your data, and each different way offers trade-offs you need to consider.

Get More


Beginning Oracle Programming (Wrox Press, 2002, www.wrox.com) is available through Wrox Press, Amazon, and many other retailers. Thank you to Wrox Press for allowing us to excerpt this book for some of this column. The book content was edited to fit the space available.

First of all, you can store XML data as native XML in Oracle databases. In Oracle8i and below, you do this in CLOBs or NCLOBs . In Oracle9i Database and above, you can use a new datatype called XMLType . These native XML storage options offer XML purists the ability to store completely unstructured data in the database with no prior knowledge of the format of the XML data. This also allows you to maintain document fidelity (so things like whitespace, processing instructions, and so forth are maintained along with the original document) or DOM fidelity (a new XMLType storage option in Oracle9i Database Release 2).

Using XMLType columns gives you a variety of XML capabilities, such as XPath searching, XPath extracts, verifying existence of a node ( existsnode() ), checking to see if your XML is a fragment or an entire document ( isFragment() ), transforming XML documents using XSLT ( transform() ), and converting your XML document into a user-defined type ( toObject() ). With XMLType or CLOBs , you can use Oracle Text (also known as Oracle inter Media) to index your XML data for high-speed XML-centric searching. (An example search is, "Show me all those documents in my database where the 'DESCRIPTION' tag contains the words 'really groovy.'") In Oracle9 i Database, the Oracle Text indexing capabilities have been improved significantly to offer more XPath-like high-speed indexing.

Just because you store your data in native XML does not mean you can't use relational access. In Oracle9 i Database, you can define views on top of your XML document storage to enable high-speed relational access to your XML data by using a healthy mix of XPath expressions, pipelined functions, and function-based indexes. This works and can be very fast, but it has a considerable amount of administrative overhead and is overkill if you will always be accessing your data relationally. The only reason I would advocate storing your data only as native XML is if you publish the data in XML the vast majority of the time. If you need to access the data relationally only every once in a while, this native XML storage approach will work. If the majority of your access to the data is in relational queries, however, read on.

You can also store your data in relational tables, as you have pointed out. You can map elements in your XML data to columns in a table (or object table) and parse the XML document when you put it into these rows and columns. This is the obvious solution for Oracle Reports and Oracle Discoverer access. If you know the structure of your XML data and you won't be dealing with other XML schemas or DTDs for the majority of your documents, this is an efficient way to store your data. In this case, consider XML as the data-transfer mechanism; it's important when importing or exporting data. But within the confines of the database and reporting interface, it's just data, not XML. There are quite a few people who like to use XML simply because it's XML (i.e., snazzy, cool, fun, exciting). In reality, you need to use the technology that best meets your requirements. If the majority of your data access is relational, store your data relationally.

People worry that because the data is stored relationally, they've lost their XML. This isn't a problem. Oracle has a variety of ways to construct/generate XML data based on relational queries. You can also programmatically construct XML data if you need to. There are many ways to do this; you just need to determine the best mechanism, depending on your environment and data model.

Another approach you can take is to store both native XML and relational data. When you put the native XML document into the database, parse it as well. This satisfies the relational data access needs, as well as giving you such advantages as XPath queries, Oracle Text XML indexing, XML document fidelity, and an audit trail (by storing the original XML you loaded). Yes, there is more data storage in the database, but depending on your needs, this is an option.

Ultimately, the answer to whether to convert or not convert XML for database storage is that it depends. I hope the information here helps you understand some of the pros and cons for the different ways you can do it. Good luck!"


Columnist Tom Kyte (thomas.kyte@oracle.com) has worked with the Oracle Services Industry Technology Group in Reston, Virginia, since 1993. Kyte is the author of Expert One on One: Oracle and coauthor of Beginning Oracle Programming (both published by Wrox Press, www.wrox.com).

Send us your comments