|
Technical Note
Improving SQL*Loader Performance: Tuning the Date Cache in Direct Path Loads
Authors: Carol Palmer, Principal Product Manager, and Jim Stenoish, Principal Member Technical Staff
Date: July 2004
This Technical Note is the first in a series about improving SQL*Loader performance. It provides steps for tuning the SQL*Loader date cache to improve performance during direct path loads. But before we talk about tuning the date cache, let's briefly overview SQL*Loader loading methods.
Data Loading Methods
SQL*Loader provides two methods for loading data: conventional path load and direct path load. A conventional path load executes SQL INSERT statements to populate tables in an Oracle database. When SQL*Loader performs a conventional path load, it competes equally with all other processes for buffer resources, which can significantly slow the load.
A direct path load eliminates much of the database overhead by formatting Oracle data blocks and writing the data blocks directly to the database files. It does not compete with other users for database resources, so it can usually load data at near disk speed. Large amounts of data can be loaded in parallel for maximum performance. (There are a few restrictions to using direct path load; see Chapter 11 of the Oracle Database Utilities 10g Manual.)
Tuning the Date Cache in Direct Path Loads
If you are performing a direct path load in which the same date or timestamp values are loaded many times, a large percentage of total load time can end up being used for converting date and timestamp data. You can improve performance by using the SQL*Loader date cache.
The date cache reduces the number of date conversions done when many duplicate values are present in the input data. It enables you to specify the number of unique dates anticipated during a load.
The date cache is used by direct path loads to speed up the conversion between text and Oracle date columns. It is also used if a date format string is specified for a string loaded into a character column. In this case, the input string is converted into a date to ensure it is a valid date for the specified format.
Each unique date string gets an entry in the cache. If the number of unique dates for a table exceeds the size of the date cache, then the date cache is disabled. All date columns in a table share the same date cache. The date string, the date format specified for the string, and the language of the date string combine to make a unique entry.
For example, if data for one column, D1, shares the same date format, string value, and language as a date value already converted for another date column, D2, then the direct path loader will use the date value already calculated for D2. This approach prevents unnecessary conversions that affect performance.
If you know the number of duplicate date strings is small or the number of unique date strings is very large, then you can disable the date cache by specifying DATE_CACHE=0 on the command line.
SQL*Loader Log File
Information about the use of the date cache during a load is printed at the end of the SQL* Loader log file. You can use this information to improve direct path load performance.
For example, if the cache did not overflow during the load, the information in the log file looks like this:
Date Cache Statistics for table SYS_SQLLDR_X_EXT_TKLDC1
Max Size: 1000
Entries: 353
Hits: 748
Misses: 0
- Max Size is the maximum number of entries in the cache. By default, the date cache size is 1,000 elements, but it can be changed with the SQL*Loader DATE_CACHE command-line parameter.
- Entries is the number of unique dates encountered in the load.
- Hits is a count of the number of conversions that didn't need to occur because the same value has already been converted.
- Misses is a count of date strings that did not match an entry in the date cache.
If the date cache did overflow, then the following message is displayed in the log file:
Date conversion cache disabled due to overflow (default size: 1000)
The default size in the message is the number of entries specified by the DATE_CACHE command-line parameter. If you see this message in the log file, you may want to try either increasing the size of the date cache or disabling the cache by specifying DATE_CACHE=0 on the SQL*Loader command line.
Conclusion
When you are performing a direct path load in which the same date or timestamp values are loaded many times, remember to use the SQL*Loader date cache for improved SQL*Loader performance. |