Oracle Database 11g:
The Top Features for DBAs and Developers
by Arup Nanda, Oracle ACE Director
The Top Features for DBAs and Developers
by Arup Nanda, Oracle ACE Director
Present information in a spreadsheet-type crosstab report from any relational table using simple SQL, and store any data from a crosstab table to a relational table.
As you know, relational tables are, well, tabular—that is, they are presented in a column-value pair. Consider the case of a table named CUSTOMERS.
SQL> desc customers
Name Null? Type
----------------------------------------- -------- ---------------------------
CUST_ID NUMBER(10)
CUST_NAME VARCHAR2(20)
STATE_CODE VARCHAR2(2)
TIMES_PURCHASED NUMBER(3)
When this table is selected:
select cust_id, state_code, times_purchased
from customers
order by cust_id;
The output is:
CUST_ID STATE_CODE TIMES_PURCHASED
------- ---------- ---------------
1 CT 1
2 NY 10
3 NJ 2
4 NY 4
...
and so on ...
Note how the data is represented as rows of values: For each customer, the record shows the customer's home state and how many times the customer purchased something from the store. As the customer purchases more items from the store, the column times_purchased is updated.
Now consider a case where you want to have a report of the purchase frequency each state - that is, how many customers bought something only once, twice, thrice and so on, from each state. In regular SQL, you can issue the following statement:
select state_code, times_purchased, count(1) cnt
from customers
group by state_code, times_purchased;
Here is the output:
ST TIMES_PURCHASED CNT
-- --------------- ----------
CT 0 90
CT 1 165
CT 2 179
CT 3 173
CT 4 173
CT 5 152
...
and so on ...
This is the information you want but it's a little hard to read. A better way to represent the same data may be through the use of crosstab reports, in which you can organized the data vertically and states horizontally, just like a spreadsheet:
Times_purchased
CT NY NJ ...
and so on ...
1 0 1 0 ...
2 23 119 37 ...
3 17 45 1 ...
...
and so on ...
Prior to Oracle Database 11g, you would do that via some sort of a decode function for each value and write each distinct value as a separate column. The technique is quite nonintuitive however.
Fortunately, you now have a great new feature called PIVOT for presenting any query in the crosstab format using a new operator, appropriately named pivot
. Here is how you write the query:
select * from (
select times_purchased, state_code
from customers t
)
pivot
(
count(state_code)
for state_code in ('NY','CT','NJ','FL','MO')
)
order by times_purchased
/
Here is the output:
. TIMES_PURCHASED 'NY' 'CT' 'NJ' 'FL' 'MO'
--------------- ---------- ---------- ---------- ---------- ----------
0 16601 90 0 0 0
1 33048 165 0 0 0
2 33151 179 0 0 0
3 32978 173 0 0 0
4 33109 173 0 1 0
... and so on ...
This shows the power of the pivot
operator. The state_codes are presented on the header row, instead of a column. Pictorially, here is the how the traditional tabular format looks:
Figure 1 Traditional tabular representation
In a crosstab report, you want to transpose the Times Purchased column to the header row as shown in Figure 2. The column becomes the row, as if the column were rotated 90 degrees anti-clockwise to become the header row. This figurative rotation needs to have a pivot point and in this case the pivot point happens to be the count(state_code) expression.
Figure 2 Pivoted representation
This expression needs to be in the syntax of the query:
...
pivot
(
count(state_code)
for state_code in ('NY','CT','NJ','FL','MO')
)
...
The second line, "for state_code ...," limits the query to only those values. This line is necessary, so unfortunately you have to know the possible values beforehand. This restriction is relaxed in the XML format of the query, described later in this article.
Note the header rows in the output:
. TIMES_PURCHASED 'NY' 'CT' 'NJ' 'FL' 'MO'
--------------- ---------- ---------- ---------- ---------- ----------
The column headers are the data from the table itself: the state codes. The abbreviations may be self explanatory but suppose you want to display the state names instead of abbreviations, ("Connecticut" instead of "CT")? In that case you have to make a little adjustment in the query, in the FOR clause as shown below:
select * from (
select times_purchased as "Puchase Frequency", state_code
from customers t
)
pivot
(
count(state_code)
for state_code in ('NY' as "New York",'CT' "Connecticut",'NJ' "New Jersey",'FL' "Florida",'MO' as "Missouri")
)
order by 1
/
Puchase Frequency New York Connecticut New Jersey Florida Missouri
----------------- ---------- ----------- ---------- ---------- ----------
0 16601 90 0 0 0
1 33048 165 0 0 0
2 33151 179 0 0 0
3 32978 173 0 0 0
4 33109 173 0 1 0
...
and so on ...
The FOR clause can have aliases for the values there, which will become the column headers.
For matter there is anti-matter; for pivot there should be "unpivot," right?
Humor aside, there is a genuine need for the reverse of the pivot operation. Suppose you have a spreadsheet that shows the crosstab report shown below:
Purchase Frequency | New York | Connecticut | New Jersey | Florida | Missouri |
---|---|---|---|---|---|
0 | 12 | 11 | 1 | 0 | 0 |
1 | 900 | 14 | 22 | 98 | 78 |
2 | 866 | 78 | 13 | 3 | 9 |
... | . |
Now you want to load the data into a relational table called CUSTOMERS:
SQL> desc customers
Name Null? Type
----------------------------------------- -------- ---------------------------
CUST_ID NUMBER(10)
CUST_NAME VARCHAR2(20)
STATE_CODE VARCHAR2(2)
TIMES_PURCHASED NUMBER(3)
The spreadsheet data must be de-normalized to a relational format and then stored. Of course, you can write a complex SQL*:Loader or SQL script using DECODE to load the data into CUSTOMERS table. Or you can use the reverse operation of pivot
—UNPIVOT—to break up the columns to become rows, as is possible in Oracle Database 11g.
It might be easier to demonstrate this via an example. Let's create a crosstab table first, using the pivot
operation:
1 create table cust_matrix
2 as
3 select * from (
4 select times_purchased as "Puchase Frequency", state_code
5 from customers t
6 )
7 pivot
8 (
9 count(state_code)
10 for state_code in ('NY' as "New York",'CT' "Conn",'NJ' "New Jersey",'FL' "Florida",'MO' as "Missouri")
11 )
12* order by 1
You can check how the data is stored in the table:
SQL> select * from cust_matrix
2 /
Puchase Frequency New York Conn New Jersey Florida Missouri
----------------- ---------- ---------- ---------- ---------- ---------
1 33048 165 0 0 0
2 33151 179 0 0 0
3 32978 173 0 0 0
4 33109 173 0 1 0
... and so on ...
This is how the data is stored in the spreadsheet: Each state is a column in the table ("New York", "Conn", and so on).
SQL> desc cust_matrix
Name Null? Type
----------------------------------------- -------- ---------------------------
Puchase Frequency NUMBER(3)
New York NUMBER
Conn NUMBER
New Jersey NUMBER
Florida NUMBER
Missouri NUMBER
You need to break up the table so that rows will show only the state code and the counts for that state. This can be done by the unpivot
operation shown below:
select *
from cust_matrix
unpivot
(
state_counts
for state_code in ("New York","Conn","New Jersey","Florida","Missouri")
)
order by "Puchase Frequency", state_code
/
Here is the output:
Puchase Frequency STATE_CODE STATE_COUNTS
----------------- ---------- ------------
1 Conn 165
1 Florida 0
1 Missouri 0
1 New Jersey 0
1 New York 33048
2 Conn 179
2 Florida 0
2 Missouri 0
...
and so on ...
Note how the each column name has become a value in the STATE_CODE column. How did Oracle know that state_code is a column name? It knew that from the following clause in the query:
for state_code in ("New York","Conn","New Jersey","Florida","Missouri")
Here you specified that the values "New York", "Conn", and so on are values of a new column you want to be unpivoted on, called state_code. Look at part of the original data:
Purchase Frequency New York Conn New Jersey Florida Missouri
----------------- ---------- ---------- ---------- ---------- - ---------
1 33048 165 0 0 0
As the column "New York" suddenly became a value in a row, how would you show the value 33048, under which column? That question is answered by the clause just above the for clause inside the unpivot
operator in the above query. As you specified state_counts, that is the name of the new column created in the resultant output.
Unpivot
may be the opposite action of pivot
but don't assume that former can reverse what latter has done. For instance, in the above example, you created a new table, CUST_MATRIX, using a pivot
noperation on the table CUSTOMERS. Later you used unpivot
on the table CUST_MATRIX but that didn't get back the details of the original table CUSTOMERS. Instead the crosstab report was displayed in a different way for you to load into a relational table. So unpivot
is not for undoing what pivot
did, a fact you should carefully consider before creating a pivoted table and then dropping the original one.
Some of the very interesting uses of unpivot
go beyond the usual powerful data manipulation such as the example shown earlier. Oracle ACE Director Lucas Jellema of Amis Technologies has shown how you can generate rows of specific data for testing purposes. Here I will use a slightly modified form of his original code to generate vowels of the English alphabet:
select value
from
(
(
select
'a' v1,
'e' v2,
'i' v3,
'o' v4,
'u' v5
from dual
)
unpivot
(
value
for value_type in
(v1,v2,v3,v4,v5)
)
)
/
Here is the output:
V
-
a
e
i
o
u
This model can be extended to cover any type of row generator. Thank you Lucas for showing us this nifty trick.
In the above example, note how you had to specify the valid state_codes:
for state_code in ('NY','CT','NJ','FL','MO')
This requirement assumes that you know what values are present in the state_code column. If you don't know what values are available, how would you construct a query?
Well, there is another clause in the pivot
operation, XML, that allows you to create the pivoted output as XML where you can specify a special clause, ANY, instead of literal values. Here is the example:
select * from (
select times_purchased as "Purchase Frequency", state_code
from customers t
)
pivot xml
(
count(state_code)
for state_code in (any)
)
order by 1
/
The output comes back as CLOB so make sure the LONGSIZE is set to a large value before running the query.
SQL> set long 99999
There are two distinct differences in this query (shown in bold) compared to the original pivot
operation. First, you specified a clause, pivot xml, instead of just pivot
. It creates the output in XML. Second, the for clause shows for state_code in (any) instead of a long list of state_code values. The XML notation allows you to use the ANY keyword and you don't need to enter the state_code values. Here is the output:
Purchase Frequency STATE_CODE_XML
------------------ --------------------------------------------------
1 <PivotSet><item><column name = "STATE_CODE">CT</co
lumn><column name = "COUNT(STATE_CODE)">165</colum
n></item><item><column name = "STATE_CODE">NY</col
umn><column name = "COUNT(STATE_CODE)">33048</colu
mn></item></PivotSet>
2 <PivotSet><item><column name = "STATE_CODE">CT</co
lumn><column name = "COUNT(STATE_CODE)">179</colum
n></item><item><column name = "STATE_CODE">NY</col
umn><column name = "COUNT(STATE_CODE)">33151</colu
mn></item></PivotSet>
... and so on ...
As you can see, the column STATE_CODE_XML is XMLTYPE, where the root element is <PivotSet>. Each value is represented as a name-value element pair. You can use the output in any XML parser to produce more useful output.
In addition to the ANY clause, you can write a subquery. Suppose you have a list of preferred states and you want to select the rows for those states only. You placed the preferred states in a new table called preferred_states:
SQL> create table preferred_states
2 (
3 state_code varchar2(2)
4 )
5 /
Table created.
SQL> insert into preferred_states values ('FL')
2> /
1 row created.
SQL> commit;
Commit complete.
Now the pivot
operation looks like:
select * from (
select times_purchased as "Puchase Frequency", state_code
from customers t
)
pivot xml
(
count(state_code)
for state_code in (select state_code from preferred_states)
)
order by 1
/
for state_code in (select distinct state_code from customers)
The subquery must return distinct values; otherwise the query will fail. That's why we have specified DISTINCT clause above.
Pivot
adds a very important and practical functionality to the SQL language. Instead of writing convoluted non-intuitive code with a lot of decode functions, you can use the pivot function to create a crosstab report on any relational table. Similarly, you can convert any crosstab report to be stored as a regular relational table using the unpivot
operation. Pivot
can produce the output in regular text or XML. In the latter case, you don't need to specify the domain of values the pivot operation needs to search.
For more information on pivot and unpivot operations, refer to the Oracle Database 11g SQL Language Reference.
Back to Series TOC