Oracle Database 11g:

The Top Features for DBAs and Developers

by Arup Nanda, Oracle ACE Director

Pivot and Unpivot

Present information in a spreadsheet-type crosstab report from any relational table using simple SQL, and store any data from a crosstab table to a relational table.

See Series TOC

Pivot

As you know, relational tables are, well, tabular—that is, they are presented in a column-value pair. Consider the case of a table named CUSTOMERS.




SQL> desc customers
 Name                                      Null?    Type
 -----------------------------------------                     --------     ---------------------------
 CUST_ID                                            NUMBER(10)
 CUST_NAME                                          VARCHAR2(20)
 STATE_CODE                                         VARCHAR2(2)
 TIMES_PURCHASED                                    NUMBER(3)

When this table is selected:



select cust_id, state_code, times_purchased
from customers
order by cust_id;

The output is:





CUST_ID STATE_CODE TIMES_PURCHASED
-------  ----------               ---------------
      1 CT                       1
      2 NY                      10
      3 NJ                       2
      4 NY                       4
...  
                              and so on ...
                            

Note how the data is represented as rows of values: For each customer, the record shows the customer's home state and how many times the customer purchased something from the store. As the customer purchases more items from the store, the column times_purchased is updated.

Now consider a case where you want to have a report of the purchase frequency each state - that is, how many customers bought something only once, twice, thrice and so on, from each state. In regular SQL, you can issue the following statement:



select state_code, times_purchased, count(1) cnt
from customers
group by state_code, times_purchased;

Here is the output:





ST TIMES_PURCHASED        CNT
--         ---------------      ----------
CT               0         90
CT               1        165
CT               2        179
CT               3        173
CT               4        173
CT               5        152
...  
                              
                              and so on ...
                            
                          

This is the information you want but it's a little hard to read. A better way to represent the same data may be through the use of crosstab reports, in which you can organized the data vertically and states horizontally, just like a spreadsheet:





Times_purchased
             CT           NY         NJ      ...  
                              and so on ...

1             0            1          0      ...
2            23          119         37      ...
3            17           45          1      ...
...  
                              and so on ... 
                            

Prior to Oracle Database 11g, you would do that via some sort of a decode function for each value and write each distinct value as a separate column. The technique is quite nonintuitive however.


Fortunately, you now have a great new feature called PIVOT for presenting any query in the crosstab format using a new operator, appropriately named pivot . Here is how you write the query:




select * from (
   select times_purchased, state_code
   from customers t
)
pivot 
(
   count(state_code)
   for state_code in ('NY','CT','NJ','FL','MO')
)
order by times_purchased
/

Here is the output:




. TIMES_PURCHASED       'NY'       'CT'       'NJ'       'FL'       'MO'
       ---------------      ----------      ----------      ----------       ----------      ----------
              0      16601         90          0          0          0
              1      33048        165          0          0          0
              2      33151        179          0          0          0
              3      32978        173          0          0          0
              4      33109        173          0          1          0
... and so on ...

This shows the power of the pivot operator. The state_codes are presented on the header row, instead of a column. Pictorially, here is the how the traditional tabular format looks:

 Traditional tabular representation

Figure 1 Traditional tabular representation

In a crosstab report, you want to transpose the Times Purchased column to the header row as shown in Figure 2. The column becomes the row, as if the column were rotated 90 degrees anti-clockwise to become the header row. This figurative rotation needs to have a pivot point and in this case the pivot point happens to be the count(state_code) expression.

 Pivoted representation

Figure 2 Pivoted representation

This expression needs to be in the syntax of the query:





...
pivot 
(
   count(state_code)
   for state_code in ('NY','CT','NJ','FL','MO')
)
...

The second line, "for state_code ...," limits the query to only those values. This line is necessary, so unfortunately you have to know the possible values beforehand. This restriction is relaxed in the XML format of the query, described later in this article.

Note the header rows in the output:




. TIMES_PURCHASED       'NY'       'CT'       'NJ'       'FL'       'MO'
        ---------------      ----------      ----------      ----------      ----------       ----------

The column headers are the data from the table itself: the state codes. The abbreviations may be self explanatory but suppose you want to display the state names instead of abbreviations, ("Connecticut" instead of "CT")? In that case you have to make a little adjustment in the query, in the FOR clause as shown below:





select * from (
   select times_purchased as "Puchase Frequency", state_code
   from customers t
)
pivot 
(
   count(state_code)
   for state_code in ('NY' as "New York",'CT' "Connecticut",'NJ' "New Jersey",'FL' "Florida",'MO' as "Missouri")
)
order by 1
/



Puchase Frequency   New York Connecticut New Jersey    Florida   Missouri
        -----------------      ----------     -----------       ----------      ----------      ----------
                0      16601         90           0          0          0
                1      33048        165           0          0          0
                2      33151        179           0          0          0
                3      32978        173           0          0          0
                4      33109        173           0          1          0
...  
                              and so on ...
                            
                          

The FOR clause can have aliases for the values there, which will become the column headers.

Unpivot

For matter there is anti-matter; for pivot there should be "unpivot," right?

Humor aside, there is a genuine need for the reverse of the pivot operation. Suppose you have a spreadsheet that shows the crosstab report shown below:

Purchase Frequency New York Connecticut New Jersey Florida Missouri
0 12 11 1 0 0
1 900 14 22 98 78
2 866 78 13 3 9
... .        

Now you want to load the data into a relational table called CUSTOMERS:





SQL> desc customers
 Name                                      Null?    Type
 -----------------------------------------                     --------     ---------------------------
 CUST_ID                                            NUMBER(10)
 CUST_NAME                                          VARCHAR2(20)
 STATE_CODE                                         VARCHAR2(2)
 TIMES_PURCHASED                                    NUMBER(3)

The spreadsheet data must be de-normalized to a relational format and then stored. Of course, you can write a complex SQL*:Loader or SQL script using DECODE to load the data into CUSTOMERS table. Or you can use the reverse operation of pivot —UNPIVOT—to break up the columns to become rows, as is possible in Oracle Database 11g.

It might be easier to demonstrate this via an example. Let's create a crosstab table first, using the pivot operation:




  1  create table cust_matrix
  2  as
  3  select * from (
  4     select times_purchased as "Puchase Frequency", state_code
  5     from customers t
  6  )
  7  pivot
  8  (
  9     count(state_code)
 10     for state_code in ('NY' as "New York",'CT' "Conn",'NJ' "New Jersey",'FL' "Florida",'MO' as "Missouri")
 11  ) 
 12* order by 1

You can check how the data is stored in the table:





SQL> select * from cust_matrix
  2  /

Puchase Frequency   New York       Conn New Jersey    Florida   Missouri
        -----------------      ----------      ----------      ----------      ----------      ---------
                1      33048        165          0          0          0
                2      33151        179          0          0          0
                3      32978        173          0          0          0
                4      33109        173          0          1          0
... and so on ...

This is how the data is stored in the spreadsheet: Each state is a column in the table ("New York", "Conn", and so on).





SQL> desc cust_matrix
 Name                                      Null?    Type
 -----------------------------------------                     --------     ---------------------------
 Puchase Frequency                                  NUMBER(3)
 New York                                           NUMBER
 Conn                                               NUMBER
 New Jersey                                         NUMBER
 Florida                                            NUMBER
 Missouri                                           NUMBER

You need to break up the table so that rows will show only the state code and the counts for that state. This can be done by the unpivot operation shown below:





select *
  from cust_matrix
unpivot
(
  state_counts
    for state_code in ("New York","Conn","New Jersey","Florida","Missouri")
)
order by "Puchase Frequency", state_code
/

Here is the output:





Puchase Frequency STATE_CODE STATE_COUNTS
        ----------------- ----------            ------------
                1 Conn                165
                1 Florida               0
                1 Missouri              0
                1 New Jersey            0
                1 New York          33048
                2 Conn                179
                2 Florida               0
                2 Missouri              0
...  
                              
                              and so on ...
                            
                          

Note how the each column name has become a value in the STATE_CODE column. How did Oracle know that state_code is a column name? It knew that from the following clause in the query:

for state_code in ("New York","Conn","New Jersey","Florida","Missouri")

Here you specified that the values "New York", "Conn", and so on are values of a new column you want to be unpivoted on, called state_code. Look at part of the original data:





Purchase Frequency   New York       Conn New Jersey    Florida   Missouri
        -----------------      ----------      ----------      ----------      ---------- -     ---------
                1      33048        165          0          0          0

As the column "New York" suddenly became a value in a row, how would you show the value 33048, under which column? That question is answered by the clause just above the for clause inside the unpivot operator in the above query. As you specified state_counts, that is the name of the new column created in the resultant output.

Unpivot may be the opposite action of pivot but don't assume that former can reverse what latter has done. For instance, in the above example, you created a new table, CUST_MATRIX, using a pivotnoperation on the table CUSTOMERS. Later you used unpivot on the table CUST_MATRIX but that didn't get back the details of the original table CUSTOMERS. Instead the crosstab report was displayed in a different way for you to load into a relational table. So unpivot is not for undoing what pivot did, a fact you should carefully consider before creating a pivoted table and then dropping the original one.

Some of the very interesting uses of unpivot go beyond the usual powerful data manipulation such as the example shown earlier. Oracle ACE Director Lucas Jellema of Amis Technologies has shown how you can generate rows of specific data for testing purposes. Here I will use a slightly modified form of his original code to generate vowels of the English alphabet:




select value
from
(
    (
        select
            'a' v1,
            'e' v2,
            'i' v3,
            'o' v4,
            'u' v5
        from dual
    )
    unpivot
    (
        value
        for value_type in
            (v1,v2,v3,v4,v5)
    )
)
/

Here is the output:





V
-
a
e
i
o
u

This model can be extended to cover any type of row generator. Thank you Lucas for showing us this nifty trick.

XML Type

In the above example, note how you had to specify the valid state_codes:




for state_code in ('NY','CT','NJ','FL','MO')

This requirement assumes that you know what values are present in the state_code column. If you don't know what values are available, how would you construct a query?

Well, there is another clause in the pivot operation, XML, that allows you to create the pivoted output as XML where you can specify a special clause, ANY, instead of literal values. Here is the example:




select * from (
   select times_purchased as "Purchase Frequency", state_code
   from customers t
)
                             pivot xml
(
   count(state_code)
    
                              for state_code in (any)
)
order by 1
/
                            

The output comes back as CLOB so make sure the LONGSIZE is set to a large value before running the query.

SQL> set long 99999

There are two distinct differences in this query (shown in bold) compared to the original pivot operation. First, you specified a clause, pivot xml, instead of just pivot. It creates the output in XML. Second, the for clause shows for state_code in (any) instead of a long list of state_code values. The XML notation allows you to use the ANY keyword and you don't need to enter the state_code values. Here is the output:




Purchase Frequency STATE_CODE_XML
         ------------------ --------------------------------------------------
                 1 <PivotSet><item><column name = "STATE_CODE">CT</co
                   lumn><column name = "COUNT(STATE_CODE)">165</colum
                   n></item><item><column name = "STATE_CODE">NY</col
                   umn><column name = "COUNT(STATE_CODE)">33048</colu
                   mn></item></PivotSet>

                 2 <PivotSet><item><column name = "STATE_CODE">CT</co
                   lumn><column name = "COUNT(STATE_CODE)">179</colum
                   n></item><item><column name = "STATE_CODE">NY</col
                   umn><column name = "COUNT(STATE_CODE)">33151</colu
                   mn></item></PivotSet>
 
... and so on ...

As you can see, the column STATE_CODE_XML is XMLTYPE, where the root element is <PivotSet>. Each value is represented as a name-value element pair. You can use the output in any XML parser to produce more useful output.

In addition to the ANY clause, you can write a subquery. Suppose you have a list of preferred states and you want to select the rows for those states only. You placed the preferred states in a new table called preferred_states:




SQL> create table preferred_states
  2  (
  3     state_code varchar2(2)
  4  )
  5  /
 
Table created.
 
SQL> insert into preferred_states values ('FL')
  2> /
 
1 row created.
 
SQL> commit;
 
Commit complete.

Now the pivot operation looks like:





select * from (
   select times_purchased as "Puchase Frequency", state_code
   from customers t
)
pivot xml
(
   count(state_code)
   for state_code in (select state_code from preferred_states)
)
order by 1
/

The subquery in the for clause can be anything you want. For instance, if you want to select all the records without any restriction on any preferred states, you can use the following as a for clause:



for state_code in (select distinct state_code from customers)

The subquery must return distinct values; otherwise the query will fail. That's why we have specified DISTINCT clause above.

Conclusion

Pivot adds a very important and practical functionality to the SQL language. Instead of writing convoluted non-intuitive code with a lot of decode functions, you can use the pivot function to create a crosstab report on any relational table. Similarly, you can convert any crosstab report to be stored as a regular relational table using the unpivot operation. Pivot can produce the output in regular text or XML. In the latter case, you don't need to specify the domain of values the pivot operation needs to search.

For more information on pivot and unpivot operations, refer to the Oracle Database 11g SQL Language Reference.

Back to Series TOC