As Published In

Oracle Magazine
July/August 2005
TECHNOLOGY: SQL

Nulls: Nothing to Worry About
By Lex de Haan and Jonathan Gennick

Avoid problems with three-valued logic.

Called null values in the ISO SQL standard, nulls are anything but values. Nulls are markers indicating the complete lack of a value. They lead to three-valued logic, which is confusing to work with, and that confusion often leads the unwary down the path of writing SELECT statements that return wrong results. This article highlights some of the pitfalls you'll encounter and gives advice for avoiding them.

Nulls in Scalar Expressions

Typically, the result of any scalar expression involving a null will itself be a null. The first query in Listing 2, against the data in Listing 1, demonstrates this by generating a report showing the impact of raising all salaries by US$1,000. You can see right away in Listing 2 that neither Adams nor Morle have any value at all for their new salary.

Code Listing 1: Our example schema

TABLE: DEPT_M

DEPTNO       DNAME         LOC
_______      _______       _______________
10           HQ            UTRECHT
20           SALES         MUNISING
30           MANUFACTURING NOVOSIBIRSK

TABLE: EMP_M

EMPNO       ENAME	     JOB	     MGR	SAL	 COMM	  DEPTNO
_______     ____________     ____________    _____      _____    ______   ________                 
100	    NORGAARD	     PRESIDENT		        5000		  10
122	    LEWIS	     SALESREP	      120	1100
199	    GENNICK			                2200		  10
111	    DE HAAN	     CLERK	      110	2000
112	    MILLSAP	     SALESREP	      110	1250	 1400	  20
110	    ADAMS	     MANAGER	      100	1700	          20
120	    KOLK	     MANAGER	      100	2450		  10
113	    MCDONALD	     SALESREP	      110	1500		  20
121	    WOOD	     CLERK	      120	1300		  10
130	    MORLE	     CLERK	      100			  10

From the perspective of the database server, nulls can have only one context- and datatype-independent meaning: "information missing." Any further interpretation of nulls might be very human and intuitive, but the database server treats all nulls the same way, regardless of where they come from. The database server cannot add US$1,000 to null and must simply return null as the result.

Yet humans do need to deal with nulls, and they have the ability to ask questions that the database server cannot. What does it mean to have a null salary? Does it mean that salary doesn't apply? Or does it mean that an employee has "no salary" in the sense that that person's salary is US$0? Or does it mean that a value would be applicable, but we simply don't know that value? Or could it mean any of the foregoing, depending on which employee we are talking about? Although it is sometimes necessary, you can begin to see that human interpretation of nulls can be quite dangerous.

Perhaps most important is the question of what it means from a business standpoint to increase a null salary by US$1,000. It's not enough to deal with nulls in a technical vacuum. You must step back and ask the right business questions. Only after understanding both the underlying data model and the business intent of the query at hand are you ready to tackle the query's handling of nulls.

One approach to handling nulls in scalar expressions is to substitute a real value anytime a null might occur. To this end, COALESCE can be very helpful. If you determine that in addition to raising all salaries by US$1,000, the business also wants to grant US$1,000 salaries to those currently without a salary, you can use the COALESCE function to treat nulls as zero. The second query in Listing 2 passes two arguments, the value from the SAL column and then a zero, to COALESCE. The function returns the first non-null argument as its result.

Code Listing 2: Null in, null out, and the result of COALESCE

SELECT EMPNO, ENAME, SAL, SAL + 1000 FROM EMP_M;

EMPNO	   ENAME	SAL	SAL+1000
_______    _________    _____   ___________         
112	   MILLSAP	1250	2250
110	   ADAMS
120	   KOLK	        2450	3450
130	   MORLE
  ...

SELECT EMPNO, ENAME, SAL, 
        COALESCE(SAL,0) + 1000 FROM EMP_M;

EMPNO	  ENAME	      SAL    	COALESCE(SAL,0)+1000
_______   _________   _____     _________________________
112	  MILLSAP     1250	2250
110	  ADAMS		        1000
120	  KOLK	      2450	3450

Oracle Database supports several functions that are similar to COALESCE. These include NVL2, NULLIF, and NVL. (Take time to read about these functions in the Oracle SQL Reference manual.) We recommend COALESCE over NVL, because COALESCE can handle more than just two arguments and it's part of the SQL standard. When COALESCE isn't enough, you may be able to find refuge in CASE expressions, which are also covered in Oracle SQL Reference.

Nulls in Boolean Expressions

Nulls make themselves felt in particularly subtle ways in Boolean expressions, such as those you might write for the WHERE clause of a query. Boolean expressions normally result in TRUE or FALSE, but nulls introduce a third possible result of Boolean expressions: UNKNOWN. Note that NULL is not the same as UNKNOWN:

  • SAL + NULL results in NULL. (This is a scalar expression.)
  • SAL < NULL results in UNKNOWN. (This is a Boolean expression.)

Listing 3 shows the ultimate consequence of three-valued logic. Even though COMM is being compared with itself, the database treats any nulls in a context-independent manner: Any comparison to null results in UNKNOWN, and queries return only those rows for which the WHERE clause evaluates to TRUE.

Code Listing 3: WHERE COMM = COMM is not equal

SELECT * FROM EMP_M WHERE COMM = COMM;

EMPNO	  ENAME	       JOB	     MGR     SAL     COMM	DEPTNO
_______   __________   __________    _____   _____   ______     ________ 
112	  MILLSAP      SALESREP	     110     1250    1400	20
110	  ADAMS	       MANAGER	     100	     1700	20

You can often use IS NULL or IS NOT NULL in Boolean expressions to avoid UNKNOWN results. Consider the problem of listing employees with commissions less than US$1,500. You could begin by writing:

SELECT * FROM EMP_M
WHERE COMM < 1500;

And you'd soon discover (we hope) that employees such as Norgaard and Lewis, who have null commissions, would be omitted from the list. Assuming that you wanted to interpret a null commission as "no commission," you could broaden your WHERE clause by adding an IS NULL condition:

SELECT * FROM EMP_M
WHERE (COMM < 1500)
   OR (COMM IS NULL);

Be careful here! Does a null commission mean "no commission," or might it mean "commission unknown"? The answer depends on your application, and there might not be a clear answer at all. Don't automatically add IS NULL predicates on nullable columns to your queries. Doing so will lead to errors just as surely as failing to think about the possibility of nulls to begin with will do so. Whether to include the IS NULL condition in our example is actually a business decision, not a technical one.

When writing WHERE clauses, it's common to link several predicates by using the operators AND, OR, and NOT, as we've just done. The truth tables in Figure 1 show how these operators handle different combinations of TRUE, FALSE, and UNKNOWN operands.

figure 1

Nulls in CHECK Constraints

When you are evaluating a WHERE clause, UNKNOWN leads to the same end result as FALSE—the row is rejected. Yet in a CHECK constraint, UNKNOWN leads to the same end result as TRUE—the row is accepted. This is because constraints raise violations only if their Boolean expressions evaluate to FALSE. It's why the following constraint definition allows nulls in the DEPTNO column, although it might suggest otherwise:

CHECK (DEPTNO 
   IN (10, 20, 30))

Here is a good way to think about all this. The action of WHERE and HAVING clauses is to pass those rows for which expressions evaluate to TRUE. The action of a CHECK constraint is to reject rows for which expressions are FALSE. In all cases, no action, neither pass nor reject, is taken in the UNKNOWN case.

Nulls and Joins

Listing 4 demonstrates that outer joins represent one situation in which the database engine will generate nulls for you, on the fly, even if your database doesn't contain a single null to begin with. The row for department 30 is generated by the outer-join operation, and the employee columns in that row are initialized to null.

Code Listing 4: Outer joins generate nulls

SELECT 	E.EMPNO, E.ENAME, E.DEPTNO, D.DEPTNO, D.DNAME
FROM EMP_M E RIGHT OUTER JOIN DEPT_M D ON E.DEPTNO = D.DEPTNO;

EMPNO	  ENAME	        DEPTNO	      DEPTNO	 DNAME
_______   ___________   ________      ________   __________________   
100	  NORGAARD	10	      10	 HQ
...
130	  MORLE	10	10	                 HQ
                          	      30	 MANUFACTURING

Another subtle issue with nulls and outer joins is that it matters from which table you return a join column. Note that one of the two DEPTNO values for the last row in Listing 4 is null. The null DEPTNO is in the employee table's (EMP_M) DEPTNO column, whereas the non-null DEPTNO value comes from the department table (DEPT_M). When writing an outer join, give careful consideration to the table from which you retrieve join columns.

Nulls in Summarized Data

Scalar expressions yield null if any operand is null, but nulls are ignored by aggregate functions. This behavior is specified by the SQL standard, but it can still lead to some very surprising and unintuitive query results. Look at Listing 5. Apparently, if you want to derive the sum of two or more columns containing numeric values, there is a difference between first adding them up horizontally and then vertically, and adding them up vertically first.

Code Listing 5: SUM(A+B) is not the same as SUM(A)+SUM(B)

SELECT SUM(SAL+COMM), SUM(SAL)+SUM(COMM) FROM EMP_M;

SUM(SAL+COMM)	         SUM(SAL)+SUM(COMM)
__________________       ________________________
2650	                 19900

Why the difference? It's because the result of SAL + COMM is non-null only for those rows in which both SAL and COMM are non-null. Thus, only those rows contribute to the result of SUM(SAL+COMM). The result of SUM(SAL)+SUM(COMM), on the other hand, manages to include all non-null values from both columns.

We wish we could provide a nice, neat, tidy bit of advice for dealing with nulls in summarized data. The best we can tell you, however, is to carefully think through the possibility of nulls and to make a conscious decision about the results you want in your summary.

"NOT IN" versus "NOT EXISTS"

Compare the two queries (and their results) in Listing 6. In each query, we are trying to retrieve all employees without subordinates. Obviously, the results are different—so it is impossible that they are both correct. An interesting question is: Which query is correct? And which result did you expect, based on our EMP_M demo table? You probably expected the second result, and therefore you probably would point at the first query as being wrong, right? Perhaps you have an idea of how to "fix" the first query to make it return the "correct" result.

Code Listing 6: Who has no subordinates?

SELECT E1.ENAME FROM EMP_M E1
WHERE E1.EMPNO NOT IN
      (SELECT E2.MGR FROM EMP_M E2);

No rows selected.

SELECT E1.ENAME FROM EMP_M E1
WHERE NOT EXISTS
      (SELECT E2.* FROM EMP_M E2 
       WHERE E2.MGR = E1.EMPNO);

ENAME
____________ 
DE HAAN
WOOD
MILLSAP
LEWIS
MCDONALD
GENNICK
MORLE

But wait a minute! We have a convincing argument to show that the first query is correct and the second one is wrong! Lewis, Gennick, De Haan, Millsap, McDonald, Wood, Morle—any one of them could be the manager of Norgaard. Thus, we cannot truly be certain that any of them have no subordinates.

What is subtle about Listing 6 is that the programmer may not have consciously intended to apply any sort of business interpretation to nulls. NOT IN and NOT EXISTS are often thought of as being interchangeable, but they really aren't—not quite—not when it comes to nulls. The intermediate result of the subquery in the first example in Listing 6 contains a null. The database engine cannot be sure that E1.EMPNO is not equal to null, and thus no rows evaluate to TRUE. The second example tests for the existence of rows having a specific MGR value. Rows where MGR is null aren't even considered. When writing NOT IN conditions, always be sure you take the time to think about the X NOT IN (...,NULL,...) case. When writing NOT EXISTS, consider whether the business really requires the results you'd get from the NOT IN version of the query. And remember, choosing the path to go down is a business decision. Given the scenario posed in Listing 6, we would go back to our business client to discuss the ramifications of the two different solutions.

Watch for the Empty Set!

As soon as you use aggregate functions in your SQL, you must be aware that empty sets might come into play. Listing 7 shows how aggregate functions react to empty sets. Apparently, COUNT returns zero whereas AVG, SUM, MAX, and MIN return null. This behavior does make a certain amount of sense. If you have no values to count, it's fair to say that you have zero values, whereas you can't really come up with, say, a maximum value without at least one value from which to choose. We can make a reasonable argument that SUM should return zero instead of null, that the sum of no values is zero, but Oracle's implementation of the behavior we describe here is fully compliant with the SQL standard.

Code Listing 7: Aggregate functions and the empty set

SELECT COUNT(EMPNO),AVG(EMPNO),SUM(EMPNO),MAX(EMPNO),MIN(EMPNO)
FROM EMP_M
WHERE 1 = 2;

COUNT(EMPNO)	     AVG(EMPNO)	        SUM(EMPNO)	  MAX(EMPNO)	    MIN(EMPNO)
________________     _____________      _____________     _____________     _____________ 
0

Listing 8 shows two solutions to the problem of listing employees who are paid more than any sales rep in department 10. How is it possible that the two results are different? The first query retrieves (in the subquery) the maximum salary of all sales reps from department 10 and then compares all employee salaries, one by one, with that maximum salary. The second query does almost the same thing, the only difference being that each employee salary from the outer query is now compared against all salaries from department 10, as opposed to just the highest salary.

Code Listing 8: Who has a higher salary?

SELECT E1.ENAME FROM EMP_M E1
  WHERE E1.SAL >
        (SELECT MAX(E2.SAL) FROM EMP_M E2
         WHERE E2.DEPTNO = 10 AND E2.JOB = 'SALESREP');

No rows selected.

SELECT E1.ENAME FROM EMP_M E1
  WHERE E1.SAL > ALL
        (SELECT E2.SAL FROM EMP_M E2
         WHERE E2.DEPTNO = 10 AND E2.JOB = 'SALESREP');

ENAME
------------------
NORGAARD
LEWIS
ADAMS
...
MORLE

10 rows select

Next Steps

READ more about
LogMiner
DBMS_LOGMNR package

The empty set plays an important role here, because if you give our EMP_M table a closer look, you will see that department 10 has no sales reps. Therefore, the MAX function of the first query returns a null and the WHERE clause of the main query results in the value UNKNOWN for all employees. On the other hand, the second query returns all employees—because any salary is greater than all salaries in an empty set. Note that even Adams and Morle show up in the result, even though they both have a null salary. We must hasten to say that this is not a bug but fully expected behavior in accordance with the SQL standard. The lesson to take away here is to always, always ask yourself the question: "What if the aggregate function is applied against an empty set?"

Watch out for Nothing

Be vigilant for possible nulls when writing and processing your SQL statements. Remember the tools available to help you work with nulls, including CASE, COALESCE, IS NULL, IS NOT NULL, NULLIF, NVL, and NVL2. Be aware of the third value when working with nulls in Boolean expressions—UNKNOWN—and remember that your business can look at nulls in different ways.


Lex de Haan (lex.de.haan@naturaljoin.nl) is an author and lecturer. He studied applied mathematics at the Technical University in Delft, The Netherlands; worked for Oracle from 1990 to 2004; and is a member of the OakTable Network. Jonathan Gennick (Jonathan@Gennick.com) is an experienced Oracle professional who enjoys writing on SQL topics. He wrote the SQL Pocket Guide and the Oracle SQL*Plus Pocket Reference, both from O'Reilly Media.

Send us your comments

E-mail this page
Printer View Printer View
Oracle Is The Information Company About Oracle | Oracle RSS Feeds | Careers | Contact Us | Site Maps | Legal Notices | Terms of Use | Privacy