Set operators and NULL

NULL is treated differently by the set operators UNION, EXCEPT, and INTERSECT than it is in search conditions. This difference is one of the main reasons to use set operators.

When comparing rows, set operators treat NULL values as equal to each other. In contrast, when NULL is compared to NULL in a search condition the result is unknown (not true).

One result of this difference is that the number of rows in the result set for query-1 EXCEPT ALL query-2 is always the difference in the number of rows in the result sets of the individual queries.

For example, consider two tables T1 and T2, each with the following columns:

col1 INT,
col2 CHAR(1)

The tables and data are set up as follows:

CREATE TABLE T1 (col1 INT, col2 CHAR(1));
CREATE TABLE T2 (col1 INT, col2 CHAR(1));
INSERT INTO T1 (col1, col2) VALUES(1, 'a');
INSERT INTO T1 (col1, col2) VALUES(2, 'b');
INSERT INTO T1 (col1) VALUES(3);
INSERT INTO T1 (col1) VALUES(3);
INSERT INTO T1 (col1) VALUES(4);
INSERT INTO T1 (col1) VALUES(4);
INSERT INTO T2 (col1, col2) VALUES(1, 'a');
INSERT INTO T2 (col1, col2) VALUES(2, 'x');
INSERT INTO T2 (col1) VALUES(3);

The data in the tables is as follows:

Table T1.

col1	col2
1	a
2	b
3	(NULL)
3	(NULL)
4	(NULL)
4	(NULL)

Table T2

col1	col2
1	a
2	x
3	(NULL)

One query that asks for rows in T1 that also appear in T2 is as follows:

SELECT T1.col1, T1.col2
   FROM T1 JOIN T2
   ON T1.col1 = T2.col1
   AND T1.col2 = T2.col2;

T1.col1	T1.col2
1	a

The row ( 3, NULL ) does not appear in the result set, as the comparison between NULL and NULL is not true. In contrast, approaching the problem using the INTERSECT operator includes a row with NULL:

SELECT col1, col2
   FROM T1
   INTERSECT
      SELECT col1, col2
         FROM T2;

col1	col2
1	a
3	(NULL)

The following query uses search conditions to list rows in T1 that do not appear in T2:

SELECT col1, col2
   FROM T1
   WHERE col1 NOT IN (
       SELECT col1
         FROM T2
         WHERE T1.col2 = T2.col2 )
   OR col2 NOT IN (
       SELECT col2
         FROM T2
         WHERE T1.col1 = T2.col1 );

col1	col2
2	b
3	(NULL)
4	(NULL)
3	(NULL)
4	(NULL)

The NULL-containing rows from T1 are not excluded by the comparison. In contrast, approaching the problem using EXCEPT ALL excludes NULL-containing rows that appear in both tables. In this case, the (3, NULL) row in T2 is identified as the same as the (3, NULL) row in T1.

SELECT col1, col2
   FROM T1
   EXCEPT ALL
      SELECT col1, col2
         FROM T2;

col1	col2
2	b
3	(NULL)
4	(NULL)
4	(NULL)

The EXCEPT operator is more restrictive still. It eliminates both (3, NULL) rows from T1 and excludes one of the (4, NULL) rows as a duplicate.

SELECT col1, col2
   FROM T1
   EXCEPT
      SELECT col1, col2
         FROM T2;

col1	col2
2	b
4	(NULL)

Discuss this page in DocCommentXchange.