SQL: beware of NULL in WHERE NOT IN

If you have ever written some SQL you are probably aware of the special status of NULL in SQL.

To refresh your mind: NULL indicates absence of a value. You cannot use NULL in a comparison, it will always result in NULL. If this does not lead to the desired result there are special operators to deal with NULL explicitly. It behaves similar to NaN in floating point operations.

Example code:

CREATE TABLE pony
(
     id INT PRIMARY KEY,
     name VARCHAR(255)
);
 
INSERT INTO pony (id, name)
VALUES
     (1, 'Twilight Sparkle'),
     (2, 'Rainbow Dash'),
     (3, 'Pinkie Pie'),
     (4, 'Rarity'),
     (5, 'Applejack');

SELECT * FROM pony;
-- 5 rows as expected

SELECT * FROM pony 
WHERE id = NULL;
-- 0 rows as expected

SELECT * FROM pony 
WHERE id != NULL;
-- 0 rows, slight wtf

SELECT * FROM pony 
WHERE id IS NOT NULL;
-- 5 rows as expected

NULL still works intuitively when using WHERE IN:

SELECT * FROM pony
WHERE id IN (1, 2, NULL);
-- 2 rows as expected
 
-- equivalent statement:
SELECT * FROM pony
WHERE id = 1
    OR id = 2
    OR id = null;

WHERE NOT IN is where things get tricky:

SELECT * FROM pony 
WHERE id NOT IN (1, 2, NULL);
-- 0 rows, major wtf

It makes sense if you split the clause into individual comparisons:

SELECT * FROM pony
WHERE NOT (
    id = 1
    OR id = 2
    OR id = NULL
);

And then remove the parenthesis using De Morgan’s laws:

SELECT * FROM pony
WHERE id != 1
    AND id != 2
    AND id != NULL;

Like explained in the intro, id != NULL is always NULL, therefor the entire WHERE clause is always FALSE.

Practical advice and the subquery problem

If you are using external data, for example from XML or a CSV, you should filter out empty values.

More insidiously NULL can sneak in via a subquery. Lets say we want to know how many ponies are currently out of town:

WITH ponies_in_town AS (
    SELECT pony.id, pony.name
    FROM town
    LEFT JOIN town_pony ON town
    WHERE town.name = 'Ponyville'
)
SELECT
    COUNT(*) AS num_ponies_out_of_town
FROM pony
WHERE id NOT IN (
    SELECT id FROM ponies_in_town
)

If Applejack is alone in town, it says 4 ponies have left. If Applejack also leaves, it says 0 ponies have left. A fix for this is left as an exercise for the reader.

Subqueries are good

Let me emphasize this. The alternative to a subquery is to place table joins in the main query. This can lead to a more subtle insidious problem, which is duplicate rows. SQL using subqueries is more “modular” and therefore easier to understand and reuse. It can also be faster.

The example above as it stands does not demonstrate this but if we add a few more requirements we can demonstrate these benefits. So keep using WHERE NOT IN, just mind NULL.

But what if i actually want to take into account NULLS?

Make a separate where clause for null: WHERE val NOT IN (1, 2) OR val IS NULL

 

Author  | 02 mrt 2018

 

Looking for a job?

Interested in one of our openings, we are always in for a cup of coffee! We are currently looking for a senior front-end developer and a senior PHP back-end developer.

Blockchain | Hype of revolutie?

Blockchain | Hype of revolutie?

Een kritische blik op de toepasbaarheid van blockchain De manier waarop we met gegevens omgaan heeft de laatste jaren een fenomenale verandering doorgemaakt. Bedrijven en overheden opereren steeds meer data gedreven en het einde is nog lang niet in zicht. Waar big...

De AI paradox van onzichtbare branding

De AI paradox van onzichtbare branding

De meeste mensen hebben eigenlijk geen idee dat ze continu in contact staat met een algoritme (en daardoor worden beïnvloed). En dat is in mijn visie een branding-probleem.De AI paradox van onzichtbare branding Kunstmatige intelligentie (AI, artificial intelligence)...

Google Analytics en AVG/GDPR

Google Analytics en AVG/GDPR

Google Analytics bewaartermijn van gegevens Als je gebruik maakt van Google Analytics dan heb je waarschijnlijk het bericht ontvangen van Google dat je moet kiezen welke bewaartermijn je wilt hanteren voor persoonsgegevens. Na deze termijn zullen gegevens...

Using my IDE was not USING my IDE

Using my IDE was not USING my IDE

Today I discovered some neat features and plugins in PHPStorm worthy of sharing; these features made my life so much easier. I hope it will make your lives at least a little bit easier too 😉

Looking for a job?

We are always looking for professionals to strengthen our team. Interested in one of our openings, we are always in for a cup of coffee! We are currently looking for a senior front-end developer and a senior PHP back-end developer.

Share This