SQL: beware of NULL in WHERE NOT IN
If you have ever written some SQL you are probably aware of the special status of NULL in SQL.
To refresh your mind: NULL indicates absence of a value. You cannot use NULL in a comparison, it will always result in NULL. If this does not lead to the desired result there are special operators to deal with NULL explicitly. It behaves similar to NaN in floating point operations.
CREATE TABLE pony ( id INT PRIMARY KEY, name VARCHAR(255) ); INSERT INTO pony (id, name) VALUES (1, 'Twilight Sparkle'), (2, 'Rainbow Dash'), (3, 'Pinkie Pie'), (4, 'Rarity'), (5, 'Applejack'); SELECT * FROM pony; -- 5 rows as expected SELECT * FROM pony WHERE id = NULL; -- 0 rows as expected SELECT * FROM pony WHERE id != NULL; -- 0 rows, slight wtf SELECT * FROM pony WHERE id IS NOT NULL; -- 5 rows as expected
NULL still works intuitively when using WHERE IN:
SELECT * FROM pony WHERE id IN (1, 2, NULL); -- 2 rows as expected -- equivalent statement: SELECT * FROM pony WHERE id = 1 OR id = 2 OR id = null;
WHERE NOT IN is where things get tricky:
SELECT * FROM pony WHERE id NOT IN (1, 2, NULL); -- 0 rows, major wtf
It makes sense if you split the clause into individual comparisons:
SELECT * FROM pony WHERE NOT ( id = 1 OR id = 2 OR id = NULL );
And then remove the parenthesis using De Morgan’s laws:
SELECT * FROM pony WHERE id != 1 AND id != 2 AND id != NULL;
Like explained in the intro, id != NULL is always NULL, therefor the entire WHERE clause is always FALSE.
Practical advice and the subquery problem
If you are using external data, for example from XML or a CSV, you should filter out empty values.
More insidiously NULL can sneak in via a subquery. Lets say we want to know how many ponies are currently out of town:
WITH ponies_in_town AS ( SELECT pony.id, pony.name FROM town LEFT JOIN town_pony ON town WHERE town.name = 'Ponyville' ) SELECT COUNT(*) AS num_ponies_out_of_town FROM pony WHERE id NOT IN ( SELECT id FROM ponies_in_town )
If Applejack is alone in town, it says 4 ponies have left. If Applejack also leaves, it says 0 ponies have left. A fix for this is left as an exercise for the reader.
Subqueries are good
Let me emphasize this. The alternative to a subquery is to place table joins in the main query. This can lead to a more subtle insidious problem, which is duplicate rows. SQL using subqueries is more “modular” and therefore easier to understand and reuse. It can also be faster.
The example above as it stands does not demonstrate this but if we add a few more requirements we can demonstrate these benefits. So keep using WHERE NOT IN, just mind NULL.
But what if i actually want to take into account NULLS?
Make a separate where clause for null: WHERE val NOT IN (1, 2) OR val IS NULL
Author 02 mrt 2018|
Looking for a job?
Interested in one of our openings, we are always in for a cup of coffee! We are currently looking for a senior front-end developer and a senior PHP back-end developer.
Heb je nog geen beveiligd HTTPS webadres, dan kan dit consequenties hebben voor de veiligheid van je bezoekers en positie in Google.
Today I discovered some neat features and plugins in PHPStorm worthy of sharing; these features made my life so much easier. I hope it will make your lives at least a little bit easier too 😉
At Polder Knowledge, we often apply an “API First Strategy”. At Polder Knowledge we interpret this as a loose project that contains the business logic for an application.
Being a developer comes with the endless pursuit of making work easier for yourself. This time: Ngrok for testing webhooks.
Need a DNS solution that could automatically be configured when an new container is started (e.g dnsmasq)? We found this solution.
Looking for a job?
We are always looking for professionals to strengthen our team. Interested in one of our openings, we are always in for a cup of coffee! We are currently looking for a senior front-end developer and a senior PHP back-end developer.