Recently I came across a bug in code, equivalent to this snippet:
create table testing(num int, dt date, istrue boolean);
insert into testing values (1, '2018-01-01', True);
UPDATE testing SET num = 2, istrue = null AND dt = '2018-01-01';
The UPDATE statement should have actually been:
UPDATE testing SET num = 2, istrue = null WHERE dt = '2018-01-01';
Yet postgresql happily accepted the UPDATE as valid and it evaluated the code:
null AND dt = '2018-01-01'
after the second equal sign as a boolean and therefore updated data based on bad logic.
I have tried this in other SQL flavors and it is not allowed, they want a WHERE.
I'm not a postgres person but it does seem like a very odd syntax quirk, or is it a bug?
3 Answers 3
Yet postgresql happily accepted the
UPDATE
as valid and it evaluated the code:null AND dt = '2018-01-01'
after the second equal sign as a boolean and therefore updated data based on bad logic.
Bad things happen where there is bad code. In this case, the code was not equivalent to the logic you wanted, and had replaced the WHERE
with AND
. This kind - of bad substitution - often results in errors. The rare times that it doesn't result in error you get this - bad - results where you get different/unexpected results from a query or - even catastrophic - you update the whole table where you wanted to update a few rows.
I have tried this in other SQL flavors and it is not allowed, they want a
WHERE
.
I don't know of any DBMS that requires WHERE
in an UPDATE
statement. SQL Server, Oracle, DB2, SQLite, MySQL, neither of them requires it. That is the job of the database developers, code reviewers and testing - to assure that code aligns with business logic and requirements.
I'm not a Postgres person but it does seem like a very odd syntax quirk, or is it a bug?
It's definitely not a bug. The code is parsed and executed - as you correctly assumed - as:
UPDATE
testing
SET
num = 2,
istrue = ( null AND (dt = '2018-01-01') )
;
resulting in updating the whole table. istrue
will be updated with a value of NULL
for the rows where dt
is NULL
or equal to '2018-01-01'
and with a value of FALSE
for the rest of the rows.
I'm not a postgres person but it does seem like a very odd syntax quirk, or is it a bug?
It's not a RDBMS issue. It's just the spec, from SQL:2011
<update statement: positioned> ::=
UPDATE <target table> [ [ AS ] <correlation name> ]
SET <set clause list>
WHERE CURRENT OF <cursor name>
So you're before the WHERE
so the next relevant thing is set clause list
, which is
<set clause list> ::=
<set clause> [ { <comma> <set clause> }... ]
So you split on commas and the you get the set clause
<set clause> ::=
<multiple column assignment>
| <set target> <equals operator> <update source>
Than you split on equals and you're looking at the update source
, which is
<update source> ::=
<value expression>
| <contextually typed value specification>
So it has to return a value expression
and the presence of an AND
makes it a boolean, value expression
,
<boolean term> ::=
<boolean factor>
| <boolean term> AND <boolean factor>
Any database that doesn't do this is breaking spec.
The issue is that other DBMSS don't support boolean values, so you'd need to write something like this:
UPDATE testing SET
num = 2,
istrue = CASE
WHEN null AND dt = '2018-01-01' THEN 1
WHEN NOT (null AND dt = '2018-01-01') THEN 0
ELSE NULL
END;
The behaviour of Postgres is exactly as one should expect given the existence of a =
operator that produces boolean values – nothing strange going on at all.