I have a table and need to add a new column without a default value:
Constraint:
ALTER TABLE integrations.billables
DROP CONSTRAINT IF EXISTS cc_at_least_one_mapping_needed_billables,
ADD CONSTRAINT cc_at_least_one_mapping_needed_billables
CHECK ((("qb_id" IS NOT NULL) :: INTEGER +
("xero_id" IS NOT NULL) :: INTEGER +
("freshbooks_id" IS NOT NULL) :: INTEGER +
("unleashed_id" IS NOT NULL) :: INTEGER +
("csv_data" IS NOT NULL) :: INTEGER +
("myob_id" IS NOT NULL) :: INTEGER) > 0);
Column:
ALTER TABLE integrations.billables
DROP COLUMN IF EXISTS myob_id,
ADD COLUMN myob_id varchar(255);
Question:
How can I add the constraint for new values only, and not for the rows that are already in the table? (Otherwise I'd get the error: check constraint "" is violated by some row.)
This is related to my previous question: ERROR: check constraint is violated by some row
- @ypercubeTM It's clear to me. As you can see above, I'm adding the column to an existing table (which already has the constraint). – user83914, Feb 3, 2016
- @ypercubeTM: Obviously, the commands in the question have to be reordered. The constraint has to be adapted after the new column has been added. – Erwin Brandstetter, Feb 3, 2016
5 Answers
Mark all your existing rows as old:
ALTER TABLE integrations.billables
ADD COLUMN is_old BOOLEAN NOT NULL DEFAULT false;
UPDATE integrations.billables SET is_old = true;
And set up the constraint to ignore old rows:
ALTER TABLE integrations.billables
ADD CONSTRAINT cc_at_least_one_mapping_needed_billables
CHECK (
NOT(("qb_id", "xero_id", "freshbooks_id", "unleashed_id", "csv_data", "myob_id") IS NULL)
OR is_old
);
(Yes, that IS NULL check works. See here.)
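A quick way to convince yourself of the row-constructor semantics (hypothetical one-off queries, not part of the migration):

-- All fields NULL: the row constructor IS NULL, so NOT (...) fails the check.
SELECT NOT (ROW(NULL, NULL) IS NULL); -- false
-- At least one field set: IS NULL is false, so NOT (...) passes the check.
SELECT NOT (ROW(1, NULL) IS NULL); -- true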
Advantages of this mechanism:
- Constraint remains valid
- You can continue to update old rows without filling in this value
Downsides:
- A similar situation down the road will be messy. You would have to add a second boolean column or some other hoop-jumping for the second new column.
- If you want to force updated rows to be given a value, this won't do it.
- This has the potential for abuse, since someone could just flip the is_old flag to true. (This can be addressed, though; see below.) This isn't something to be concerned about if end users can't access the database directly and you can trust the developers not to do wacky things with the data.
If you are worried about somebody changing the flag, you could set up a trigger to prevent any inserts or updates from setting is_old to true:
CREATE FUNCTION throw_error_on_illegal_old()
RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
IF NEW.is_old THEN
-- Need to make sure we don't try to access
-- OLD in an INSERT
IF TG_OP = 'INSERT' THEN
RAISE 'Cannot create new with is_old = true';
ELSE
IF NOT OLD.is_old THEN
RAISE 'Cannot change is_old from false to true';
END IF;
END IF;
END IF;
-- If we get here, all tests passed
RETURN NEW;
END
$$;
CREATE TRIGGER billables_prohibit_marking_row_old
BEFORE INSERT OR UPDATE ON integrations.billables
FOR EACH ROW EXECUTE PROCEDURE throw_error_on_illegal_old();
You do still have to trust that no one who can modify the database schema is going to come along and drop your trigger or something, but if they're going to do that, they could drop the constraint, too.
Here's a SQLFiddle demo. Note that the "should skip" row is not in the output (as we desired).
- Your answer is very interesting. However, I'd have to update more than 40m rows. It's quite impossible =\ – user83914, Feb 4, 2016
- I seriously doubt that it's impossible. It might take a long time, but that's a very different problem. Whether you can deal with that time frame depends on your constraints. You can test how long it takes, and once you know, you can make a decision. Can you afford to be offline that long? Could you put the database/application in a read-only state while it runs? (For the latter, you could possibly make the change in a clone and then switch over to the new database.) – jpmc26, Feb 4, 2016
- @LucasPossamai It is probably much faster than you think. I just tested on a table with over 14 million rows. Adding the column took 273023 ms (4.5 minutes). The UPDATE took 349422 ms (5.8 minutes). Applying a similar CHECK took 20996 ms (21 seconds). That's a total of about 10 minutes. This was all on a pretty low-powered machine (4 GB RAM, dual-core CPU). You have a little less than 3 times the number of rows. If the runtime grows linearly, which I expect it would, you're talking about half an hour of downtime, extremely reasonable in many contexts. But do test the runtimes yourself. – jpmc26, Feb 4, 2016
If you have a serial column, or an integer column that's automatically populated with a nextval (so that you are never supposed to insert new rows with an explicit value for that column), you could additionally check whether the value of that column is greater than a specific value:
(
(("qb_id" IS NOT NULL) :: INTEGER +
("xero_id" IS NOT NULL) :: INTEGER +
("freshbooks_id" IS NOT NULL) :: INTEGER +
("unleashed_id" IS NOT NULL) :: INTEGER +
("csv_data" IS NOT NULL) :: INTEGER +
("myob_id" IS NOT NULL) :: INTEGER)> 0
OR
YourSerialColumn <= value
)
where the value should be determined as currval of the column's corresponding sequence at the time of altering/recreating the constraint. This way the IS NOT NULL checks will apply only to the rows whose YourSerialColumn value is greater than value.
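For illustration, a minimal sketch of the recreated constraint, assuming a hypothetical serial primary key named id and a boundary of 40000000 captured at the time of the change (the column name, sequence name, and number are placeholders):

-- Determine the boundary at migration time, e.g.:
-- SELECT last_value FROM integrations.billables_id_seq; -- suppose: 40000000
ALTER TABLE integrations.billables
 DROP CONSTRAINT IF EXISTS cc_at_least_one_mapping_needed_billables
, ADD CONSTRAINT cc_at_least_one_mapping_needed_billables
 CHECK ((("qb_id" IS NOT NULL) :: INTEGER +
 ("xero_id" IS NOT NULL) :: INTEGER +
 ("freshbooks_id" IS NOT NULL) :: INTEGER +
 ("unleashed_id" IS NOT NULL) :: INTEGER +
 ("csv_data" IS NOT NULL) :: INTEGER +
 ("myob_id" IS NOT NULL) :: INTEGER) > 0
 OR id <= 40000000);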
Note
This can be viewed as a variation on David Spillett's suggestion. The principal difference is that this solution does not require structural changes (partitioning). Both options, however, rely on the existence of a suitable column in your table. If there is no such column and you can add one specifically to solve this problem, going with the partitioning idea might be the better option of the two.
Just add the constraint as NOT VALID:

If the constraint is marked NOT VALID, the potentially-lengthy initial check to verify that all rows in the table satisfy the constraint is skipped. The constraint will still be enforced against subsequent inserts or updates (that is, [...] and they'll fail unless the new row matches the specified check constraints).
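A minimal sketch of both steps, using the constraint from the question (the VALIDATE step, which the comments below ask about, only succeeds once every existing row satisfies the check):

-- Step 1: add the constraint without checking existing rows.
ALTER TABLE integrations.billables
 ADD CONSTRAINT cc_at_least_one_mapping_needed_billables
 CHECK (qb_id IS NOT NULL OR xero_id IS NOT NULL OR
 freshbooks_id IS NOT NULL OR unleashed_id IS NOT NULL OR
 csv_data IS NOT NULL OR myob_id IS NOT NULL) NOT VALID;

-- Step 2 (optional, later): after backfilling the old rows,
-- scan the table and mark the constraint as validated.
ALTER TABLE integrations.billables
 VALIDATE CONSTRAINT cc_at_least_one_mapping_needed_billables;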
- One potential pitfall that I see here: if you intend for that column to remain null for existing rows going forward, then you can't ever update any of their other columns either. – David Jacobsen, Feb 3, 2016
- @a_horse_with_no_name Will I be able to VALIDATE the constraint in the future? To do that I'll have to insert values into the existing null columns, am I right? – user83914, Feb 3, 2016
This was not possible in Postgres up to version 9.1. From 9.2 onwards you can define a check constraint as NOT VALID (equivalent to WITH NOCHECK in MS SQL Server). See http://www.postgresql.org/docs/9.2/static/sql-altertable.html for more detail.
I'm not generally happy with this sort of thing where it is at all possible to avoid. A compromise, if you have a suitable partitioning key (a date-of-entry field, for instance), is that you could perhaps partition the table and only apply the NOT NULL check to the partition containing newer rows (sketched below). In either case, make sure the design choice is well documented so that future DBAs/developers/others know that NULL values might be expected in a certain subset of the records in the table.
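A minimal sketch of that idea using declarative partitioning (PostgreSQL 11 syntax; the table names, date column, and cut-off date are hypothetical, and only two of the mapping columns are shown):

-- Hypothetical: range-partition on an entry-date column so the
-- CHECK can be attached to the partition holding newer rows only.
CREATE TABLE integrations.billables_part (
 entered_on date NOT NULL,
 qb_id varchar(255),
 myob_id varchar(255)
) PARTITION BY RANGE (entered_on);

CREATE TABLE integrations.billables_old
 PARTITION OF integrations.billables_part
 FOR VALUES FROM (MINVALUE) TO ('2016-02-01');

CREATE TABLE integrations.billables_new
 PARTITION OF integrations.billables_part
 FOR VALUES FROM ('2016-02-01') TO (MAXVALUE);

-- Enforced only for rows landing in the "new" partition:
ALTER TABLE integrations.billables_new
 ADD CONSTRAINT cc_at_least_one_mapping_needed
 CHECK (NOT (qb_id, myob_id) IS NULL);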
Your CHECK
constraint can be much simpler:
ALTER TABLE billables
ADD CONSTRAINT cc_at_least_one_mapping_needed_billables
CHECK (qb_id IS NOT NULL OR
xero_id IS NOT NULL OR
freshbooks_id IS NOT NULL OR
unleashed_id IS NOT NULL OR
csv_data IS NOT NULL OR
myob_id IS NOT NULL) NOT VALID;
Or even just:
CONSTRAINT cc_at_least_one_mapping_needed_billables
CHECK (NOT (qb_id, xero_id, freshbooks_id, unleashed_id, csv_data, myob_id) IS NULL) NOT VALID;
Why does that work?
I already added the NOT VALID clause that @a_horse mentioned. This way the constraint only applies to newly added rows. You also have to consider possible dump/restore cycles. Details:
- Disable all constraints and table checks while restoring a dump
- Best way to populate a new column in a large table?
And you can do it all in a single command, which is fastest and prevents possible concurrent transactions from doing anything wrong:
ALTER TABLE integrations.billables
DROP CONSTRAINT IF EXISTS cc_at_least_one_mapping_needed_billables
, ADD COLUMN myob_id varchar(255)
, ADD CONSTRAINT cc_at_least_one_mapping_needed_billables
CHECK (NOT (qb_id, xero_id, freshbooks_id, unleashed_id, csv_data, myob_id) IS NULL)
NOT VALID;
Aside 1: If you already had the CHECK constraint on the same set of columns, just without the new myob_id, then there wouldn't be a problem, since every existing row would pass the new CHECK constraint with myob_id as well.
Aside 2: In some RDBMSs it makes sense to use varchar(255) to optimize performance. This is irrelevant in Postgres; the length modifier 255 only makes sense if you actually need to restrict the length to a maximum of 255 characters.
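For example, a hypothetical alternative if no hard length limit is actually required:

-- In Postgres, text (or varchar without a modifier) performs the same
-- as varchar(255); the modifier only adds a length check.
ALTER TABLE integrations.billables
 ADD COLUMN myob_id text; -- instead of: ADD COLUMN myob_id varchar(255)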