I have a table and need to add a new column without a default value:
Constraint:
ALTER TABLE integrations.billables
DROP CONSTRAINT IF EXISTS cc_at_least_one_mapping_needed_billables,
ADD CONSTRAINT cc_at_least_one_mapping_needed_billables
CHECK ((("qb_id" IS NOT NULL) :: INTEGER +
("xero_id" IS NOT NULL) :: INTEGER +
("freshbooks_id" IS NOT NULL) :: INTEGER +
("unleashed_id" IS NOT NULL) :: INTEGER +
("csv_data" IS NOT NULL) :: INTEGER +
("myob_id" IS NOT NULL) :: INTEGER) > 0);
Column:
ALTER TABLE integrations.billables
DROP COLUMN IF EXISTS myob_id,
ADD COLUMN myob_id varchar(255);
Question:
How can I add the constraint for new values only, and not for the rows that are already in the table? (Otherwise I'd get the error: check constraint "" is violated by some row.)
This is related to my previous question: ERROR: check constraint is violated by some row
- @ypercubeTM It's clear to me. As you can see above, I'm adding the column to an existing table (which already has the constraint). – user83914, Feb 3, 2016
- @ypercubeTM: Obviously, the commands in the question have to be reordered. The constraint has to be adapted after the new column has been added. – Erwin Brandstetter, Feb 3, 2016
5 Answers
Mark all your existing rows as old:
ALTER TABLE integrations.billables
ADD COLUMN is_old BOOLEAN NOT NULL DEFAULT false;
UPDATE integrations.billables SET is_old = true;
And set up the constraint to ignore old rows:
ALTER TABLE integrations.billables
ADD CONSTRAINT cc_at_least_one_mapping_needed_billables
CHECK (
NOT(("qb_id", "xero_id", "freshbooks_id", "unleashed_id", "csv_data", "myob_id") IS NULL)
OR is_old
);
(Yes, that IS NULL check works. See here.)
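A quick way to convince yourself of the row-constructor semantics (hypothetical one-off queries, not part of the migration):

-- All fields NULL: the row constructor IS NULL, so NOT (...) fails the check.
SELECT NOT (ROW(NULL, NULL) IS NULL); -- false
-- At least one field set: IS NULL is false, so NOT (...) passes the check.
SELECT NOT (ROW(1, NULL) IS NULL); -- true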
Advantages of this mechanism:
- Constraint remains valid
- You can continue to update old rows without filling in this value
Downsides:
- A similar situation down the road will be messy. You would have to add a second boolean column or some other hoop-jumping for the second new column.
- If you want to force updated rows to be given a value, this won't do it.
- This has the potential for abuse, since someone could just flip the is_old flag to true. (This can be addressed, though; see below.) This isn't something to be concerned about if end users can't access the database directly and you can trust the developers not to do wacky things with the data.
If you are worried about somebody changing the flag, you could set up a trigger to prevent any inserts or updates from setting is_old to true:
CREATE FUNCTION throw_error_on_illegal_old()
RETURNS trigger
LANGUAGE plpgsql
AS $$
BEGIN
IF NEW.is_old THEN
-- Need to make sure we don't try to access
-- OLD in an INSERT
IF TG_OP = 'INSERT' THEN
RAISE 'Cannot create new with is_old = true';
ELSE
IF NOT OLD.is_old THEN
RAISE 'Cannot change is_old from false to true';
END IF;
END IF;
END IF;
-- If we get here, all tests passed
RETURN NEW;
END
$$;
CREATE TRIGGER billables_prohibit_marking_row_old
BEFORE INSERT OR UPDATE ON integrations.billables
FOR EACH ROW EXECUTE PROCEDURE throw_error_on_illegal_old();
You do still have to trust that no one who can modify the database schema is going to come along and drop your trigger or something, but if they're going to do that, they could drop the constraint, too.
Here's a SQLFiddle demo. Note that the "should skip" row is not in the output (as we desired).
- Your answer is very interesting. However, I'd have to update more than 40m rows. It's quite impossible =\ – user83914, Feb 4, 2016
- I seriously doubt that it's impossible. It might take a long time, but that's a very different problem. Whether you can deal with that time frame depends on your constraints. You can test how long it takes, and once you know, you can make a decision. Can you afford to be offline that long? Could you put the database/application in a read-only state while it runs? (For the latter, you could possibly make the change in a clone and then switch over to the new database.) – jpmc26, Feb 4, 2016
- @LucasPossamai It is probably much faster than you think. I just tested on a table with over 14 million rows. Adding the column took 273023 ms (4.5 minutes). The UPDATE took 349422 ms (5.8 minutes). Applying a similar CHECK took 20996 ms (21 seconds). That's a total of about 10 minutes. This was all on a pretty low-powered machine (4 GB RAM, dual-core CPU). You have a little less than 3 times the number of rows. If the runtime grows linearly, which I expect it would, you're talking about half an hour of downtime, extremely reasonable in many contexts. But do test the runtimes yourself. – jpmc26, Feb 4, 2016
If you have a serial column, or an integer column that's automatically populated with a nextval (so that you are never supposed to insert new rows with an explicit value for that column), you could additionally check whether the value of that column is greater than a specific value:
(
(("qb_id" IS NOT NULL) :: INTEGER +
("xero_id" IS NOT NULL) :: INTEGER +
("freshbooks_id" IS NOT NULL) :: INTEGER +
("unleashed_id" IS NOT NULL) :: INTEGER +
("csv_data" IS NOT NULL) :: INTEGER +
("myob_id" IS NOT NULL) :: INTEGER)> 0
OR
YourSerialColumn <= value
)
where the value should be determined as currval of the column's corresponding sequence at the time of altering/recreating the constraint. This way the IS NOT NULL checks will apply only to the rows whose YourSerialColumn value is greater than value.
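For illustration, a minimal sketch of the recreated constraint, assuming a hypothetical serial primary key named id and a boundary of 40000000 captured at the time of the change (the column name, sequence name, and number are placeholders):

-- Determine the boundary at migration time, e.g.:
-- SELECT last_value FROM integrations.billables_id_seq; -- suppose: 40000000
ALTER TABLE integrations.billables
 DROP CONSTRAINT IF EXISTS cc_at_least_one_mapping_needed_billables
, ADD CONSTRAINT cc_at_least_one_mapping_needed_billables
 CHECK ((("qb_id" IS NOT NULL) :: INTEGER +
 ("xero_id" IS NOT NULL) :: INTEGER +
 ("freshbooks_id" IS NOT NULL) :: INTEGER +
 ("unleashed_id" IS NOT NULL) :: INTEGER +
 ("csv_data" IS NOT NULL) :: INTEGER +
 ("myob_id" IS NOT NULL) :: INTEGER) > 0
 OR id <= 40000000);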
Note
This can be viewed as a variation on David Spillett's suggestion. The principal difference is that this solution does not require structural changes (partitioning). Both options, however, rely on the existence of a suitable column in your table. If there is no such column and you can add one specifically to solve this problem, going with the partitioning idea might be the better option of the two.
Just add the constraint as NOT VALID:

If the constraint is marked NOT VALID, the potentially-lengthy initial check to verify that all rows in the table satisfy the constraint is skipped. The constraint will still be enforced against subsequent inserts or updates (that is, [...] and they'll fail unless the new row matches the specified check constraints).
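A minimal sketch of both steps, using the constraint from the question (the VALIDATE step, which the comments below ask about, only succeeds once every existing row satisfies the check):

-- Step 1: add the constraint without checking existing rows.
ALTER TABLE integrations.billables
 ADD CONSTRAINT cc_at_least_one_mapping_needed_billables
 CHECK (qb_id IS NOT NULL OR xero_id IS NOT NULL OR
 freshbooks_id IS NOT NULL OR unleashed_id IS NOT NULL OR
 csv_data IS NOT NULL OR myob_id IS NOT NULL) NOT VALID;

-- Step 2 (optional, later): after backfilling the old rows,
-- scan the table and mark the constraint as validated.
ALTER TABLE integrations.billables
 VALIDATE CONSTRAINT cc_at_least_one_mapping_needed_billables;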
- One potential pitfall that I see here: if you intend for that column to remain null for existing rows going forward, then you can't ever update any of their other columns either. – David Jacobsen, Feb 3, 2016
- @a_horse_with_no_name Will I be able to VALIDATE the constraint in the future? To do that I'll have to insert values into the existing null columns, am I right? – user83914, Feb 3, 2016
This was not possible in Postgres up to version 9.1. From 9.2 onwards you can define a check constraint as NOT VALID (equivalent to WITH NOCHECK in MS SQL Server). See http://www.postgresql.org/docs/9.2/static/sql-altertable.html for more detail.
I'm not generally happy with this sort of thing where it is at all possible to avoid. A compromise, if you have a suitable partitioning key (a date-of-entry field, for instance), is that you could perhaps partition the table and only apply the NOT NULL check to the partition containing newer rows (sketched below). In either case, make sure the design choice is well documented so that future DBAs/developers/others know that NULL values might be expected in a certain subset of the records in the table.
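A minimal sketch of that idea using declarative partitioning (PostgreSQL 11 syntax; the table names, date column, and cut-off date are hypothetical, and only two of the mapping columns are shown):

-- Hypothetical: range-partition on an entry-date column so the
-- CHECK can be attached to the partition holding newer rows only.
CREATE TABLE integrations.billables_part (
 entered_on date NOT NULL,
 qb_id varchar(255),
 myob_id varchar(255)
) PARTITION BY RANGE (entered_on);

CREATE TABLE integrations.billables_old
 PARTITION OF integrations.billables_part
 FOR VALUES FROM (MINVALUE) TO ('2016-02-01');

CREATE TABLE integrations.billables_new
 PARTITION OF integrations.billables_part
 FOR VALUES FROM ('2016-02-01') TO (MAXVALUE);

-- Enforced only for rows landing in the "new" partition:
ALTER TABLE integrations.billables_new
 ADD CONSTRAINT cc_at_least_one_mapping_needed
 CHECK (NOT (qb_id, myob_id) IS NULL);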
Your CHECK
constraint can be much simpler:
ALTER TABLE billables
ADD CONSTRAINT cc_at_least_one_mapping_needed_billables
CHECK (qb_id IS NOT NULL OR
xero_id IS NOT NULL OR
freshbooks_id IS NOT NULL OR
unleashed_id IS NOT NULL OR
csv_data IS NOT NULL OR
myob_id IS NOT NULL) NOT VALID;
Or even just:
CONSTRAINT cc_at_least_one_mapping_needed_billables
CHECK (NOT (qb_id, xero_id, freshbooks_id, unleashed_id, csv_data, myob_id) IS NULL) NOT VALID;
Why does that work?
I already added the NOT VALID clause that @a_horse mentioned. This way the constraint only applies to newly added rows. You also have to consider possible dump/restore cycles. Details:
- Disable all constraints and table checks while restoring a dump
- Best way to populate a new column in a large table?
And you can do it all in a single command, which is fastest and prevents possible concurrent transactions from doing anything wrong:
ALTER TABLE integrations.billables
DROP CONSTRAINT IF EXISTS cc_at_least_one_mapping_needed_billables
, ADD COLUMN myob_id varchar(255)
, ADD CONSTRAINT cc_at_least_one_mapping_needed_billables
CHECK (NOT (qb_id, xero_id, freshbooks_id, unleashed_id, csv_data, myob_id) IS NULL)
NOT VALID;
Aside 1: If you already had the CHECK constraint on the same set of columns, just without the new myob_id, then there wouldn't be a problem, since every existing row would pass the new CHECK constraint with myob_id as well.
Aside 2: In some RDBMSs it makes sense to use varchar(255) to optimize performance. This is irrelevant in Postgres; the length modifier 255 only makes sense if you actually need to restrict the length to a maximum of 255 characters.
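For example, a hypothetical alternative if no hard length limit is actually required:

-- In Postgres, text (or varchar without a modifier) performs the same
-- as varchar(255); the modifier only adds a length check.
ALTER TABLE integrations.billables
 ADD COLUMN myob_id text; -- instead of: ADD COLUMN myob_id varchar(255)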