PostgreSQL 11
What is the best way to generate default values for identity columns on partition tables.
E.g
CREATE TABLE data.log
(
id BIGINT GENERATED ALWAYS AS IDENTITY
(
INCREMENT BY 1
MINVALUE -9223372036854775808
MAXVALUE 9223372036854775807
START WITH -9223372036854775808
RESTART WITH -9223372036854775808
CYCLE
),
epoch_millis BIGINT NOT NULL,
message TEXT NOT NULL
) PARTITION BY RANGE (epoch_millis);
CREATE TABLE data.foo_log
PARTITION OF data.log
(
PRIMARY KEY (id)
)
FOR VALUES FROM (0) TO (9999999999);
If I do:
INSERT INTO data.foo_log (epoch_millis, message)
VALUES (1000000, 'hello');
I get:
ERROR: null value in column "id" violates not-null constraint
DETAIL: Failing row contains (null, 1000000, hello).
SQL state: 23502
because the default generated value is not applied to the partition UNLESS I insert it into the root table like this:
INSERT INTO data.log (epoch_millis, message)
VALUES (1000000, 'hello');
There are times though that I want to insert directly into a specific partition for performance reasons (like doing bulk COPY).
The only way I can get this to work is to create the partition while knowing about the sequence that was implicitly created for the identity column like this:
CREATE TABLE data.foo_log
PARTITION OF data.log
(
id DEFAULT nextval('data.log_id_seq'),
PRIMARY KEY (id)
)
FOR VALUES FROM (0) TO (9999999999);
Is there a better way to do this and if so how?
1 Answer 1
I don't know of a better solution in general. A few minor things, though:
pg_get_serial_sequence()
If you don't know the name of the parent's implicit sequence, use pg_get_serial_sequence()
.
SELECT pg_get_serial_sequence('data.log', 'id');
You might even use the expression in the CREATE TABLE
script directly, but that would impose a very minor additional cost to compute the actual name for the default (once per transaction, I think), and since this is about performance optimization ...
COPY
overrides GENERATED ALWAYS
, but triggers do not
Defining your id
column as GENERATED ALWAYS AS IDENTITY
has the effect that you are not allowed to provide user values for the column id
in INSERT
statements, unless adding an "override" clause like:
INSERT INTO data.log (epoch_millis, message) OVERRIDING USER VALUE VALUES (1000000, 'hello');
OVERRIDING USER VALUE
If this clause is specified, then any values supplied for identity columns are ignored and the default sequence-generated values are applied.
This clause is useful for example when copying values between tables. Writing
INSERT INTO tbl2 OVERRIDING USER VALUE SELECT * FROM tbl1
will copy fromtbl1
all columns that are not identity columns intbl2
while values for the identity columns intbl2
will be generated by the sequences associated withtbl2
.
COPY
overrides in any case. The manual:
For identity columns, the
COPY FROM
command will always write the column values provided in the input data, like theINSERT
optionOVERRIDING SYSTEM VALUE
.
But while writing to a partition directly with your solution, INSERT
also overrides, so it will be your responsibility to avoid providing user values for the id
column directly. An alternative would be to use a trigger instead of the default value in the partition:
CREATE OR REPLACE FUNCTION trg_log_default_id()
RETURNS trigger
LANGUAGE plpgsql AS
$func$
BEGIN
NEW.id := nextval('data.log_id_seq')
RETURN NEW;
END
$func$;
CREATE TRIGGER insbef_default_id
BEFORE INSERT ON data.foo_log -- the partition
FOR EACH ROW
EXECUTE PROCEDURE trg_log_default_id();
This assigns a number from the sequence in any case, more closely emulating the GENERATED ALWAYS
behavior of the parent - stricter, even, also preventing COPY
from violating your rule. The manual:
COPY FROM
will invoke any triggers and check constraints on the destination table.
A trigger is a bit more expensive than a plain default value. And it burns an extra serial number per row for regular inserts via the parent table. (It should be possible to distinguish cases in the trigger, didn't try now.)
-
As always your answers I see are thorough. I like the idea of using
pg_get_serial_sequence
perhaps combined withinformation_schema.columns
to format SQL creating partitions withDEFAULT nextval
column constraints. The trigger is intriguing but I'm avoiding them on this particular table to get the best write speed. Given this data is in a trusted zone, I think we can live with a little convention for not providing ID values in ourINSERT/COPY
statements.akagixxer– akagixxer2018年12月19日 19:48:51 +00:00Commented Dec 19, 2018 at 19:48 -
@akagixxer: Yes, probably the way to go.Erwin Brandstetter– Erwin Brandstetter2018年12月19日 22:47:48 +00:00Commented Dec 19, 2018 at 22:47
Explore related questions
See similar questions with these tags.
COPY
commands include theid
column and if so, should those override the default? Since the parent hasGENERATED ALWAYS
I would assume you want user values to be discarded or an exception raised? (But that's not what your current solution does.)