PostgreSQL How to DEFAULT Partitioned Identity Column?

Question

PostgreSQL 11
What is the best way to generate default values for identity columns on partitions?
E.g.:

CREATE TABLE data.log
(
 id BIGINT GENERATED ALWAYS AS IDENTITY
 (
 INCREMENT BY 1
 MINVALUE -9223372036854775808
 MAXVALUE 9223372036854775807
 START WITH -9223372036854775808
 RESTART WITH -9223372036854775808
 CYCLE
 ),
 epoch_millis BIGINT NOT NULL,
 message TEXT NOT NULL
) PARTITION BY RANGE (epoch_millis);
CREATE TABLE data.foo_log
PARTITION OF data.log
(
 PRIMARY KEY (id)
)
FOR VALUES FROM (0) TO (9999999999);

If I do:

INSERT INTO data.foo_log (epoch_millis, message)
VALUES (1000000, 'hello');

I get:

ERROR: null value in column "id" violates not-null constraint
DETAIL: Failing row contains (null, 1000000, hello).
SQL state: 23502

because the identity-generated default is not applied when inserting into the partition directly. It only works when I insert through the root table:

INSERT INTO data.log (epoch_millis, message)
VALUES (1000000, 'hello');

There are times, though, when I want to insert directly into a specific partition for performance reasons (e.g. a bulk COPY).
The only way I can get this to work is to create the partition with a default that references the sequence implicitly created for the identity column:

CREATE TABLE data.foo_log
PARTITION OF data.log
(
 id DEFAULT nextval('data.log_id_seq'),
 PRIMARY KEY (id)
)
FOR VALUES FROM (0) TO (9999999999);

Is there a better way to do this and if so how?

Comment

Would your COPY commands include the id column and if so, should those override the default? Since the parent has GENERATED ALWAYS I would assume you want user values to be discarded or an exception raised? (But that's not what your current solution does.)

Question 3

@erwinbrandstetter that is correct. Preferably the COPY would not provide ID values. Best case, an error is thrown if ID is supplied accidentally, but I can live with convention if I have to (i.e. we just "know" not to supply ID values).

Answer

I don't know of a better solution in general. A few minor things, though:

`pg_get_serial_sequence()`

If you don't know the name of the parent's implicit sequence, use pg_get_serial_sequence().

SELECT pg_get_serial_sequence('data.log', 'id');

You might even use the expression in the CREATE TABLE script directly, but that would impose a very minor additional cost to compute the actual name for the default (once per transaction, I think), and since this is about performance optimization ...
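To avoid any extra cost at insert time, the sequence name can instead be resolved once, when the partition is created, with format() and EXECUTE in a DO block. A minimal sketch, assuming PostgreSQL 11 behaviour (partitions do not inherit the identity property; newer releases may behave differently) and using hypothetical temporary tables demo_log and demo_log_p0 as stand-ins for data.log and data.foo_log:

```sql
-- Hypothetical temporary stand-in for data.log.
CREATE TEMP TABLE demo_log (
    id           bigint GENERATED ALWAYS AS IDENTITY,
    epoch_millis bigint NOT NULL,
    message      text   NOT NULL
) PARTITION BY RANGE (epoch_millis);

DO
$do$
DECLARE
    -- Resolve the parent's implicit identity sequence once, at DDL time.
    seq text := pg_get_serial_sequence('demo_log', 'id');
BEGIN
    -- %L embeds the name as a string literal; the stored default becomes
    -- nextval('<name>'::regclass), so no name lookup happens per row.
    EXECUTE format(
        'CREATE TEMP TABLE demo_log_p0 PARTITION OF demo_log
           (id DEFAULT nextval(%L), PRIMARY KEY (id))
         FOR VALUES FROM (0) TO (9999999999)',
        seq);
END
$do$;

-- A direct insert into the partition now draws from the parent's sequence.
INSERT INTO demo_log_p0 (epoch_millis, message) VALUES (1000000, 'hello');
```

The resulting default on the partition has the same shape as the hand-written DEFAULT nextval('data.log_id_seq') in the question, so inserts through the parent and directly into the partition share one sequence.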

`COPY` overrides `GENERATED ALWAYS`, but triggers do not

Defining your id column as GENERATED ALWAYS AS IDENTITY means you are not allowed to provide user values for the column id in INSERT statements unless you add an OVERRIDING clause, like:

INSERT INTO data.log (id, epoch_millis, message) OVERRIDING USER VALUE
VALUES (123, 1000000, 'hello');  -- 123 is ignored, id comes from the sequence

The manual:

OVERRIDING USER VALUE

If this clause is specified, then any values supplied for identity columns are ignored and the default sequence-generated values are applied.

This clause is useful for example when copying values between tables. Writing INSERT INTO tbl2 OVERRIDING USER VALUE SELECT * FROM tbl1 will copy from tbl1 all columns that are not identity columns in tbl2 while values for the identity columns in tbl2 will be generated by the sequences associated with tbl2.
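The cases can be compared side by side. A small sketch on a hypothetical temporary table demo_always standing in for data.log; the plain INSERT that supplies id is left commented out because it is rejected with an error:

```sql
-- Hypothetical temporary stand-in for data.log.
CREATE TEMP TABLE demo_always (
    id      bigint GENERATED ALWAYS AS IDENTITY,
    message text NOT NULL
);

-- 1. No OVERRIDING clause: supplying id raises an error.
-- INSERT INTO demo_always (id, message) VALUES (42, 'rejected');

-- 2. OVERRIDING USER VALUE: the supplied 42 is discarded, the sequence is used.
INSERT INTO demo_always (id, message) OVERRIDING USER VALUE
VALUES (42, 'ignored');

-- 3. OVERRIDING SYSTEM VALUE: the supplied value is written as-is.
INSERT INTO demo_always (id, message) OVERRIDING SYSTEM VALUE
VALUES (1000, 'kept');

SELECT id, message FROM demo_always ORDER BY id;
```

On a fresh table the 'ignored' row gets id 1 from the sequence, while the 'kept' row keeps 1000. COPY FROM behaves like case 3.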

COPY overrides in any case. The manual:

For identity columns, the COPY FROM command will always write the column values provided in the input data, like the INSERT option OVERRIDING SYSTEM VALUE.

But when writing to a partition directly with your solution, INSERT also accepts user-supplied values (on the partition, id only has a plain column default, not the GENERATED ALWAYS property), so it is your responsibility not to supply values for the id column. An alternative is to use a trigger instead of the default value in the partition:

CREATE OR REPLACE FUNCTION trg_log_default_id()
 RETURNS trigger
 LANGUAGE plpgsql AS
$func$
BEGIN
 -- Always draw from the parent's identity sequence,
 -- discarding any id supplied by INSERT or COPY.
 NEW.id := nextval('data.log_id_seq');
 RETURN NEW;
END
$func$;
CREATE TRIGGER insbef_default_id
 BEFORE INSERT ON data.foo_log -- the partition
 FOR EACH ROW
 EXECUTE PROCEDURE trg_log_default_id();

This assigns a number from the sequence in every case, which more closely emulates the GENERATED ALWAYS behavior of the parent. It is even stricter, since it also prevents COPY from violating your rule. The manual:

COPY FROM will invoke any triggers and check constraints on the destination table.

A trigger is a bit more expensive than a plain default value. It also burns an extra serial number per row for regular inserts via the parent table: the parent's identity default draws one number, then the trigger on the partition draws another. (It should be possible to distinguish these cases in the trigger; I haven't tried.)
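Both effects of the trigger can be observed with a sketch like the following. It assumes PostgreSQL 11 behaviour, where the partition's id column has no identity property of its own; the names demo_tlog, demo_tlog_p0 and trg_demo_tlog_id are hypothetical, the tables are temporary, and the trigger function is an ordinary persistent function:

```sql
-- Hypothetical temporary stand-ins for data.log / data.foo_log.
CREATE TEMP TABLE demo_tlog (
    id           bigint GENERATED ALWAYS AS IDENTITY,
    epoch_millis bigint NOT NULL,
    message      text   NOT NULL
) PARTITION BY RANGE (epoch_millis);

-- No DEFAULT on id here: the trigger supplies the value instead.
CREATE TEMP TABLE demo_tlog_p0 PARTITION OF demo_tlog
    FOR VALUES FROM (0) TO (9999999999);

CREATE OR REPLACE FUNCTION trg_demo_tlog_id()
  RETURNS trigger
  LANGUAGE plpgsql AS
$func$
BEGIN
    -- Unconditionally overwrite id, whatever INSERT or COPY supplied.
    -- demo_tlog_id_seq is the implicit identity sequence of demo_tlog.
    NEW.id := nextval('demo_tlog_id_seq');
    RETURN NEW;
END
$func$;

CREATE TRIGGER insbef_default_id
    BEFORE INSERT ON demo_tlog_p0  -- on the partition, not the parent
    FOR EACH ROW
    EXECUTE PROCEDURE trg_demo_tlog_id();

-- Direct insert with a user-supplied id: the trigger replaces 42.
INSERT INTO demo_tlog_p0 (id, epoch_millis, message) VALUES (42, 1000000, 'direct');

-- Insert through the parent: the identity default draws one number,
-- then the partition's trigger draws another.
INSERT INTO demo_tlog (epoch_millis, message) VALUES (2000000, 'via parent');
```

On a fresh sequence the direct row ends up with id 1 instead of 42, and the row inserted through the parent gets 3, because the parent's identity consumed 2 before the trigger replaced it.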

Comment

As always, your answers are thorough. I like the idea of using pg_get_serial_sequence, perhaps combined with information_schema.columns, to generate the SQL that creates partitions with DEFAULT nextval(...) clauses. The trigger is intriguing, but I'm avoiding triggers on this particular table to get the best write speed. Given that this data is in a trusted zone, I think we can live with a convention of not providing ID values in our INSERT/COPY statements.
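The generator idea could look roughly like the following hypothetical helper, create_partition_with_identity_defaults(). A minimal sketch, assuming PostgreSQL 11 behaviour: it reads the parent's identity columns from information_schema.columns (is_identity = 'YES'), looks up each column's sequence with pg_get_serial_sequence(), and creates the partition with matching DEFAULT nextval(...) clauses. bound_spec is spliced into the DDL verbatim, so it must come from trusted code:

```sql
-- Hypothetical helper: creates partition part_table of parent_schema.parent_table,
-- giving every identity column of the parent a DEFAULT that draws from the
-- parent's identity sequence, so direct INSERT/COPY into the partition works.
CREATE OR REPLACE FUNCTION create_partition_with_identity_defaults(
    parent_schema text,
    parent_table  text,
    part_table    text,  -- created in parent_schema
    bound_spec    text   -- e.g. 'FOR VALUES FROM (0) TO (9999999999)'; trusted input only
)
  RETURNS void
  LANGUAGE plpgsql AS
$func$
DECLARE
    col_defs text;
BEGIN
    -- One "col DEFAULT nextval('seq')" clause per identity column of the parent.
    SELECT string_agg(
               format('%I DEFAULT nextval(%L)',
                      c.column_name::text,
                      pg_get_serial_sequence(
                          format('%I.%I', parent_schema, parent_table),
                          c.column_name::text)),
               ', ')
    INTO   col_defs
    FROM   information_schema.columns c
    WHERE  c.table_schema = parent_schema
    AND    c.table_name   = parent_table
    AND    c.is_identity  = 'YES';

    IF col_defs IS NULL THEN
        RAISE EXCEPTION 'no identity columns found on %.%', parent_schema, parent_table;
    END IF;

    -- Sequence names are resolved here, once, and stored in the defaults.
    EXECUTE format('CREATE TABLE %I.%I PARTITION OF %I.%I (%s) %s',
                   parent_schema, part_table,
                   parent_schema, parent_table,
                   col_defs, bound_spec);
END
$func$;
```

For the question's tables that would be SELECT create_partition_with_identity_defaults('data', 'log', 'foo_log', 'FOR VALUES FROM (0) TO (9999999999)'); followed by ALTER TABLE data.foo_log ADD PRIMARY KEY (id); to restore the primary key from the original DDL.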

Comment

@akagixxer: Yes, probably the way to go.

score 6 · Accepted Answer · 2018-12-19 16:52:39Z
