I have a table where the natural identifier is a text value like AB-C123456E-F, which is my primary key.
The table is approaching 200 million rows, and so to get better performance on the unique constraint I've split it up into partitions by range:
CREATE TABLE documents (
    doc_number text PRIMARY KEY,
    doc_data bytea,
    doc_data_sha256 bytea
) PARTITION BY RANGE (doc_number text_pattern_ops);
CREATE TABLE documents_a_abc123z PARTITION OF documents FOR VALUES FROM (MINVALUE) TO ('AB-C123z');
CREATE TABLE documents_abc124_def456z PARTITION OF documents FOR VALUES FROM ('AB-C124') TO ('DE-F456z');
...
These rows are queried by either equality or prefix match of the document number, and while partition pruning works as expected with an equality match, I can't get Postgres to do pruning for a prefix match. I've tried using SELECT * FROM documents WHERE doc_number LIKE 'AB-C12345%' and SELECT * FROM documents WHERE starts_with(doc_number, 'AB-C12345').
Is there a way I can get partition pruning over range partitions with a text prefix?
You could create the table like this:
CREATE TABLE documents (
doc_number text COLLATE "C" NOT NULL,
num_prefix varchar(7) COLLATE "C" NOT NULL,
...,
PRIMARY KEY (doc_number, num_prefix)
) PARTITION BY RANGE (num_prefix);
Then write a BEFORE trigger that makes sure that num_prefix is always the prefix of doc_number (using a generated column is not allowed for the partitioning key):
CREATE FUNCTION set_id_part() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   NEW.num_prefix := left(NEW.doc_number, 7);
   RETURN NEW;
END;$$;

CREATE TRIGGER set_id_part BEFORE INSERT OR UPDATE ON documents
   FOR EACH ROW EXECUTE FUNCTION set_id_part();
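The primary key has to contain the partitioning key in a partitioned table, but that does not detract from the uniqueness of doc_number: since num_prefix is fully determined by doc_number, two rows with the same doc_number would also have the same num_prefix and violate the key (only the index is somewhat larger).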
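As a rough sketch of how the pieces fit together, the following builds a small stand-in table. The names documents_demo, documents_demo_p1, documents_demo_p2 and set_demo_prefix, as well as the partition bounds, are made up for illustration. It assumes PostgreSQL 13 or later, where BEFORE row triggers on partitioned tables are allowed. The insert supplies num_prefix explicitly, on the assumption that the row is routed to its partition before the row-level trigger fires, so the trigger acts as a safeguard rather than as the thing that chooses the partition.

```sql
-- Hypothetical demo table; names and bounds are illustrative only.
CREATE TABLE documents_demo (
    doc_number text       COLLATE "C" NOT NULL,
    num_prefix varchar(7) COLLATE "C" NOT NULL,
    PRIMARY KEY (doc_number, num_prefix)
) PARTITION BY RANGE (num_prefix);

CREATE TABLE documents_demo_p1 PARTITION OF documents_demo
    FOR VALUES FROM (MINVALUE) TO ('AB-C124');
CREATE TABLE documents_demo_p2 PARTITION OF documents_demo
    FOR VALUES FROM ('AB-C124') TO (MAXVALUE);

-- Same trigger logic as in the answer, under a different (hypothetical) name.
CREATE FUNCTION set_demo_prefix() RETURNS trigger
   LANGUAGE plpgsql AS
$$BEGIN
   NEW.num_prefix := left(NEW.doc_number, 7);
   RETURN NEW;
END;$$;

CREATE TRIGGER set_demo_prefix BEFORE INSERT OR UPDATE ON documents_demo
   FOR EACH ROW EXECUTE FUNCTION set_demo_prefix();

-- The prefix is passed explicitly so the row is routed to the right partition.
INSERT INTO documents_demo (doc_number, num_prefix)
VALUES ('AB-C123456E-F', left('AB-C123456E-F', 7));

-- Repeating the insert should fail with a unique violation, because the
-- same doc_number always comes with the same num_prefix.
INSERT INTO documents_demo (doc_number, num_prefix)
VALUES ('AB-C123456E-F', left('AB-C123456E-F', 7));
```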
Now, to get partition pruning, all you have to do is add an additional expression to the query:
SELECT *
FROM documents
WHERE doc_number LIKE 'AB-C12345%'
  AND num_prefix = 'AB-C123';
Not perfect, but I guess that's the best you can get.
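To see whether pruning actually happens, and to cover search prefixes shorter than seven characters, something along these lines might help. It is only a sketch against the documents table from this answer. The literal prefixes are made up, and the range form relies on the "C" collation comparing strings byte by byte.

```sql
-- The plan should list only the partition(s) that can contain 'AB-C123',
-- which is the first 7 characters of the search prefix.
EXPLAIN (COSTS OFF)
SELECT *
FROM documents
WHERE doc_number LIKE 'AB-C12345%'
  AND num_prefix = 'AB-C123';

-- For a search prefix shorter than 7 characters, equality on num_prefix
-- cannot match. A half-open range on num_prefix can be used instead; the
-- upper bound is the prefix with its last character incremented ('1' -> '2').
EXPLAIN (COSTS OFF)
SELECT *
FROM documents
WHERE doc_number LIKE 'AB-C1%'
  AND num_prefix >= 'AB-C1'
  AND num_prefix <  'AB-C2';
```

In both cases the LIKE condition still does the exact filtering; the extra condition on num_prefix only gives the planner something it can compare against the partition bounds.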
where doc_number >= '...' and doc_number < '...'
explain (analyze)
PARTITION BY RANGE (left(doc_number, 7)) and then use left(doc_number, 7) = 'AB-C123' in addition to the actual LIKE condition in the query. However, you can no longer define doc_number as the primary key.