I have a 7 TB table in a PostgreSQL 12.3 database. I want to create a "RANGE type" partition on this column. The other thing to note is that this year
column is a text field rather than an integer field. There was a special cases that facilitated this decision a decade ago. You will see by the end my confusion about what kind of partition does this end up being, and how can I finish the implementation?
The table is real_estate_sales. The year is the year the home was built. I propose to partition the data in a way that will provide nearly equal sizes of partitions. I realize there are no indexes here, I have left them out for the sake of brevity.
- Create the partitioned table. This is the structure of the large home_sales table.
CREATE TABLE home_sales_partitioned (
"id" int8 NOT NULL DEFAULT nextval('home_sales_id_seq'::regclass),
address1 varchar NOT NULL,
address2 varchar,
postal_code varchar NOT NULL,
vehicle_digest varchar NOT NULL,
year_built varchar NOT NULL
) PARTITION BY RANGE (year);
That is what I want to do. But I found out I couldn't once I tried this:
- Create the integer partitions
CREATE TABLE home_sales_1900_30s (CHECK (year ~ E'^\\d+$' AND year::integer >= 1900 AND year::integer < 1940)) INHERITS (home_sales_partitioned);
But I learned that if you are inheriting from a partitioned table, you have to make that table partitioned to. So I changed it to:
- Create the partitioned table
CREATE TABLE home_sales_partitioned (
"id" int8 NOT NULL DEFAULT nextval('home_sales_id_seq'::regclass),
address1 varchar NOT NULL,
address2 varchar,
postal_code varchar NOT NULL,
vehicle_digest varchar NOT NULL,
year varchar NOT NULL
);
And left off the PARTITION BY statement since our only goal is to create the partition framework first. Then I ran:
Then these statements worked:
* Create the integer partitions
CREATE TABLE home_sales_1900_30s (CHECK (year ~ E'^\\d+$' AND year::integer >= 1900 AND year::integer < 1940)) INHERITS (home_sales_partitioned);
CREATE TABLE home_sales_1940_60s (CHECK (year ~ E'^\\d+$' AND year::integer >= 1940 AND year::integer < 1970)) INHERITS (home_sales_partitioned);
CREATE TABLE home_sales_1970s (CHECK (year ~ E'^\\d+$' AND year::integer >= 1970 AND year::integer < 1980)) INHERITS (home_sales_partitioned);
CREATE TABLE home_sales_1980_84 (CHECK (year ~ E'^\\d+$' AND year::integer >= 1980 AND year::integer < 1985)) INHERITS (home_sales_partitioned);
CREATE TABLE home_sales_1985_89 (CHECK (year ~ E'^\\d+$' AND year::integer >= 1985 AND year::integer < 1990)) INHERITS (home_sales_partitioned);
CREATE TABLE home_sales_1990_94 (CHECK (year ~ E'^\\d+$' AND year::integer >= 1990 AND year::integer < 1995)) INHERITS (home_sales_partitioned);
CREATE TABLE home_sales_2000_04 (CHECK (year ~ E'^\\d+$' AND year::integer >= 2000 AND year::integer < 2005)) INHERITS (home_sales_partitioned);
CREATE TABLE home_sales_2005_xx (CHECK (year ~ E'^\\d+$' AND year::integer >= 2005)) INHERITS (home_sales_partitioned);
Those all created successfully. Then I created a function to redirect the data on insert:
* Create the partition function
CREATE OR REPLACE FUNCTION year_range_partition(year varchar)
RETURNS text AS $$
BEGIN
IF year ~ E'^\\d+$' AND length(year) = 4 AND year::integer >= 1900 AND year::integer < 1940 THEN
RETURN 'home_sales_1900_30s';
ELSIF year ~ E'^\\d+$' AND length(year) = 4 AND year::integer >= 1940 AND year::integer < 1960 THEN
RETURN 'home_sales_1940_60s';
ELSIF year ~ E'^\\d+$' AND length(year) = 4 AND year::integer >= 1960 AND year::integer < 1970 THEN
RETURN 'home_sales_1960s';
ELSIF year ~ E'^\\d+$' AND length(year) = 4 AND year::integer >= 1970 AND year::integer < 1980 THEN
RETURN 'home_sales_1970s';
ELSIF year ~ E'^\\d+$' AND length(year) = 4 AND year::integer >= 1980 AND year::integer < 1985 THEN
RETURN 'home_sales_1980_84';
ELSIF year ~ E'^\\d+$' AND length(year) = 4 AND year::integer >= 1985 AND year::integer < 1990 THEN
RETURN 'home_sales_1985_89';
ELSIF year ~ E'^\\d+$' AND length(year) = 4 AND year::integer >= 1990 AND year::integer < 1995 THEN
RETURN 'home_sales_1990_94';
ELSIF year ~ E'^\\d+$' AND length(year) = 4 AND year::integer >= 1995 AND year::integer < 2000 THEN
RETURN 'home_sales_2000_04';
ELSIF year ~ E'^\\d+$' AND length(year) = 4 AND year::integer >= 2001 AND year::integer < 2005 THEN
RETURN 'home_sales_2005_xx';
ELSE
RETURN 'home_sales_non_integer';
END IF;
END;
$$ LANGUAGE plpgsql;
And this was created successfully. Now I need to create a function and a trigger on the partition table to route the data successfully upon insert:
* Create the function
CREATE OR REPLACE FUNCTION insert_catalog_entry_fitments_partitioned()
RETURNS TRIGGER AS $$
BEGIN
INSERT INTO year_range_partition(NEW.year)
VALUES (NEW.*);
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER insert_home_sales_partitioned_trigger
BEFORE INSERT ON home_sales_partitioned
FOR EACH ROW
EXECUTE FUNCTION insert_home_sales_partitioned();
I am expecting this insert to work correctly. After the insert is complete, I can do:
drop table home_sales;
alter table home_sales_partitioned rename to home_sales.
But now, how to I tell the home_sales table that it is PARTITION
ed? What is the command for that? And is this essentially a RANGE
partition, or because it's a varchar
field, is it some other type? I know that I must need to do this, since SELECT
s won't be constructed properly if the table doesn't know it's own partitions.
1 Answer 1
I cannot be certain that I understood what you want, but perhaps this can help you. To transform your table into a partitioned table, you have to copy the data (which means a longer down time). Alternatively, from PostgreSQL v13 on, you could use logical replication to minimize the down time.
You could use list partitioning and have a partition per year:
CREATE TABLE home_sales_partitioned (
id int8 GENERATED ALWAYS AS IDENTITY NOT NULL,
address1 text NOT NULL,
address2 text,
postal_code text NOT NULL,
vehicle_digest text NOT NULL,
year_built text NOT NULL
) PARTITION BY LIST (year);
Then a partition could be create like this:
CREATE TABLE home_sales_partitioned_2020
PARTITION OF home_sales_partitioned FOR VALUES IN ('2020');
There is no need to create constraints, functions etc.
After you have created all partitions, you can copy the data:
INSERT INTO home_sales_partitioned SELECT * FROM home_sales;
You may then want to create primary keys on the partitions:
ALTER TABLE home_sales_partitioned_2020
ADD PRIMARY KEY (id);
However, I am not certain whether this partitioning scheme will be beneficial. Yes, it will split the table into smaller chunks, which helps with autovacuum. Partitioning by year
could be useful for the following cases:
you want to get rid of all data for a year
you have queries that use
year
in theWHERE
clause of queries that perform a sequential scanyou want to aggregate by year
-
Thanks very kindly Laurenz. This is the answer for sure. We are on v12, but I know we can really benefit from v13 with the features you've outlined above. Autovacuuming these high CRUD tables is currently a problem. And looking at the app code, and understanding what Postgres does with updated rows, I'm going to try to at least take the U out of the CRD. That should cut down on half of the dead tuples in these tables, 2 of which have 12B rows. With respect to your answer above about creating PKs on the partitions, do I also need to create their individual indexes?RubyRedGrapefruit– RubyRedGrapefruit2023年03月11日 12:45:36 +00:00Commented Mar 11, 2023 at 12:45
-
1Well, a
DELETE
produces just as many dead tuples as anUPDATE
. You may benefit from HOT updates if you do it right. You can create indexes on the partitioned table; only unique indexes on a partitioned table have the limitation that they must contain the partitioning key.Laurenz Albe– Laurenz Albe2023年03月12日 12:12:13 +00:00Commented Mar 12, 2023 at 12:12
Explore related questions
See similar questions with these tags.
WHERE
conditions or so that you can discard old data withDROP TABLE
. There is no way to partition a non-partitioned table except by copying most of the data around, which means a lengthy down time. There are other options, but your question contains too many things at once to be answered in depth.