How can I finish a "RANGE type" partition in PostgreSQL?

Question 1

I have a 7 TB table in a PostgreSQL 12.3 database. I want to create a "RANGE type" partition on this column. The other thing to note is that this year column is a text field rather than an integer field. There was a special cases that facilitated this decision a decade ago. You will see by the end my confusion about what kind of partition does this end up being, and how can I finish the implementation?

The table is real_estate_sales. The year is the year the home was built. I propose to partition the data in a way that will provide nearly equal sizes of partitions. I realize there are no indexes here, I have left them out for the sake of brevity.

Create the partitioned table. This is the structure of the large home_sales table.

CREATE TABLE home_sales_partitioned (
 "id" int8 NOT NULL DEFAULT nextval('home_sales_id_seq'::regclass), 
 address1 varchar NOT NULL,
 address2 varchar,
 postal_code varchar NOT NULL,
 vehicle_digest varchar NOT NULL,
 year_built varchar NOT NULL
) PARTITION BY RANGE (year);

That is what I want to do. But I found out I couldn't once I tried this:

Create the integer partitions

CREATE TABLE home_sales_1900_30s (CHECK (year ~ E'^\\d+$' AND year::integer >= 1900 AND year::integer < 1940)) INHERITS (home_sales_partitioned);

But I learned that if you are inheriting from a partitioned table, you have to make that table partitioned to. So I changed it to:

Create the partitioned table

CREATE TABLE home_sales_partitioned (
 "id" int8 NOT NULL DEFAULT nextval('home_sales_id_seq'::regclass), 
 address1 varchar NOT NULL,
 address2 varchar,
 postal_code varchar NOT NULL,
 vehicle_digest varchar NOT NULL,
 year varchar NOT NULL
);

And left off the PARTITION BY statement since our only goal is to create the partition framework first. Then I ran:

Then these statements worked:

* Create the integer partitions
CREATE TABLE home_sales_1900_30s (CHECK (year ~ E'^\\d+$' AND year::integer >= 1900 AND year::integer < 1940)) INHERITS (home_sales_partitioned);
CREATE TABLE home_sales_1940_60s (CHECK (year ~ E'^\\d+$' AND year::integer >= 1940 AND year::integer < 1970)) INHERITS (home_sales_partitioned);
CREATE TABLE home_sales_1970s (CHECK (year ~ E'^\\d+$' AND year::integer >= 1970 AND year::integer < 1980)) INHERITS (home_sales_partitioned);
CREATE TABLE home_sales_1980_84 (CHECK (year ~ E'^\\d+$' AND year::integer >= 1980 AND year::integer < 1985)) INHERITS (home_sales_partitioned);
CREATE TABLE home_sales_1985_89 (CHECK (year ~ E'^\\d+$' AND year::integer >= 1985 AND year::integer < 1990)) INHERITS (home_sales_partitioned);
CREATE TABLE home_sales_1990_94 (CHECK (year ~ E'^\\d+$' AND year::integer >= 1990 AND year::integer < 1995)) INHERITS (home_sales_partitioned);
CREATE TABLE home_sales_2000_04 (CHECK (year ~ E'^\\d+$' AND year::integer >= 2000 AND year::integer < 2005)) INHERITS (home_sales_partitioned);
CREATE TABLE home_sales_2005_xx (CHECK (year ~ E'^\\d+$' AND year::integer >= 2005)) INHERITS (home_sales_partitioned);

Those all created successfully. Then I created a function to redirect the data on insert:

* Create the partition function
CREATE OR REPLACE FUNCTION year_range_partition(year varchar)
RETURNS text AS $$
BEGIN
 IF year ~ E'^\\d+$' AND length(year) = 4 AND year::integer >= 1900 AND year::integer < 1940 THEN
 RETURN 'home_sales_1900_30s';
 ELSIF year ~ E'^\\d+$' AND length(year) = 4 AND year::integer >= 1940 AND year::integer < 1960 THEN
 RETURN 'home_sales_1940_60s';
 ELSIF year ~ E'^\\d+$' AND length(year) = 4 AND year::integer >= 1960 AND year::integer < 1970 THEN
 RETURN 'home_sales_1960s';
ELSIF year ~ E'^\\d+$' AND length(year) = 4 AND year::integer >= 1970 AND year::integer < 1980 THEN
 RETURN 'home_sales_1970s';
ELSIF year ~ E'^\\d+$' AND length(year) = 4 AND year::integer >= 1980 AND year::integer < 1985 THEN
 RETURN 'home_sales_1980_84';
ELSIF year ~ E'^\\d+$' AND length(year) = 4 AND year::integer >= 1985 AND year::integer < 1990 THEN
 RETURN 'home_sales_1985_89';
ELSIF year ~ E'^\\d+$' AND length(year) = 4 AND year::integer >= 1990 AND year::integer < 1995 THEN
 RETURN 'home_sales_1990_94';
ELSIF year ~ E'^\\d+$' AND length(year) = 4 AND year::integer >= 1995 AND year::integer < 2000 THEN
 RETURN 'home_sales_2000_04';
ELSIF year ~ E'^\\d+$' AND length(year) = 4 AND year::integer >= 2001 AND year::integer < 2005 THEN
 RETURN 'home_sales_2005_xx';
 ELSE
 RETURN 'home_sales_non_integer';
 END IF;
END;
$$ LANGUAGE plpgsql;

And this was created successfully. Now I need to create a function and a trigger on the partition table to route the data successfully upon insert:

* Create the function
CREATE OR REPLACE FUNCTION insert_catalog_entry_fitments_partitioned()
RETURNS TRIGGER AS $$
BEGIN
 INSERT INTO year_range_partition(NEW.year)
 VALUES (NEW.*);
 RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER insert_home_sales_partitioned_trigger
BEFORE INSERT ON home_sales_partitioned
FOR EACH ROW
EXECUTE FUNCTION insert_home_sales_partitioned();

I am expecting this insert to work correctly. After the insert is complete, I can do:

drop table home_sales;
alter table home_sales_partitioned rename to home_sales.

But now, how to I tell the home_sales table that it is PARTITIONed? What is the command for that? And is this essentially a RANGE partition, or because it's a varchar field, is it some other type? I know that I must need to do this, since SELECTs won't be constructed properly if the table doesn't know it's own partitions.

Question 2

Why are you dabbling with inheritance? That is not how you partition in PostgreSQL, at least since since v10. There is not much sense in trying to get partitions of equal size. Try to partition the table either to support your WHERE conditions or so that you can discard old data with DROP TABLE. There is no way to partition a non-partitioned table except by copying most of the data around, which means a lengthy down time. There are other options, but your question contains too many things at once to be answered in depth.

Question 3

I don't need the inheritance statement. I can just create the tables explicitly. With respect to "There is no way to copy a non-partitioned table except by copying most of the data around which means lengthy downtime", yes, I realize that and am willing to implement it to allow it to scale.

Question 4

I'm willing to bet that it's easier to maintain smaller partitions than larger ones. Also, the reason I'm dabbling in inheritance is that I don't quite understand what I'm doing, hence the post to SO. But I forgot that you already need to know the answer to your own question before you post here any more.

Question 5

Don't be annoyed. It is just quite difficult to understand what the purpose of all your SQL statements is. Why the constraints and the function?

Question 6

Sorry, I've had an outage and it has been a LONG week. I've cobbled this together from different things I've found. The function was to simplify the initial INSERT process. I have found a great (and recent) video that helps me understand about inheritance, about how much of my approach was before v10, etc.

Question 7

I cannot be certain that I understood what you want, but perhaps this can help you. To transform your table into a partitioned table, you have to copy the data (which means a longer down time). Alternatively, from PostgreSQL v13 on, you could use logical replication to minimize the down time.

You could use list partitioning and have a partition per year:

CREATE TABLE home_sales_partitioned (
 id int8 GENERATED ALWAYS AS IDENTITY NOT NULL, 
 address1 text NOT NULL,
 address2 text,
 postal_code text NOT NULL,
 vehicle_digest text NOT NULL,
 year_built text NOT NULL
) PARTITION BY LIST (year);

Then a partition could be create like this:

CREATE TABLE home_sales_partitioned_2020
 PARTITION OF home_sales_partitioned FOR VALUES IN ('2020');

There is no need to create constraints, functions etc.

After you have created all partitions, you can copy the data:

INSERT INTO home_sales_partitioned SELECT * FROM home_sales;

You may then want to create primary keys on the partitions:

ALTER TABLE home_sales_partitioned_2020
 ADD PRIMARY KEY (id);

However, I am not certain whether this partitioning scheme will be beneficial. Yes, it will split the table into smaller chunks, which helps with autovacuum. Partitioning by year could be useful for the following cases:

you want to get rid of all data for a year
you have queries that use year in the WHERE clause of queries that perform a sequential scan
you want to aggregate by year

Question 8

Thanks very kindly Laurenz. This is the answer for sure. We are on v12, but I know we can really benefit from v13 with the features you've outlined above. Autovacuuming these high CRUD tables is currently a problem. And looking at the app code, and understanding what Postgres does with updated rows, I'm going to try to at least take the U out of the CRD. That should cut down on half of the dead tuples in these tables, 2 of which have 12B rows. With respect to your answer above about creating PKs on the partitions, do I also need to create their individual indexes?

Question 9

Well, a DELETE produces just as many dead tuples as an UPDATE. You may benefit from HOT updates if you do it right. You can create indexes on the partitioned table; only unique indexes on a partitioned table have the limitation that they must contain the partitioning key.

Laurenz Albe Laurenz Albe 61.9k4 gold badges57 silver badges93 bronze badges · Accepted Answer · 2023-03-10 20:43:25Z

I cannot be certain that I understood what you want, but perhaps this can help you. To transform your table into a partitioned table, you have to copy the data (which means a longer down time). Alternatively, from PostgreSQL v13 on, you could use logical replication to minimize the down time.

You could use list partitioning and have a partition per year:

CREATE TABLE home_sales_partitioned (
 id int8 GENERATED ALWAYS AS IDENTITY NOT NULL, 
 address1 text NOT NULL,
 address2 text,
 postal_code text NOT NULL,
 vehicle_digest text NOT NULL,
 year_built text NOT NULL
) PARTITION BY LIST (year);

Then a partition could be create like this:

CREATE TABLE home_sales_partitioned_2020
 PARTITION OF home_sales_partitioned FOR VALUES IN ('2020');

There is no need to create constraints, functions etc.

After you have created all partitions, you can copy the data:

INSERT INTO home_sales_partitioned SELECT * FROM home_sales;

You may then want to create primary keys on the partitions:

ALTER TABLE home_sales_partitioned_2020
 ADD PRIMARY KEY (id);

However, I am not certain whether this partitioning scheme will be beneficial. Yes, it will split the table into smaller chunks, which helps with autovacuum. Partitioning by year could be useful for the following cases:

you want to get rid of all data for a year
you have queries that use year in the WHERE clause of queries that perform a sequential scan
you want to aggregate by year

Thanks very kindly Laurenz. This is the answer for sure. We are on v12, but I know we can really benefit from v13 with the features you've outlined above. Autovacuuming these high CRUD tables is currently a problem. And looking at the app code, and understanding what Postgres does with updated rows, I'm going to try to at least take the U out of the CRD. That should cut down on half of the dead tuples in these tables, 2 of which have 12B rows. With respect to your answer above about creating PKs on the partitions, do I also need to create their individual indexes?
Well, a DELETE produces just as many dead tuples as an UPDATE. You may benefit from HOT updates if you do it right. You can create indexes on the partitioned table; only unique indexes on a partitioned table have the limitation that they must contain the partitioning key.

Stack Exchange Network

How can I finish a "RANGE type" partition in PostgreSQL?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

How can I finish a "RANGE type" partition in PostgreSQL?

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions