I've set up a PostgreSQL FDW server with the following table, sharded by user_id
over four servers:
CREATE TABLE my_big_table
(
user_id bigint NOT NULL,
serial bigint NOT NULL, -- external, incrementing only
some_object_id bigint NOT NULL,
timestamp_ns bigint NOT NULL,
object_type smallint NOT NULL,
other_type smallint NOT NULL,
data bytea
) PARTITION BY LIST (mod(user_id, 4)) ;
CREATE SERVER shardA
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host '192.168.200.11', port '5432', dbname 'postgres', fetch_size '10000');
.
.
.
CREATE SERVER shardD
FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (host '192.168.200.14', port '5432', dbname 'postgres', fetch_size '10000');
create foreign table my_big_table_mod4_s0 partition of my_big_table
for values in (0) server shardA
OPTIONS (table_name 'my_big_table_mod4_s0');
.
.
.
create foreign table my_big_table_mod4_s3 partition of my_big_table
for values in (3) server shardD
OPTIONS (table_name 'my_big_table_mod4_s3');
Given a query for a single user_id, I was hoping for FDW to select a single backend based on simple partition pruning, but EXPLAIN shows a foreign table scan against all four servers... How can I hint FDW to be smarter?
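For example, a query of this shape (97 is just an arbitrary user_id value) currently produces a foreign scan on every shard:

EXPLAIN SELECT * FROM my_big_table WHERE user_id = 97;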
2 Answers
PostgreSQL lacks the insight into the mod() function that it would need to declare that user_id = 97 also implies mod(user_id, 4) = 1. If you provide that insight manually, it would likely honor it:

WHERE user_id = 1ドル AND mod(user_id, 4) = mod(1,ドル 4)
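As a concrete sketch (reusing the table from the question; 97 is only an example value), the query with the redundant predicate would look like:

-- repeating the partitioning expression lets the planner prune down to
-- the single matching foreign partition
EXPLAIN (COSTS OFF)
SELECT *
FROM my_big_table
WHERE user_id = 97
  AND mod(user_id, 4) = mod(97, 4);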
This has nothing to do with FDW. If all partitions/tables were local, the answer would remain the same.
You could use hash partitioning instead; then it would generate the insight automatically.
- Oh wow, thanks for the better answer, very insightful. I'd like to find a way that is transparent to the application, so that it does not have to know how the data is partitioned (sharded, in my case). I tried hash partitioning, but that applies the modulus after the hash operation, which makes me a bit uncomfortable because I'm no longer in control of exactly how the partitioning is done. Do you know a way to supply a null-hashing function, or a way to add this constraint 'on the fly'? (e.g. a VIEW perhaps, could that work?) – gertvdijk, Jul 8, 2019 at 21:27
- I think I can do something with a custom operator to get stable hash output, as demonstrated here! :-) git.postgresql.org/gitweb/… – gertvdijk, Jul 9, 2019 at 0:24
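For reference, the linked demonstration follows the pattern used in PostgreSQL's own partition regression tests. A rough, untested sketch of that idea (all object names below are made up, and the value still goes through the generic partition-hash combining step, so verify which remainder each user_id actually lands on before relying on it):

-- pass-through "hash" support function so that hash partitioning keys
-- off the raw user_id value instead of a real hash
CREATE FUNCTION part_hashint8_noop(value bigint, seed bigint)
RETURNS bigint AS $$ SELECT value + seed; $$ LANGUAGE sql IMMUTABLE;

-- hash operator class that uses the pass-through function as its
-- extended (64-bit) hash support function
CREATE OPERATOR CLASS part_noop_int8_ops FOR TYPE bigint USING hash AS
    OPERATOR 1 =,
    FUNCTION 2 part_hashint8_noop(bigint, bigint);

-- ...and then: PARTITION BY HASH (user_id part_noop_int8_ops)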
This is due to the use of an expression in the list partitioning of the parent table; PostgreSQL can only perform pruning on simple equality checks in list partitioning, as explained in the docs:
Keep the partitioning constraints simple, else the planner may not be able to prove that child tables might not need to be visited. Use simple equality conditions for list partitioning, or simple range tests for range partitioning, as illustrated in the preceding examples. A good rule of thumb is that partitioning constraints should contain only comparisons of the partitioning column(s) to constants using B-tree-indexable operators, because only B-tree-indexable column(s) are allowed in the partition key.
To work around this, here's an idea: instead of PARTITION BY LIST (mod(user_id, 4)), partition by HASH (user_id) and then define the partitions with FOR VALUES WITH (MODULUS 4, REMAINDER 0) SERVER shardA and so on, as sketched below.
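A sketch of that layout, reusing the column list and server names from the question (the partition names below are made up):

CREATE TABLE my_big_table
(
    user_id bigint NOT NULL,
    serial bigint NOT NULL, -- external, incrementing only
    some_object_id bigint NOT NULL,
    timestamp_ns bigint NOT NULL,
    object_type smallint NOT NULL,
    other_type smallint NOT NULL,
    data bytea
) PARTITION BY HASH (user_id);

CREATE FOREIGN TABLE my_big_table_hash_s0 PARTITION OF my_big_table
    FOR VALUES WITH (MODULUS 4, REMAINDER 0) SERVER shardA
    OPTIONS (table_name 'my_big_table_hash_s0');
-- ...and likewise with REMAINDER 1, 2, 3 for shardB, shardC, shardD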
PostgreSQL FDW will then actually prune the partitions as expected.
However, the data will not be distributed by the actual modulus of the user_id value, but by the modulus of the hash of that value. This may not be acceptable for various reasons (e.g. terabytes of data already distributed in the non-hashed way).
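If you go this route, you can at least check where a given key ends up: satisfies_hash_partition() is the function PostgreSQL itself uses in hash-partition constraints and can be called directly (97 is just an example value, and this assumes my_big_table is hash-partitioned on the single bigint column as above):

-- which remainder (and therefore which shard) does user_id = 97 map to
-- under MODULUS 4?
SELECT r AS remainder
FROM generate_series(0, 3) AS r
WHERE satisfies_hash_partition('my_big_table'::regclass, 4, r, 97::bigint);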