I'm struggling to implement class-like hierarchy storage on top of PostgreSQL.
Currently I am creating a single table for each class with descendant tables containing pkey
and fields diff.
Consider a simple case:
This hierarchy would be represented as the following tables.
CREATE TABLE A_c (id uuid CONSTRAINT A_pkey PRIMARY KEY, class_ text, name text);
CREATE TABLE B_c (id uuid CONSTRAINT B_pkey PRIMARY KEY REFERENCES A_c ON DELETE CASCADE, ival integer);
CREATE TABLE C_c (id uuid CONSTRAINT C_pkey PRIMARY KEY REFERENCES A_c ON DELETE CASCADE, dval double precision);
CREATE TABLE D_c (id uuid CONSTRAINT D_pkey PRIMARY KEY REFERENCES C_c ON DELETE CASCADE, min double precision, max double precision);
I do want my tables to be in sync and allow me to read and modify data without knowing all the messy details, hence I create custom updatable views.
I do expect some bulk operations on my views so I resort to RULE
system.
CREATE TABLE A (id uuid, name text);
CREATE RULE "_RETURN" AS ON SELECT TO A DO INSTEAD SELECT id, name FROM A_c;
CREATE RULE "_insert_into_A" AS ON INSERT TO A DO INSTEAD INSERT INTO A_c (id, class_, name) VALUES (NEW.id, 'A', NEW.name);
CREATE TABLE B (id uuid, name text, ival integer);
CREATE RULE "_RETURN" AS ON SELECT TO B DO INSTEAD SELECT a.id, a.name, b.ival FROM B_c b, A a WHERE a.id = b.id;
CREATE RULE "_insert_into_B" AS ON INSERT TO B DO INSTEAD (INSERT INTO A_c (id, class_, name) VALUES (NEW.id, 'B', NEW.name);
INSERT INTO B_c (id, ival) VALUES (NEW.id, NEW.ival));
CREATE TABLE C (id uuid, name text, dval double precision);
CREATE RULE "_RETURN" AS ON SELECT TO C DO INSTEAD SELECT a.id, a.name, c.dval FROM C_c c, A a WHERE a.id = c.id;
CREATE RULE "_insert_into_C" AS ON INSERT TO C DO INSTEAD (INSERT INTO A_c (id, class_, name) VALUES (NEW.id, 'C', NEW.name);
INSERT INTO C_c (id, dval) VALUES (NEW.id, NEW.dval));
CREATE TABLE D (id uuid, name text, dval double precision, min double precision, max double precision);
CREATE RULE "_RETURN" AS ON SELECT TO D DO INSTEAD SELECT c.id, c.name, c.dval, d.min, d.max FROM D_c d, C c WHERE c.id = d.id;
CREATE RULE "_insert_into_D" AS ON INSERT TO D DO INSTEAD (INSERT INTO A_c (id, class_, name) VALUES (NEW.id, 'D', NEW.name);
INSERT INTO C_c (id, dval) VALUES (NEW.id, NEW.dval);
INSERT INTO D_c (id, min, max) VALUES (NEW.id, NEW.min, NEW.max));
At the first glance everything is fine and dandy.
INSERT INTO B (id, name, ival) VALUES ('87a9e442-e1ff-4d92-bf36-756420ad1dfa', 'test', 10);
INSERT INTO C (id, name, dval) VALUES ('407424d9-c709-45e2-a72c-03a9dd6fdc16', 'test2', 3.0);
INSERT INTO D (id, name, dval, min, max) VALUES ('c174594d-e9fe-4e95-b1a3-f8c6a6e83c53', 'test3', 1.0, 0.0, 2.0);
SELECT * FROM D;
id | name | dval | min | max
--------------------------------------+-------+------+-----+-----
c174594d-e9fe-4e95-b1a3-f8c6a6e83c53 | test3 | 1 | 0 | 2
(1 row)
SELECT * FROM C;
id | name | dval
--------------------------------------+-------+------
407424d9-c709-45e2-a72c-03a9dd6fdc16 | test2 | 3
c174594d-e9fe-4e95-b1a3-f8c6a6e83c53 | test3 | 1
(2 rows)
SELECT * FROM B;
id | name | ival
--------------------------------------+------+------
87a9e442-e1ff-4d92-bf36-756420ad1dfa | test | 10
(1 row)
SELECT * FROM A;
id | name
--------------------------------------+-------
87a9e442-e1ff-4d92-bf36-756420ad1dfa | test
407424d9-c709-45e2-a72c-03a9dd6fdc16 | test2
c174594d-e9fe-4e95-b1a3-f8c6a6e83c53 | test3
(3 rows)
At the second glance though, it looks like my insertion rules reevalute NEW
fields in each insert, which means two things:
- Any expensive function be called for each level of class hierarchy.
Any function with nondetermenistic output evaluates multiple times with different values.
INSERT INTO D (id, name, dval, min, max) VALUES (uuid_generate_v4(), 'test', 1.0, 2.0, 3.0); ERROR: insert or update on table "c_c" violates foreign key constraint "c_c_id_fkey" DETAILS: Key (id)=(ac75a666-adac-4f79-85b6-b802199f5c5f) is not present in table "a_c".
I've tried reimplementing INSERT
rules using WITH
, however it is forbidden to use NEW
pseudo relation within WITH
. Another option would be to wrap insertion into an SQL
function, unfortunately function seems to be an optimization barrier, which means that essentialy I receive more or less TRIGGER
performance with function being called for each row.
So the question is. Did I miss any other option to make my implementation work with arbitrary data (literals, function calls, etc...) in INSERT
statements?
1 Answer 1
I've done some more investigation. So basically I'm hitting fundamental RULE
limitation here, which is actualy stated in the documentation since 9.5
version:
Caution
In many cases, tasks that could be performed by rules on INSERT/UPDATE/DELETE are better done with triggers. Triggers are notationally a bit more complicated, but their semantics are much simpler to understand. Rules tend to have surprising results when the original query contains volatile functions: volatile functions may get executed more times than expected in the process of carrying out the rules.
Also, there are some cases that are not supported by these types of rules at all, notably including WITH clauses in the original query and multiple-assignment sub-SELECTs in the SET list of UPDATE queries. This is because copying these constructs into a rule query would result in multiple evaluations of the sub-query, contrary to the express intent of the query's author.
The point is while _RETURN
rule allows only a single query tree, all the other rules INSERT/UPDATE/DELETE
allow splitting a single query tree into a multitude of query trees. However this happens entirely as a query rewrite procedure so any CTE
or any function call will be duplicated for each new query tree.
Overall the only way to go with the suggested implementation is TRIGGER
implementation of all the data modification routines.
The alternate option would be to create a full fields table for each class without any FK
relation between them, which will make INSERT
a single table operation, SELECT
will be a UNION ALL
aggregation of the class table and all of its descendant classes tables and DELETE/UPDATE
will produce a query tree for the class table and each of its descendant class tables. Huge class hierarchies (hundreds) will cause a significant planner strain for DELETE/UPDATE
queries perormed someplace near the root of the class tree.