Refresh materialized view incrementally in PostgreSQL

Question 1

Is it possible to refresh a materialized view incrementally in PostgreSQL, i.e. only for the data that is new or has changed?

Consider this table & materialized view:

CREATE TABLE graph (
 xaxis integer NOT NULL,
 value integer NOT NULL
);
CREATE MATERIALIZED VIEW graph_avg AS
SELECT xaxis, AVG(value)
FROM graph
GROUP BY xaxis;

Periodically, new values are added to graph or an existing value is updated. I want to refresh the view graph_avg every couple of hours, but only for the values that have changed. However, in PostgreSQL 9.3 the whole materialized view is recomputed on every refresh, which is quite time-consuming. The next version, 9.4, adds REFRESH MATERIALIZED VIEW CONCURRENTLY, but it still refreshes the entire view. With hundreds of millions of rows, this takes a few minutes.

What's a good way to keep track of updated & new values and only refresh the view partially?

Question 2

Still looking for a solution. In my case I have a materialized view with multiple joins and millions of records.

Question 3

Still interested to know whether there is a solution to this.

Question 4

Try the pg_ivm extension for this: stackoverflow.com/questions/29437650/…
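For reference, a minimal sketch of what the pg_ivm route might look like for the graph table from the question. It assumes the extension is installed on the server. The name graph_avg_immv is made up for illustration, and depending on the pg_ivm version the function may need to be schema-qualified as pgivm.create_immv.

```sql
-- Assumes the pg_ivm extension is available on the server.
CREATE EXTENSION IF NOT EXISTS pg_ivm;

-- Create an incrementally maintainable materialized view (IMMV).
-- graph_avg_immv is a hypothetical name chosen for this example.
SELECT create_immv(
   'graph_avg_immv',
   'SELECT xaxis, avg(value) AS avg_val FROM graph GROUP BY xaxis'
);

-- Later writes to graph are applied to the IMMV by triggers, without REFRESH.
SELECT * FROM graph_avg_immv ORDER BY xaxis;
```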

Question 5

You can always implement your own table serving as "materialized view". That's how we did it before MATERIALIZED VIEW was implemented in Postgres 9.3.

You can create a plain VIEW:

CREATE VIEW graph_avg_view AS 
SELECT xaxis, AVG(value) AS avg_val
FROM graph
GROUP BY xaxis;

And materialize the result once or whenever you need to start over:

CREATE TABLE graph_avg AS
SELECT * FROM graph_avg_view;

(Or use the SELECT statement directly, without creating a VIEW.)
Then, depending on undisclosed details of your use case, you can DELETE / UPDATE / INSERT changes manually.

A basic DML statement with data-modifying CTEs for your table as is:

Assuming nobody else tries to write to graph_avg concurrently (reading is no problem):

WITH del AS (
 DELETE FROM graph_avg t
 WHERE NOT EXISTS (SELECT FROM graph_avg_view WHERE xaxis = t.xaxis)
 )
, upd AS (
 UPDATE graph_avg t
 SET avg_val = v.avg_val
 FROM graph_avg_view v
 WHERE t.xaxis = v.xaxis
 AND t.avg_val <> v.avg_val
-- AND t.avg_val IS DISTINCT FROM v.avg_val -- alt if avg_val can be NULL
 )
INSERT INTO graph_avg -- no target list, whole row; INSERT takes an alias only with AS, and none is needed here
SELECT v.*
FROM graph_avg_view v
WHERE NOT EXISTS (SELECT FROM graph_avg WHERE xaxis = v.xaxis);

Basic recipe

Add a timestamp column with default now() to your base table. Let's call it ts.
- If you have updates, add a trigger to set the current timestamp with every update that changes either xaxis or value.
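The first step could be sketched as follows. The function and trigger names (graph_set_ts, graph_upd_ts) are hypothetical, and the column type timestamp is an assumption chosen to match the mv table used in this recipe.

```sql
-- New rows get the transaction timestamp by default.
ALTER TABLE graph ADD COLUMN ts timestamp NOT NULL DEFAULT now();

-- Hypothetical trigger function: stamp the row with the current transaction time.
CREATE FUNCTION graph_set_ts() RETURNS trigger
  LANGUAGE plpgsql AS
$$
BEGIN
   NEW.ts := now();
   RETURN NEW;
END
$$;

-- Fire only when xaxis or value actually changes.
CREATE TRIGGER graph_upd_ts
BEFORE UPDATE OF xaxis, value ON graph
FOR EACH ROW
WHEN (OLD.xaxis IS DISTINCT FROM NEW.xaxis
   OR OLD.value IS DISTINCT FROM NEW.value)
EXECUTE PROCEDURE graph_set_ts();
```

Note that an update which moves a row to a different xaxis only stamps the row in its new group. The old group's average changes as well, but no remaining row there carries a fresh timestamp, so that case needs separate handling, much like DELETE.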

Create a tiny table to remember the timestamp of your latest snapshot. Let's call it mv:

CREATE TABLE mv (
 tbl text PRIMARY KEY
 , ts timestamp NOT NULL DEFAULT '-infinity'
); -- possibly more details

Create this partial, multicolumn index:

CREATE INDEX graph_mv_latest ON graph (xaxis, value)
WHERE ts >= '-infinity';

Use the timestamp of the last snapshot as predicate in your queries to refresh the snapshot with perfect index usage.
At the end of the transaction, drop the index and recreate it with the transaction timestamp replacing the timestamp in the index predicate (initially '-infinity'), which you also save to your table. Everything in one transaction.
Note that the partial index is great to cover INSERT and UPDATE operations, but not DELETE. To cover that, you need to consider the entire table. It all depends on exact requirements.
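Put together, one refresh cycle might look roughly like the sketch below. It is an illustration under several assumptions, not a drop-in implementation: graph_avg is the hand-made table with columns (xaxis, avg_val), mv has been seeded once with a row for 'graph_avg', nothing writes to graph while the refresh runs, and both timestamp literals are placeholders that the client or dynamic SQL would fill in.

```sql
BEGIN;

-- One-time seed, done before the first refresh:
-- INSERT INTO mv (tbl) VALUES ('graph_avg');

-- '2014-12-22 18:00' is a placeholder for the timestamp saved by the previous
-- refresh (initially '-infinity'). It is written as a literal so the planner
-- can match the predicate of the partial index graph_mv_latest.
WITH changed AS (
   SELECT DISTINCT xaxis
   FROM   graph
   WHERE  ts >= '2014-12-22 18:00'
   )
, fresh AS (  -- recompute averages only for groups with new or updated rows
   SELECT g.xaxis, avg(g.value) AS avg_val
   FROM   graph g
   JOIN   changed c USING (xaxis)
   GROUP  BY g.xaxis
   )
, upd AS (
   UPDATE graph_avg t
   SET    avg_val = f.avg_val
   FROM   fresh f
   WHERE  t.xaxis = f.xaxis
   AND    t.avg_val <> f.avg_val
   )
INSERT INTO graph_avg (xaxis, avg_val)
SELECT f.xaxis, f.avg_val
FROM   fresh f
WHERE  NOT EXISTS (SELECT FROM graph_avg t WHERE t.xaxis = f.xaxis);

-- Remember the new snapshot time; now() is the start time of this transaction.
UPDATE mv SET ts = now() WHERE tbl = 'graph_avg';

-- Rotate the partial index so it only covers rows from this snapshot onward.
DROP INDEX graph_mv_latest;
CREATE INDEX graph_mv_latest ON graph (xaxis, value)
WHERE ts >= '2014-12-22 20:00';  -- placeholder: must equal the value just saved

COMMIT;
```

Groups whose rows were all deleted are not touched by this statement; they still need the full comparison against the base table shown earlier.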

Question 6

Thank you for the clarity on materialized views and suggesting an alternate answer.

Question 7

Got here when searching for "how to implement a materialized view by hand".

Question 8

Concurrent Update (Postgres 9.4)

While not an incremental update as you asked for, Postgres 9.4 does provide a new concurrent update feature.

To quote the doc...

Prior to PostgreSQL 9.4, refreshing a materialized view meant locking the entire table, and therefore preventing anything querying it, and if a refresh took a long time to acquire the exclusive lock (while it waits for queries using it to finish), it in turn is holding up subsequent queries. This can now be mitigated with the CONCURRENTLY keyword:

 postgres=# REFRESH MATERIALIZED VIEW CONCURRENTLY mv_data;

A unique index will need to exist on the materialized view though. Instead of locking the materialized view up, it instead creates a temporary updated version of it, compares the two versions, then applies INSERTs and DELETEs against the materialized view to apply the difference. This means queries can still use the materialized view while it's being updated. Unlike its non-concurrent form, tuples aren't frozen, and it needs VACUUMing due to the aforementioned DELETEs that will leave dead tuples behind.

This concurrent update is still performing a complete fresh query (not incremental). So CONCURRENTLY does not save on the overall computation time, it just minimizes the amount of time your materialized view is unavailable for use during its update.
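Applied to the materialized view graph_avg from the question, the sequence could look like this. The index name is hypothetical; the unique index has to be a plain one on column names, without expressions or a WHERE clause.

```sql
-- REFRESH ... CONCURRENTLY requires a unique index on the materialized view.
-- graph_avg_xaxis_idx is a hypothetical name.
CREATE UNIQUE INDEX graph_avg_xaxis_idx ON graph_avg (xaxis);

-- Readers can keep querying graph_avg while this runs (Postgres 9.4 or later),
-- but the defining query is still executed in full.
REFRESH MATERIALIZED VIEW CONCURRENTLY graph_avg;

-- The applied DELETEs leave dead tuples behind; clean them up.
VACUUM ANALYZE graph_avg;
```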

Question 9

For a moment I was excited, until I read closely: "it instead creates a temporary updated version of it ... compares the two versions". This means the temporary updated version is still a full computation, and only then is the difference applied to the existing view. So essentially, I am still redoing ALL the computations, just in the temporary table.

Question 10

Ah, true, CONCURRENTLY does not save on the overall computation time, it just minimizes the amount of time your materialized view is unavailable for use during its update.

Question 11

Is this still true as of Postgres 11 or 12?

score 37 · Accepted Answer · 2014-12-22 18:05:29Z

