Wednesday, August 7, 2019

Automatically updating materialized views

Materialized views are convenient and efficient way to retrieve information from database. Unlike ordinary views, materialized views save the query result and provide faster access to the data. For example, user can create a simple materialized view containing the number of rows in a table:

 CREATE MATERIALIZED VIEW mv1 AS SELECT count(*) FROM pgbench_accounts;

Obtaining the number of rows using the materialized view mv1 is much faster than  directly accessing number of rows in pgbench_accounts.

test=# SELECT * from v1;
 count 
--------
 100000
(1 row)

Time: 0.810 ms
test=# SELECT count(*) FROM pgbench_accounts;
 count 
--------
 100000
(1 row)

Time: 13.945 ms


However if you delete a row from pgbench_accounts, the number of rows in mv1 is still unchanged. To reflect the change of the base table (in this case pgbench_accounts) , you need to recreate or refresh (this actually recreate the contents of materialize views from scratch), which may take long time.

To overcome the problem, SRA OSS is proposing to add a new feature to existing materialized view  "incremental materialized view maintenance". This works like this.

  • add new syntax to allow incrementally update the materialized view when it is created.
  • install triggers automatically to be attached to the base table(s).
  • when UPDATE/DELETE/INSERT applies on the base tables, the trigger calculate the difference to the rows in the materialized view and update the rows.
These should update the materialized view in a real time manner. The price is the over head of trigger invocation. So frequently updating tables are not best suite for the incremental materialized view maintenance. Here are demonstrations how to use the incrementally updating materialized view.

First create incrementally updating materialized view.

CREATE INCREMENTAL MATERIALIZED view mv2 AS SELECT count(*) FROM pgbench_accounts;

The view initially says there are 100,000 rows.

SELECT * FROM mv2;
 count 
--------
 100000
(1 row)


Delete a row from pgbench_accounts. Now the number of rows is 99999.

DELETE FROM pgbench_accounts WHERE aid = 10;
DELETE 1
SELECT count(*) FROM pgbench_accounts;
 count
-------
 99999
(1 row)


Make sure that mv2 reports the same number of rows.

SELECT * FROM mv2;
 count
-------
 99999
(1 row)


It works!

This is a very simple materialized view. More complex queries, for example inner joins also work.

CREATE INCREMENTAL MATERIALIZED VIEW mv3 AS SELECT a.aid, b.bid, t.tid FROM pgbench_accounts a INNER JOIN pgbench_branches b ON (a.bid = b.bid ) INNER JOIN pgbench_tellers t ON (b.bid = t.bid) WHERE a.aid BETWEEN 1 AND 5;
SELECT 50


SELECT * FROM mv3 LIMIT 5;
 aid | bid | tid
-----+-----+-----
   3 |   1 |  10
   2 |   1 |   1
   2 |   1 |   5
   2 |   1 |   4
   5 |   1 |   3

 (5 rows)

DELETE FROM pgbench_accounts WHERE aid = 2;
DELETE 1


SELECT * FROM mv3 LIMIT 5;
 aid | bid | tid
-----+-----+-----
   3 |   1 |  10
   5 |   1 |   3
   1 |   1 |   1
   5 |   1 |   8
   4 |   1 |   3
(5 rows)


Implementing other types of queries such as outer joins, self joins are in our plan, possibly toward PostgreSQL 13.

Here are some pointers to information regarding our incremental materialized view maintenance project:

11 comments:

  1. Other RDBMS implement this by reading the logs (WAL logs) and implementing the refresh in real time without the penalty of trigger.

    ReplyDelete
  2. Yes, using some kind of logs is one of our future plans. The reason why we use trigger for now is, it's simpler and easier to implement. Also by using triggers, changes will be made to matviews by the time when current transaction gets committed, which cannot be achieve by using WAL logs, because WAL logs never be written until transaction gets committed. So our plan using logs will be for "deferred" update of matviews.

    ReplyDelete
  3. Oracle uses something called Materialized View Logs that is a log created on base tables and used for fast refreshes on commit or on demand.

    I think that a MV with many aggregations spanning many rows in base tables can cause a serious contention problem on transactions. Also, serializable properties should be maintained.

    This feature will be welcome and thanks for your hard work.

    ReplyDelete
  4. This comment has been removed by a blog administrator.

    ReplyDelete
  5. This comment has been removed by a blog administrator.

    ReplyDelete
  6. This comment has been removed by a blog administrator.

    ReplyDelete
  7. This comment has been removed by a blog administrator.

    ReplyDelete
  8. This comment has been removed by a blog administrator.

    ReplyDelete
  9. This comment has been removed by a blog administrator.

    ReplyDelete
  10. This comment has been removed by a blog administrator.

    ReplyDelete
  11. This comment has been removed by a blog administrator.

    ReplyDelete

Dynamic spare process management in Pgpool-II

Pre-fork architecture in Pgpool-II Pgpool-II uses fixed number of pre-forked child process which is responsible for accepting and handling e...