Wednesday, August 7, 2019

Automatically updating materialized views

Materialized views are a convenient and efficient way to retrieve information from a database. Unlike ordinary views, materialized views save the query result and provide faster access to the data. For example, a user can create a simple materialized view containing the number of rows in a table:

 CREATE MATERIALIZED VIEW mv1 AS SELECT count(*) FROM pgbench_accounts;

Obtaining the number of rows using the materialized view mv1 is much faster than counting the rows in pgbench_accounts directly.

test=# SELECT * FROM mv1;
 count 
--------
 100000
(1 row)

Time: 0.810 ms
test=# SELECT count(*) FROM pgbench_accounts;
 count 
--------
 100000
(1 row)

Time: 13.945 ms


However, if you delete a row from pgbench_accounts, the number of rows in mv1 remains unchanged. To reflect changes to the base table (in this case pgbench_accounts), you need to recreate or refresh the materialized view (this actually rebuilds the contents of the materialized view from scratch), which may take a long time.
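The conventional way to bring mv1 back in sync is a full refresh, which re-runs the view's defining query over the whole base table:

```sql
-- Rebuild the contents of mv1 from scratch.
-- On a large base table this is expensive, since the
-- entire defining query is executed again.
REFRESH MATERIALIZED VIEW mv1;
```
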

To overcome the problem, SRA OSS is proposing to add a new feature to existing materialized views: "incremental materialized view maintenance". It works like this:

  • Add new syntax to allow a materialized view to be updated incrementally when it is created.
  • Install triggers automatically and attach them to the base table(s).
  • When an UPDATE/DELETE/INSERT is applied to the base tables, the triggers calculate the difference against the rows in the materialized view and update them.
These steps update the materialized view in near real time. The price is the overhead of trigger invocation, so frequently updated tables are not the best fit for incremental materialized view maintenance. Here is a demonstration of how to use an incrementally updated materialized view.
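The trigger-based idea can be imitated by hand with an ordinary table, which may help clarify what the proposed feature automates. This is only an illustrative sketch, not the actual implementation; the names mv_count and maintain_mv_count are made up for this example:

```sql
-- A plain table standing in for the materialized view's contents.
CREATE TABLE mv_count AS SELECT count(*) FROM pgbench_accounts;

-- Maintain the stored count incrementally instead of recomputing it.
CREATE FUNCTION maintain_mv_count() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE mv_count SET count = count + 1;
    ELSIF TG_OP = 'DELETE' THEN
        UPDATE mv_count SET count = count - 1;
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER mv_count_maintenance
    AFTER INSERT OR DELETE ON pgbench_accounts
    FOR EACH ROW EXECUTE PROCEDURE maintain_mv_count();
```

The proposed feature does this automatically, and for general queries it computes a proper delta rather than a simple +1/-1.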

First, create an incrementally updated materialized view.

CREATE INCREMENTAL MATERIALIZED view mv2 AS SELECT count(*) FROM pgbench_accounts;

The view initially says there are 100,000 rows.

SELECT * FROM mv2;
 count 
--------
 100000
(1 row)


Delete a row from pgbench_accounts. Now the number of rows is 99999.

DELETE FROM pgbench_accounts WHERE aid = 10;
DELETE 1
SELECT count(*) FROM pgbench_accounts;
 count
-------
 99999
(1 row)


Make sure that mv2 reports the same number of rows.

SELECT * FROM mv2;
 count
-------
 99999
(1 row)


It works!

This is a very simple materialized view. More complex queries, for example inner joins, also work.

CREATE INCREMENTAL MATERIALIZED VIEW mv3 AS SELECT a.aid, b.bid, t.tid FROM pgbench_accounts a INNER JOIN pgbench_branches b ON (a.bid = b.bid ) INNER JOIN pgbench_tellers t ON (b.bid = t.bid) WHERE a.aid BETWEEN 1 AND 5;
SELECT 50


SELECT * FROM mv3 LIMIT 5;
 aid | bid | tid
-----+-----+-----
   3 |   1 |  10
   2 |   1 |   1
   2 |   1 |   5
   2 |   1 |   4
   5 |   1 |   3
(5 rows)

DELETE FROM pgbench_accounts WHERE aid = 2;
DELETE 1


SELECT * FROM mv3 LIMIT 5;
 aid | bid | tid
-----+-----+-----
   3 |   1 |  10
   5 |   1 |   3
   1 |   1 |   1
   5 |   1 |   8
   4 |   1 |   3
(5 rows)


Implementing other types of queries, such as outer joins and self joins, is in our plan, possibly for PostgreSQL 13.


Wednesday, July 24, 2019

When primary server is far away from standby server

Sometimes we want to create a cluster configuration in which the primary server is physically located far away from the standby servers. For example, the primary server is located in AWS region A, while the standby server is located in AWS region B. This type of configuration is suited for DR (Disaster Recovery) or for a company which has branches in different countries. In the figure below, a company's headquarters is located in region A, and people in the HQ office access the primary database through Pgpool-II.

People in a branch office located in region B access the standby database in a mostly read-only manner. On a few occasions they need to write to the database, but they understand that this requires accessing the HQ database, which is slow. The problem for them is that even for pure read access, Pgpool-II needs to query the system catalog on the primary server to obtain meta information about the tables involved in the query.



From Pgpool-II 4.1, which is under development, users can eliminate the slow system catalog access from region B to region A by setting relcache_query_target = load_balance_node and the backend weight of the primary server to 0 in the Pgpool-II instance running in region B.
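In the pgpool.conf of the region B instance, the settings would look something like this (the node numbering is an assumption for this example: node 0 is the remote primary, node 1 is the local standby):

```
# Send system catalog (relcache) queries to the load balance node
relcache_query_target = load_balance_node

# Weight 0 keeps read queries off the remote primary,
# so the local standby is always chosen as the load balance node
backend_weight0 = 0
backend_weight1 = 1
```
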

Now the system catalog access is redirected to the standby server rather than the primary server, and access should be faster.

Please note, however, that there may be replication delay, and that could cause trouble if the target table is fairly new, because the system catalog on the standby server may not have the new table's info yet. So this configuration is best suited for systems where table creation is rare, or where tables are created while users are not accessing the system.

By the way, the Pgpool-II developers have set the target date for the Pgpool-II 4.1 release for this October. Please stay tuned.

Tuesday, April 2, 2019

Statement level load balancing

In the previous article I wrote about one of the new features of the upcoming Pgpool-II 4.1.
This time I would like to introduce the "statement level load balancing" feature of 4.1.

Pgpool-II can distribute read queries among PostgreSQL backend nodes. This makes it possible to design a scale-out cluster using PostgreSQL. The particular database node used for distributing read queries is determined at the session level, i.e. when a client connects to Pgpool-II. This is the so-called "session level load balancing".

For example, if a client connects to Pgpool-II and the load balance node is node 1 (we assume that this is a streaming replication standby node), then any read query will be sent to the load balance node (in this case node 1, the standby node). A different session may choose the master (primary) as the load balance node. The distribution ratio of load balance node selection across sessions is determined by the "backend weight" parameters in the Pgpool-II configuration file (usually named "pgpool.conf"), typically "backend_weight0" and "backend_weight1", corresponding to node 0 and node 1 respectively.
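For instance, the equal weights used in the demonstration below would be configured like this in pgpool.conf:

```
# Each new session picks its load balance node according to these weights;
# 0.5 / 0.5 distributes sessions evenly between node 0 and node 1
backend_weight0 = 0.5
backend_weight1 = 0.5
```
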

This works well as long as clients connect to Pgpool-II, issue some queries, and disconnect, since the next time a client connects to Pgpool-II, a different load balance node may be chosen according to the backend weight parameters.

However, if your client already has a connection pooling feature, this approach (session level load balancing) might be a problem, since the load balance node is selected only once, when the pooled connection from the client to Pgpool-II is created.

The statement level load balancing feature was created to solve this problem. Unlike session level load balancing, the load balance node is determined each time a new query is issued. The new parameter for this is "statement_level_load_balance". If it is set to on, the feature is enabled (the parameter can be changed by reloading pgpool.conf).
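The setting is a single line in pgpool.conf:

```
# Pick the load balance node per query instead of per session
statement_level_load_balance = on
```

Since it is a reloadable parameter, it can be turned on without restarting Pgpool-II.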

At first "select_cnt" is 0, which means no SELECTs were issued.

test=# show pool_nodes;
 node_id | hostname | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change 
---------+----------+-------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
 0       | /tmp     | 11002 | up     | 0.500000  | primary | 0          | true              | 0                 | 2019-04-02 15:36:58
 1       | /tmp     | 11003 | up     | 0.500000  | standby | 0          | false             | 0                 | 2019-04-02 15:36:58
(2 rows)

Let's issue a SELECT.

test=# select 1;
?column? 
----------
        1
(1 row)
test=# show pool_nodes;
 node_id | hostname | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change  
---------+----------+-------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
 0       | /tmp     | 11002 | up     | 0.500000  | primary | 1          | true              | 0                 | 2019-04-02 15:36:58
 1       | /tmp     | 11003 | up     | 0.500000  | standby | 0          | false             | 0                 | 2019-04-02 15:36:58
(2 rows)

Now the select_cnt of node 0 is 1, which means the SELECT was sent to node 0. Also please note that the "load_balance_node" column of node 0 is "true", which means node 0 was chosen as the load balance node for the last query.

Ok, let's issue another SELECT:

test=# select 2;
 ?column? 
----------
        2
(1 row)

test=# show pool_nodes;
 node_id | hostname | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change  
---------+----------+-------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
 0       | /tmp     | 11002 | up     | 0.500000  | primary | 1          | false             | 0                 | 2019-04-02 15:36:58
 1       | /tmp     | 11003 | up     | 0.500000  | standby | 1          | true              | 0                 | 2019-04-02 15:36:58
(2 rows)

Now the load_balance_node has changed to node 1, and the select_cnt of node 1 becomes 1. This is how statement level load balancing works.

Sunday, March 24, 2019

Shared Relation Cache

System catalogs?

Pgpool-II needs to access PostgreSQL's system catalogs whenever it recognizes tables in a user's query. For example, Pgpool-II has to know whether the table in question is a temporary table or not. If it's a temporary table, then the query using it must be routed to the primary PostgreSQL server, rather than one of the standby servers, because PostgreSQL does not allow creating temporary tables on standby servers. Another use case is converting a table name to an OID (Object Identifier). OIDs are unique keys for objects managed in PostgreSQL's system catalogs.

The same can be said of functions. Details of a function, for instance whether it is "immutable" or not, are important information, since they affect the decision on whether a query result using the function should be cached when the query cache feature is enabled.

Local query cache for system catalogs

Sometimes Pgpool-II needs to issue as many as 10 queries to the system catalog when it sees a table or function for the first time. Fortunately, Pgpool-II does not waste the query results. They are stored in a local cache (we call it the "relation cache" or "relcache"), and the next time it sees the object in the same or a different query, it extracts the info from the local cache. So far so good.

The problem is that the local cache is stored in the private memory of each Pgpool-II child process, and a different child process is assigned to each new session from a Pgpool-II client. So even if a single table is used in queries, Pgpool-II continues to access the system catalogs until the table's info is filled in all the local caches.

Shared relation cache

How can we mitigate the problem? One solution is to share the relation cache info among Pgpool-II processes. This way, once one of the processes accesses the system catalogs and obtains the info, the other processes do not need to access the system catalogs any more. The cache shared by the processes is called the "shared relation cache".

How to implement it?

But how do we implement it? Fortunately, Pgpool-II already has a shared query cache. Why not store the shared relation cache on the same infrastructure? Here's the idea:
  • If the table/function info is not in the local relation cache, check the shared relation cache.
  • If it is not in the shared relation cache either, access the system catalogs and store the info in the local cache. Also copy the info to the shared relation cache.
  • If the table/function info is already in the shared relation cache, just copy the info to the local cache.
You might wonder why there are two kinds of cache: a local one and a shared one. The reason is locking. Since the local cache is never accessed by multiple processes, it does not need any locking, while the shared relation cache can be accessed by multiple processes, so it must be guarded by locking, and that could become a serious bottleneck if there are many processes.

Cache invalidation

Any cache needs to be invalidated at some point. In the current implementation, cache invalidation is based on a timeout. The timeout value can be specified using the "relcache_expire" parameter, which controls the local cache timeout as well.
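For example, in pgpool.conf (the value 60 here is just an illustration):

```
# Expire relation cache entries (local and shared) after 60 seconds.
# The default of 0 means entries never expire.
relcache_expire = 60
```
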

Is it faster?

Is the shared relation cache faster? Well, it depends on the use case. If there are only a very small number of tables or functions, the overhead of the shared relation cache gives no advantage. However, if there are many tables/functions, it definitely wins. This is the reason why Pgpool-II has a switch (enable_shared_relcache) to enable or disable the feature.
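The switch is a single pgpool.conf parameter:

```
# Enable or disable the shared relation cache
enable_shared_relcache = on
```
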

To demonstrate a case where the feature wins, I created 1, 2, 4, 8, 16, 32, 64, 128 and 256 tables (all tables are empty), and accessed them using pgbench. pgbench's options were as follows:

pgbench  -C -T 30 -c 30 -n -f script_file

pgbench was run 3 times for each case and I used the average of the numbers.
The script file includes 1-128 SELECTs, one to access each table.
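For example, the script file for the 2-table case might look like this (the table names t1 and t2 are assumed for illustration):

```sql
-- pgbench custom script: one SELECT per test table
SELECT * FROM t1;
SELECT * FROM t2;
```
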

In the resulting graph, the blue line (Speed = 1) is the baseline, i.e. when the feature is disabled. The red line is when the feature is enabled. As you can see, as the number of tables increases, the performance gain increases as well, up to 32 tables. Beyond that the gain gets smaller, but performance with the shared relcache on is still superior to off.

The result may differ according to the workload: if the SELECTs are heavy, the effect may be weakened, because the longer SELECT execution time hides the effect of the shared relcache.

When it will be available?

The feature is already committed into Pgpool-II version 4.1, which is supposed to be released around September 2019. So stay tuned!
