Tuesday, April 2, 2019

Statement level load balancing

In the previous article I wrote about one of the new features of upcoming Pgpool-II 4.1.
This time I would like to introduce "statement level load balancing" feature of 4.1.

Pgpool-II can distribute read queries among PostgreSQL backend nodes. This allows to design a scale out cluster using PostgreSQL. The particular database node used for distributing read query is determined at the session level: when a client connects to Pgpool-II. This is so called "session level load balancing".

 For example, if a client connects to Pgpool-II and the load balance node is node 1 (we assume that this is a streaming replication standby node), then any read query will be sent to the load balance node (in this case node1, the standby node). Different session may choose master  (primary) as the load balance node. The distribution ratio of the load balancing node selection in different sessions is determined by "backend weight" parameter in the Pgpool-II configuration file (usually named "pgpool.conf"), typically "backend_weight0" or "backend_weight1", corresponding to node 0 and node 1 respectively.

This is good as long as clients connects to Pgpool-II, issue some queries, and disconnect, since next time a client connects to Pgpool-II, different load balance node may be chosen according to the backend weight parameters.

However, if your client already has a connection pooling feature, this way (session level load balancing) might be a problem, since the selection of load balance node is performed only once when the connection pooling from client to Pgpool-II is created.

The statement level load balancing feature is created to solve the problem. Unlike the session level load balancing, the load balancing node is determined when a new query is issued. The new parameter for this is "statement_level_load_balance". If this is set to on, the feature is enabled (the parameter can be changed by reloading the pgpool.conf).

At first "select_cnt" is 0, which means no SELECTs were issued.

test=# show pool_nodes;

 node_id | hostname | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change 
---------+----------+-------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
 0       | /tmp     | 11002 | up     | 0.500000  | primary | 0          | true              | 0                 | 2019-04-02 15:36:58
 1       | /tmp     | 11003 | up     | 0.500000  | standby | 0          | false             | 0                 | 2019-04-02 15:36:58
(2 rows)

Let's issue a SELECT.

test=# select 1;
?column? 
----------
        1
(1 row)
test=# show pool_nodes;
 node_id | hostname | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change  
---------+----------+-------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
 0       | /tmp     | 11002 | up     | 0.500000  | primary | 1          | true              | 0                 | 2019-04-02 15:36:58
 1       | /tmp     | 11003 | up     | 0.500000  | standby | 0          | false             | 0                 | 2019-04-02 15:36:58
(2 rows)

Now the select_cnt of node 0 is 1, which means the SELECT was sent to node 0. Also please note that "load_balance_node" colum of node 0 is "true", which means node 0 is chosen as the load balance node in the last query.

Ok, let's issue another SELECT:

test=# select 2;
 ?column? 
----------
        2
(1 row)

test=# show pool_nodes;
 node_id | hostname | port  | status | lb_weight |  role   | select_cnt | load_balance_node | replication_delay | last_status_change  
---------+----------+-------+--------+-----------+---------+------------+-------------------+-------------------+---------------------
 0       | /tmp     | 11002 | up     | 0.500000  | primary | 1          | false             | 0                 | 2019-04-02 15:36:58
 1       | /tmp     | 11003 | up     | 0.500000  | standby | 1          | true              | 0                 | 2019-04-02 15:36:58
(2 rows)

Now the load_balance_node is changed to node 1, and the select_cnt of node 1 becomes 1. This is how the statement level load balancing works.

13 comments:

  1. This comment has been removed by a blog administrator.

    ReplyDelete
    Replies
    1. This comment has been removed by a blog administrator.

      Delete
  2. This comment has been removed by a blog administrator.

    ReplyDelete
  3. This comment has been removed by a blog administrator.

    ReplyDelete
  4. This comment has been removed by a blog administrator.

    ReplyDelete
  5. This comment has been removed by a blog administrator.

    ReplyDelete
  6. This comment has been removed by a blog administrator.

    ReplyDelete
  7. This comment has been removed by a blog administrator.

    ReplyDelete
  8. This comment has been removed by a blog administrator.

    ReplyDelete
  9. This comment has been removed by a blog administrator.

    ReplyDelete
  10. Hi,

    If Client calls one PostgreSQL Function which contains both select and update statements, In this case Select will redirect to Read-only Database node and update statements will redirect to Read Write Database node? Kindly help on this as my client modules will call many functions which contains select/update/delete statements.

    ReplyDelete
  11. If a function includes any writes (update, delete or insert), this should be marked as "volatile" (https://www.postgresql.org/docs/13/xfunc-volatility.html).
    If pgpool finds a volatile function in an SQL statement, pgpool will redirect the whole SQL statement to primary node.
    i.e. pgpool will not redirect each statement in a function to both primary and standby.

    ReplyDelete
    Replies
    1. Thank you very much for your support. It helps me to take decision on ReadWrite/ReadOnly DB's load balancing.

      Delete

Failover triggered by PostgreSQL shutdown

  Photo by brisch27 Ultra fast PostgreSQL down detection If a user connects to Pgpool-II and one of PostgreSQL servers is stopped by admin, ...