Wednesday, November 16, 2022

Dynamic spare process management in Pgpool-II

Pre-fork architecture in Pgpool-II

Pgpool-II uses a fixed number of pre-forked child processes, each responsible for accepting and handling a client connection to Pgpool-II. When a client tries to connect to Pgpool-II, one of the pre-forked child processes (the number of processes is defined by the num_init_children parameter) accepts the request and takes care of the connection.

One advantage of this method is that it avoids the overhead of spawning a process each time a new connection is requested. However, when the number of pre-forked processes is large (say, over 1,000) and the number of concurrent connections to Pgpool-II is relatively small, most of the pre-forked processes just wake up when a new connection arrives and wastefully consume CPU resources.

Dynamic spare process management in Pgpool-II 4.4

To overcome the issue, "dynamic spare process management" has been implemented for the upcoming Pgpool-II 4.4 (thanks to Jianshen Zho and Muhammad Usama for implementing this feature). The following new parameters are introduced:

  • process_management_mode
  • process_management_strategy
  • min_spare_children
  • max_spare_children

If process_management_mode is 'dynamic', the new feature is enabled. If it is set to 'static', process management is compatible with the pre-4.4 behavior. The default is 'static'.

When process_management_mode is 'dynamic', Pgpool-II dynamically increases or decreases the number of child processes according to demand. Demand is judged by counting the number of 'spare' child processes, where 'spare' means a process that is just waiting for connections from clients. The idea is to dynamically adjust the number of child processes according to the number of spare processes.

process_management_strategy controls how aggressively the number of spare children is decreased, by choosing one of 'lazy', 'gentle' and 'aggressive'. With 'lazy', it takes 5 minutes before spare processes are reduced; with 'gentle', 2 minutes; with 'aggressive', 50 seconds. The default is 'gentle'.

min_spare_children and max_spare_children control when Pgpool-II increases the number of spare processes. If the number of spare processes is less than min_spare_children, Pgpool-II will spawn new processes until it reaches min_spare_children plus 25% of max_spare_children. The defaults for min_spare_children and max_spare_children are 5 and 10 respectively.
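Put together, a pgpool.conf fragment enabling the feature could look like this (the values simply restate the defaults described above; adjust them to your workload):

```
process_management_mode = dynamic       # default is 'static'
process_management_strategy = gentle    # lazy (5 min) / gentle (2 min) / aggressive (50 s)
min_spare_children = 5                  # spawn more when spares fall below this
max_spare_children = 10                 # used to compute the spawn target and the ceiling
```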

The benchmark

OK, so how does the feature improve performance?

I ran a benchmark using standard pgbench on a laptop running Ubuntu 20 with 16GB of memory and SSD storage. I used pgbench's "-C" option because the feature should improve performance while clients are connecting to Pgpool-II; without the option, pgbench keeps its connections to Pgpool-II open, which does not reveal the effect of the feature. The transaction type was read-only SELECT (-S) and the number of concurrent connections was 10, which should be small enough compared with num_init_children = 900. The max_connections parameter was 1,000 for each PostgreSQL 15 instance (I used the pgpool_setup tool to create a 2-node streaming replication cluster). I changed only the process_management_mode parameter between 'dynamic' and 'static'; the other parameters were left at their defaults. I ran pgbench 3 times for each setting with a duration of 30 seconds and took the median.
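The two runs can be sketched as below; the script only prints the pgbench invocation for each mode so you can adapt it (port 11000 and database name "test" follow pgpool_setup conventions and are assumptions, not details from this post):

```shell
# -C: new connection per transaction, -S: read-only SELECT,
# -c 10: ten concurrent clients, -T 30: 30-second run
for mode in static dynamic; do
    echo "# with process_management_mode = $mode in pgpool.conf:"
    echo "pgbench -C -S -c 10 -T 30 -p 11000 test"
done
```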

The result

The result was dramatic: I saw a nearly 5x speedup with process_management_mode set to 'dynamic' compared with 'static'.

As you can see, when process_management_mode is 'dynamic', the TPS (transactions per second) is nearly 5 times higher than when it is 'static'.

Conclusion

The new feature "dynamic spare process management" improves connection performance when the number of concurrent connections to Pgpool-II is small compared with num_init_children. Previously we had to set num_init_children to match the highest expected connection demand, but this sacrificed performance when the concurrent connection demand was low. With the new feature, we no longer need to make that compromise.

 

 

Monday, September 26, 2022

Query cache improvement in Pgpool-II 4.4

 Query cache in Pgpool-II

Pgpool-II has a built-in in-memory query result cache which may dramatically enhance performance in certain use cases. If your workload is mostly read-only SELECTs and certain queries are issued frequently, you may be able to benefit from the feature. When a SELECT comes to Pgpool-II, it saves the result of the query in shared memory or memcached (you can configure which one is used). If the same query arrives, meaning the query string is absolutely identical to the previous query, Pgpool-II does not send the query to PostgreSQL. Instead, it fetches the query result from the shared memory (or memcached) and returns it to the client. Note that for this to happen, the user and database must also be identical. In this process the following tasks needed to generate a query result are eliminated, so it takes less time compared with standard query processing:

  • Parsing query
  • Generating query plan
  • Executing the query plan
  • Network round-trip between Pgpool-II and PostgreSQL
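Conceptually, the cache is looked up by the exact query text together with the user and database. A minimal Python sketch of that lookup idea (an illustration only, not Pgpool-II's actual C implementation):

```python
# Results are cached under the exact query text plus user and database.
cache = {}

def execute(query, user, database, run_on_postgres):
    key = (query, user, database)      # byte-identical query text required
    if key in cache:
        return cache[key]              # cache hit: PostgreSQL is not contacted
    result = run_on_postgres(query)    # cache miss: run the query normally
    cache[key] = result
    return result

# Demonstrate hit/miss behavior with a fake backend that records its calls.
calls = []
def fake_backend(q):
    calls.append(q)
    return [("row1",)]

execute("SELECT 1", "alice", "test", fake_backend)
execute("SELECT 1", "alice", "test", fake_backend)  # hit: backend not called
execute("SELECT 1", "bob", "test", fake_backend)    # different user: miss
assert len(calls) == 2
```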

Note, however, that when a table is modified or dropped, the query cache entries corresponding to the table are removed. If this happens frequently, the query cache may be slower than ordinary query processing because of its overhead. You can monitor the query cache hit ratio by using the "SHOW pool_cache" command to decide whether you should use the query cache feature or not.

Bottleneck in the query cache

The query cache has one weakness up to Pgpool-II 4.3. As you can easily guess, the shared memory is heavily accessed by concurrent users. To avoid corruption, Pgpool-II uses a strong lock called an "exclusive" lock. Once a Pgpool-II process acquires the lock, other processes have to wait until that process releases it. As a result, when there are many concurrent users, query cache performance rapidly drops.

Pgpool-II 4.4 will mitigate the problem

The upcoming Pgpool-II 4.4, which is expected to be released by the end of this year, will use a "shared lock" in certain cases.

There are two types of query cache access: one just fetches the query cache, the other modifies existing query cache data. The latter needs the exclusive lock anyway. But the former only needs a shared lock, which guarantees that the contents of the query cache are not changed while the lock is held. Since shared lock holders can run concurrently, performance is enhanced if most clients just read the query cache data.
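The shared/exclusive distinction is the classic readers-writer lock. The sketch below illustrates the idea in Python (Pgpool-II implements this in C over shared memory; this is only an illustration of the locking discipline):

```python
import threading

class RWLock:
    """Readers share; a writer excludes everyone."""
    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_shared(self):          # many readers may hold this at once
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_exclusive(self):       # writer waits until it is alone
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()
```

In query cache terms: processes fetching a cached result take the shared lock and run concurrently; only cache insertion and invalidation take the exclusive lock.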

The result

Below is a graph comparing the benchmark results. The results were generated with PostgreSQL's standard benchmark tool "pgbench". The transaction type is read-only ("-S"). The database size is the default (100k rows). Each pgbench run is 10 seconds. The hardware was an Ubuntu 20 laptop with 16GB of memory and 6 physical CPU cores. Two PostgreSQL 15 (master branch head) servers were used, configured as a two-node streaming replication cluster.


The red line is the master branch head, which will be released as Pgpool-II 4.4. The blue line is Pgpool-II 4.3. The yellow line is 4.3 with the query cache disabled, for reference.

Up to 4 concurrent users there is no big difference between 4.4 and 4.3. But as the number of concurrent users grows, the performance of 4.3 rapidly decreases from 160k to 100k TPS at 32 concurrent clients, while 4.4 keeps 140k TPS. That is about 40% higher.

Conclusion

The query cache is great for mostly read-only workloads, but it had a weakness when there are many concurrent clients. The upcoming Pgpool-II 4.4 mitigates the weakness by using a shared lock. A benchmark result shows an enhancement of up to 40% at 32 concurrent clients.


Wednesday, May 11, 2022

PostgreSQL supports COPY FREEZE in pgbench but it will not appear in the release note

PostgreSQL 15 will include COPY FREEZE support

In my blog post I reported that a patch allowing the use of COPY FREEZE in pgbench was accepted for the upcoming PostgreSQL 15 (expected to be released by the end of 2022). It seems the patch has survived without being reverted!

 But it will not be in the release note

Now that PostgreSQL 15 beta1 is coming soon, I checked the proposed release notes for PostgreSQL 15. Unfortunately they do not mention COPY FREEZE support in pgbench. Don't worry: the code supporting COPY FREEZE in pgbench is surely in PostgreSQL 15. Please try PostgreSQL 15 beta1 once it is out!

 


Friday, February 11, 2022

What's new in Pgpool-II 4.3 (part3)

 


In the previous blog entries (part1 and part2) I explained the following new features of Pgpool-II 4.3:

  • Configuration file format has been changed for easier configuration.
  • More information is added to utility commands to make admin's life easier.
  • Allow to keep quorum even if the watchdog cluster loses some watchdog nodes.

I will explain the rest of the features in this blog entry.

 Allow to choose PostgreSQL node for promotion

Before 4.3 it was difficult to promote a specified standby PostgreSQL node. Surely we have the pcp_promote_node command, but it just changes the status inside Pgpool-II: it does not actually promote PostgreSQL. Now pcp_promote_node has a new option --switchover which actually promotes the PostgreSQL standby node. Details have already been written up in a blog entry; please visit it for more information.

Allow to choose the least replication delay node in load balancing

Pgpool-II can distribute read queries among PostgreSQL servers. Pgpool-II can also avoid sending read queries to standby servers when the replication delay between the primary PostgreSQL server and a standby exceeds the delay_threshold parameter.

The new parameter prefer_lower_delay_standby enables even more intelligent read query load balancing. This parameter is only useful if you have two or more standby servers. If the parameter is enabled, delay_threshold is greater than 0, and the replication delay of the load balance node (the node chosen for load balancing when the session started) exceeds delay_threshold, Pgpool-II will route read queries to the standby node with the least delay. This brings more chances to distribute load to standby servers while minimizing the effect of replication delay.
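A hedged pgpool.conf sketch for this setup (the delay_threshold value is an arbitrary illustration, not a recommendation):

```
delay_threshold = 10000000          # WAL lag in bytes tolerated on the load balance node
prefer_lower_delay_standby = on     # new in 4.3: re-route to the least-lagged standby
```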

Catch up the latest SQL parser of PostgreSQL 14 as usual

Every Pgpool-II major release imports the latest PostgreSQL SQL parser available at that point. When Pgpool-II 4.3 came out, PostgreSQL 14 had already been released, so Pgpool-II 4.3 imports the SQL parser of PostgreSQL 14. With this, users can enjoy the latest SQL features, including:

  • Allow an alias to be attached to JOIN ... USING
  • Allow DISTINCT to be added to GROUP BY
  • New SEARCH and CYCLE clauses for common table expressions
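For illustration, here are sketches of those PostgreSQL 14 constructs (the table and column names are hypothetical):

```sql
-- Alias on JOIN ... USING: x refers to the joined common column
SELECT x.id FROM t1 JOIN t2 USING (id) AS x;

-- DISTINCT in GROUP BY: drop duplicate grouping sets
SELECT a, b FROM t GROUP BY DISTINCT ROLLUP (a), ROLLUP (b);

-- SEARCH clause: depth-first ordering for a recursive CTE
WITH RECURSIVE tree AS (
    SELECT id, parent_id FROM nodes WHERE parent_id IS NULL
  UNION ALL
    SELECT n.id, n.parent_id FROM nodes n JOIN tree t ON n.parent_id = t.id
) SEARCH DEPTH FIRST BY id SET ord
SELECT * FROM tree ORDER BY ord;
```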

Other new features

  • Allow the use of timestamps with milliseconds.
  •  Add new config parameter failover_on_backend_shutdown. See the past blog entry for more details.
  • Add a new parameter, representing the recovery source hostname, to recovery_1st_stage_command and recovery_2nd_stage_command. Why you need this is explained in the manual:
"Before the hostname of the main (primary) node was obtained by using hostname command. This is mostly ok since the script runs on the main (primary) node anyway. However in some systems the hostname obtained by hostname command is different from the hostname defined in backend_hostname configuration parameter. This could cause a trouble in detach_false_primary because it checks connectivity between primary and standby node by using host parameter in primary_conninfo parameter, which is generated by recovery_1st_stage_command. Thus it is strongly recommended to use this parameter instead of using hostname command to obtain the hostname of the primary node in recovery_1st_stage_command."

Conclusion

This is the last entry in the series of blog posts focusing on Pgpool-II 4.3. Pgpool-II 4.3 has more features than previous releases, yet it is easier to configure and use. Please try it out!

Sunday, February 6, 2022

What's new in Pgpool-II 4.3 ? (part2)


In the last blog entry I explained what's new in Pgpool-II 4.3. This is part 2.

Allow to keep quorum even if the watchdog cluster loses some watchdog nodes

Watchdog is a feature to prevent the single point of failure (SPOF) problem of Pgpool-II itself. Since Pgpool-II runs as a proxy between clients and PostgreSQL servers, a failure of Pgpool-II makes the entire database system unusable.

 


Multiple Pgpool-II servers can be deployed to prevent SPOF, so that when the primary Pgpool-II server goes down, a standby Pgpool-II server takes over its role, for example by providing an access point as a virtual IP address (VIP).

For this purpose the Pgpool-II servers monitor each other to see whether the peer Pgpool-II is alive. This means that if the communication between the Pgpool-II servers goes down, each Pgpool-II thinks the peer Pgpool-II server is down and that only itself is alive. This situation is called split brain. To avoid the split brain problem, an odd number (3 or more) of Pgpool-II servers can be deployed.

In the figure above, Pgpool-II [A] is disconnected from [B] and [C], while [B] and [C] can communicate with each other. [A] thinks that [B] and [C] are down, "votes" for itself, and gets 1 vote. [B] and [C] vote for themselves and get 2 votes. To hold the majority, 2 or more votes are required; if someone holds the majority we say "the quorum exists". From [B] and [C]'s point of view the quorum exists, so the leader Pgpool-II is elected from [B] or [C] and provides the VIP. From [A]'s point of view the quorum does not exist, and the VIP is released if [A] holds it. Thus we avoid the split brain problem.

OK, but what if [A] and [B] go down? Of course [C] thinks that the quorum does not exist and the VIP is released, even though one Pgpool-II server is still running.

New concept "membership"

From Pgpool-II 4.3, the new concept of membership is introduced to mitigate the issue above. Initially the number of members is equal to the number of Pgpool-II servers. If the new parameter wd_remove_shutdown_nodes is enabled, the number of members is decreased when a Pgpool-II server is shut down by the administrator. For example, if [A] and [B] are shut down by the administrator, the number of members is 1 and the quorum is 1. So [C] is the leader of the cluster and the VIP is assigned to [C].

wd_lost_node_removal_timeout is another new parameter, which specifies the timeout in seconds before a LOST watchdog node is marked as a non-member and removed from the cluster. For example, if [A] and [B] are disconnected from [C], from [C]'s point of view the number of members is 1, so the quorum is 1, and [C] becomes the leader and holds the VIP. But in this case there is a possibility of the split brain problem: [A] and [B] might also think that one of them should be the new leader because, from their point of view, the quorum exists. Great care is needed when you enable the wd_lost_node_removal_timeout parameter.
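The majority/quorum arithmetic described above can be sketched as follows (an illustration of the counting, not Pgpool-II's actual watchdog code):

```python
def quorum(members):
    """Votes needed for a majority among `members` watchdog nodes."""
    return members // 2 + 1

# Three nodes: 2 votes are needed, so an isolated node like [A] cannot lead.
assert quorum(3) == 2

# With wd_remove_shutdown_nodes enabled, cleanly shut-down nodes leave the
# membership, so the surviving node can still form a quorum by itself.
members = 3
members -= 2                  # administrator shuts down [A] and [B]
assert quorum(members) == 1   # [C] alone satisfies the quorum and keeps the VIP
```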

Next topic will be "Allow to choose PostgreSQL node for promotion"

I will continue in the next blog entry with another new feature of 4.3: allowing to choose the PostgreSQL node for promotion.

 

Monday, January 31, 2022

What's new in Pgpool-II 4.3?

 

Release note and commit log of Pgpool-II 4.3

In this blog entry I would like to explain some of the new features of the latest Pgpool-II major version, 4.3, which was released in December 2021.

The official release note is here. Also, if you have the source code of Pgpool-II, you can find the commit id for each item in the release note source code (written in SGML), so you can look into the commit log for a deeper understanding of each change.

 Executive summary of the changes 

Here is an executive summary of the changes in 4.3.

  • Configuration file format has been changed for easier configuration.
  • More information is added to utility commands to make admin's life easier.
  • Allow to keep quorum even if the watchdog cluster loses some watchdog nodes.
  • Allow to choose PostgreSQL node for promotion. 
  • Allow to choose the least replication delay node in load balancing.
  • Catch up the latest SQL parser of PostgreSQL 14 as usual.
  • Others.

I will explain them one by one.

Configuration file format has been changed for easier configuration

Pgpool-II used to provide a specific configuration example file for each clustering mode. For example, you needed to start with pgpool.conf.sample-stream to configure streaming replication mode. In 4.3 there is only one configuration example file. You just need to set the "clustering_mode" parameter to the desired clustering mode to start configuring (of course you still need to add more information, for example backend node information and assorted account information).

To achieve this, all configuration parameters are commented out and the default values for each parameter are carefully chosen so that users can leave most of them as they are (commented out). This is similar to PostgreSQL's configuration file (postgresql.conf).
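For example, a minimal streaming replication configuration might start like this (hostnames and ports are placeholders, not values from the sample file):

```
clustering_mode = 'streaming_replication'

backend_hostname0 = 'server1'
backend_port0 = 5432
backend_hostname1 = 'server2'
backend_port1 = 5432
```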

We advise users who plan to migrate to 4.3 from an older version to inspect the default value changes. They are explained in the release note.

More information is added to utility commands to make admin's life easier

Some utility commands are enhanced so that more useful information is shown. 

"show pool_nodes" and "pcp_node_info" now show actual PostgreSQL status.

 test=# show pool_nodes;
 node_id | hostname | port  | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change  
---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | /tmp     | 11002 | up     | up        | 0.333333  | primary | primary | 0          | true              | 0                 |                   |                        | 2022-01-31 10:10:35
 1       | /tmp     | 11003 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2022-01-31 10:10:35
 2       | /tmp     | 11004 | down   | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2022-01-31 10:18:47
(3 rows)

In the example, "pg_status" and "pg_role" are the new fields. You might notice that the "status" field of node 2 is "down" while its "pg_status" field is "up". This is because the node was detached by the pcp_detach_node command. The command just changes the status recognized by Pgpool-II, but does not touch the PostgreSQL backend on node 2. Before 4.3, admins had to additionally check PostgreSQL's status to see whether the backend node was actually down, because the command just shows the node status as "down" in such cases.

These fields are added to pcp_node_info as well.


$ pcp_node_info -p 11001 -w -a -v
Hostname               : /tmp
Port                   : 11002
Status                 : 1
Weight                 : 0.333333
Status Name            : waiting
Backend Status Name    : up

Role                   : primary
Backend Role           : primary
Replication Delay      : 0
Replication State      : none
Replication Sync State : none
Last Status Change     : 2022-01-31 10:54:26

Hostname               : /tmp
Port                   : 11003
Status                 : 1
Weight                 : 0.333333
Status Name            : waiting
Backend Status Name    : up

Role                   : standby
Backend Role           : standby
Replication Delay      : 0
Replication State      : streaming
Replication Sync State : async
Last Status Change     : 2022-01-31 10:54:26

Hostname               : /tmp
Port                   : 11004
Status                 : 3
Weight                 : 0.333333
Status Name            : down
Backend Status Name    : up

Role                   : standby
Backend Role           : standby
Replication Delay      : 0
Replication State      : streaming
Replication Sync State : async
Last Status Change     : 2022-01-31 10:54:38

pcp_proc_info has some new fields and new information.

"Start time" now has additional information like "(2:52 before process restarting)". This is only shown when the "status" field is "Wait for connection", which means the pgpool process is completely idle and can accept a new connection. In this case, if child_life_time is non-zero, the process will eventually restart. The new information indicates how much time remains before the process restarts.

"Backend connection time" is the timestamp when the connection to the backend was created. "Client connection time" is the timestamp when the last client connected. "Client idle duration" is the duration in seconds while the client is idle or idle in transaction; next to it is shown the remaining time before the client gets disconnected due to client_idle_limit (in seconds). "Client disconnection time" is the timestamp when the last client disconnected from the process. If a client is still connected to the process (thus "status" is "Idle" or "Idle in transaction"), this field will be empty.

$ pcp_proc_info -p 11001 --verbose
Database                  : test
Username                  : postgres
Start time                : 2021-09-28 04:16:00 (2:52 before process restarting)
Client connection count   : 1
Major                     : 3
Minor                     : 0
Backend connection time   : 2021-09-28 04:16:16
Client connection time    : 2021-09-28 04:16:16
Client idle duration      : 0
Client disconnection time : 2021-09-28 04:16:33
Pool Counter              : 1
Backend PID               : 30795
Connected                 : 0
PID                       : 30750
Backend ID                : 0
Status                    : Wait for connection

Database                  : test
Username                  : postgres
Start time                : 2021-09-28 04:16:00 (2:52 before process restarting)
Client connection count   : 1
Major                     : 3
Minor                     : 0
Backend connection time   : 2021-09-28 04:16:16
Client connection time    : 2021-09-28 04:16:16
Client idle duration      : 0
Client disconnection time : 2021-09-28 04:16:33
Pool Counter              : 1
Backend PID               : 30796
Connected                 : 0
PID                       : 30750
Backend ID                : 1
Status                    : Wait for connection

Database                  : test
Username                  : kawamoto
Start time                : 2021-09-28 04:16:00
Client connection count   : 0
Major                     : 3
Minor                     : 0
Backend connection time   : 2021-09-28 04:16:03
Client connection time    : 2021-09-28 04:16:03
Client idle duration      : 158 (2:22 before client disconnected)
Client disconnection time :
Pool Counter              : 1
Backend PID               : 30763
Connected                 : 1
PID                       : 30751
Backend ID                : 0
Status                    : Idle

Database                  : test
Username                  : kawamoto
Start time                : 2021-09-28 04:16:00
Client connection count   : 0
Major                     : 3
Minor                     : 0
Backend connection time   : 2021-09-28 04:16:03
Client connection time    : 2021-09-28 04:16:03
Client idle duration      : 158 (2:22 before client disconnected)
Client disconnection time :
Pool Counter              : 1
Backend PID               : 30764
Connected                 : 1
PID                       : 30751
Backend ID                : 1
Status                    : Idle
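The countdown strings in the output above, such as "(2:52 before process restarting)" and "(2:22 before client disconnected)", can be derived as in this sketch (an illustration; pcp_proc_info computes this internally, and the 300-second limits below assume the default values of child_life_time and a matching client_idle_limit):

```python
def remaining(limit_seconds, elapsed_seconds):
    """Format the time left before a limit such as child_life_time fires."""
    left = max(limit_seconds - elapsed_seconds, 0)
    return "%d:%02d" % (left // 60, left % 60)

# child_life_time = 300s with 128s elapsed since the process became idle
assert remaining(300, 128) == "2:52"
# client_idle_limit = 300s with the 158s of client idle time shown above
assert remaining(300, 158) == "2:22"
```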

To be continued...

This blog entry is already too long. I will continue with "Allow to keep quorum even if the watchdog cluster loses some watchdog nodes" in my next blog entry.
