Friday, February 11, 2022

What's new in Pgpool-II 4.3 (part3)

In the previous blog entries (part 1 and part 2) I explained the following new features of Pgpool-II 4.3:

  • Configuration file format has been changed for easier configuration.
  • More information is added to utility commands to make admin's life easier.
  • Allow to keep quorum even if the watchdog cluster loses some of its watchdog nodes.

I will explain the rest of the features in this entry.

 Allow to choose PostgreSQL node for promotion

Before 4.3 it was difficult to promote a specific standby PostgreSQL node. There has always been the pcp_promote_node command, but it only changes the node status inside Pgpool-II: it does not actually promote PostgreSQL. Now pcp_promote_node has a new option --switchover which actually promotes the specified PostgreSQL standby node. The details have already been covered in a previous blog entry; please visit it for more information.
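For example, a switchover promotion of standby node 1 might look like the following. The host, port, and user shown here are placeholders; check the exact options against the pcp_promote_node manual for your version:

```shell
# Promote standby node 1 and actually switch over PostgreSQL (Pgpool-II 4.3+).
pcp_promote_node -h 192.168.1.10 -p 9898 -U pgpool --switchover 1
```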

Allow to choose the least replication delay node in load balancing

Pgpool-II can distribute read queries among PostgreSQL servers. Pgpool-II can also avoid sending read queries to a standby server if the replication delay between the primary PostgreSQL server and that standby exceeds the delay_threshold parameter.

The new parameter prefer_lower_delay_standby enables even more intelligent read query load balancing. It is only useful if you have two or more standby servers. If the parameter is enabled, delay_threshold is greater than 0, and the replication delay of the load balance node (the node selected for load balancing when the session started) exceeds delay_threshold, Pgpool-II routes read queries to the standby node with the least replication delay instead. This gives more chances to distribute load to standby servers while minimizing replication delay.
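A hypothetical pgpool.conf excerpt combining the two parameters (the values are illustrative only, not recommendations):

```
delay_threshold = 10000000         # max replication lag in bytes tolerated on a standby
prefer_lower_delay_standby = on    # re-route reads to the least-delayed standby
```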

Catch up with the latest SQL parser of PostgreSQL 14 as usual

Every Pgpool-II major release imports the latest PostgreSQL SQL parser available at that point. When Pgpool-II 4.3 came out, PostgreSQL 14 had already been released, so Pgpool-II 4.3 imports the SQL parser of PostgreSQL 14. With this, users can enjoy the latest SQL features, including:

  • Allow an alias to be used with JOIN ... USING
  • Allow DISTINCT to be added to GROUP BY
  • New SEARCH and CYCLE clauses for common table expressions
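Sketches of these PostgreSQL 14 features follow; the table and column names are made up for illustration:

```sql
-- Alias on JOIN ... USING (the alias refers to the USING columns):
SELECT x.id FROM t1 JOIN t2 USING (id) AS x;

-- DISTINCT in GROUP BY removes duplicate grouping sets:
SELECT region, product, count(*)
FROM sales
GROUP BY DISTINCT ROLLUP (region), ROLLUP (product);

-- SEARCH clause for ordering the rows of a recursive CTE:
WITH RECURSIVE tree AS (
    SELECT id, parent_id FROM nodes WHERE parent_id IS NULL
  UNION ALL
    SELECT n.id, n.parent_id FROM nodes n JOIN tree t ON n.parent_id = t.id
) SEARCH DEPTH FIRST BY id SET ordercol
SELECT * FROM tree ORDER BY ordercol;
```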

Other new features

  • Allow to use timestamps with millisecond precision.
  • Add new configuration parameter failover_on_backend_shutdown. See the past blog entry for more details.
  • Add a new parameter representing the recovery source hostname to recovery_1st_stage_command and recovery_2nd_stage_command. Why this is needed is explained in the manual:
"Before the hostname of the main (primary) node was obtained by using hostname command. This is mostly ok since the script runs on the main (primary) node anyway. However in some systems the hostname obtained by hostname command is different from the hostname defined in backend_hostname configuration parameter. This could cause a trouble in detach_false_primary because it checks connectivity between primary and standby node by using host parameter in primary_conninfo parameter, which is generated by recovery_1st_stage_command. Thus it is strongly recommended to use this parameter instead of using hostname command to obtain the hostname of the primary node in recovery_1st_stage_command."

Conclusion

This is the last entry in the series of blog posts focusing on Pgpool-II 4.3. While Pgpool-II 4.3 has more features than previous releases, it is also easier to configure and use. Please try it out!

Sunday, February 6, 2022

What's new in Pgpool-II 4.3? (part 2)


In the last blog entry I explained what's new in Pgpool-II 4.3. This is part 2.

Allow to keep quorum even if the watchdog cluster loses some of its watchdog nodes

Watchdog is a feature that prevents Pgpool-II itself from becoming a single point of failure (SPOF). Since Pgpool-II runs as a proxy between clients and PostgreSQL servers, a failure of Pgpool-II makes the entire database system unusable.

(Figure: a three-node watchdog cluster of Pgpool-II servers [A], [B], and [C].)
Multiple Pgpool-II servers can be deployed to prevent SPOF: when the primary Pgpool-II server goes down, a standby Pgpool-II server takes over its role, for example by providing the access point as a virtual IP address (VIP).

For this purpose, each Pgpool-II server monitors the others to check whether its peers are alive. This means that if the communication between the Pgpool-II servers goes down, each Pgpool-II thinks its peers have gone down and that only itself is alive. This situation is called split brain. To avoid the split brain problem, an odd number (3 or more) of Pgpool-II servers should be deployed.

In the figure above, Pgpool-II [A] is disconnected from [B] and [C], while [B] and [C] can still communicate with each other. [A] thinks that [B] and [C] are down, "votes" for itself, and gets 1 vote. [B] and [C] vote for themselves and get 2 votes. To hold the majority, 2 or more votes are required; if someone holds the majority we say "the quorum exists". From [B] and [C]'s point of view the quorum exists, so the leader Pgpool-II is elected from [B] or [C] and provides the VIP. From [A]'s point of view the quorum does not exist, and the VIP is released if [A] holds it. Thus the split brain problem is avoided.

Ok, but what if [A] and [B] go down? Of course [C] thinks that the quorum does not exist, and the VIP is released even though one Pgpool-II server is still running.
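The vote counting above boils down to a simple majority check. Here is a plain sketch of that rule, not Pgpool-II's actual implementation:

```python
# Majority-vote sketch (illustration only, not Pgpool-II code).
# Each node counts the nodes it can reach, itself included, and checks
# whether that count is a strict majority of the fixed cluster size.

def quorum_exists(reachable_nodes: int, cluster_size: int) -> bool:
    """True if reachable_nodes is a strict majority of cluster_size."""
    return reachable_nodes > cluster_size // 2

# Three-node cluster [A], [B], [C]:
print(quorum_exists(1, 3))  # False: isolated [A] reaches only itself
print(quorum_exists(2, 3))  # True:  [B] and [C] reach each other
```

With a fixed cluster size of 3, a lone surviving node can never reach quorum; that is exactly the limitation the membership concept addresses.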

New concept "membership"

From Pgpool-II 4.3, the new concept of membership is introduced to mitigate the issue above. Initially the number of members equals the number of Pgpool-II servers. If the new parameter wd_remove_shutdown_nodes is enabled, the number of members decreases when a Pgpool-II server is shut down by the administrator. For example, if [A] and [B] are shut down by the administrator, the number of members is 1 and the quorum is 1, so [C] becomes the leader of the cluster and the VIP is assigned to [C].

wd_lost_node_removal_timeout is another new parameter, which specifies the timeout in seconds before a LOST watchdog node is marked as a non-member and removed from the cluster. For example, if [A] and [B] are disconnected from [C], from [C]'s point of view the number of members is 1, so the quorum is 1; then [C] becomes the leader and holds the VIP. But in this case there is a possibility of the split brain problem: [A] and [B] might also think that one of them should be the new leader, because from their point of view the quorum exists. Great care is needed when you enable the wd_lost_node_removal_timeout parameter.
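A hypothetical pgpool.conf excerpt enabling both behaviors (the values shown are illustrative, not recommendations):

```
wd_remove_shutdown_nodes = on       # cleanly shut-down nodes leave the membership
wd_lost_node_removal_timeout = 30   # seconds before a LOST node is removed
```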

Next topic will be "Allow to choose PostgreSQL node for promotion"

I will continue in the next blog entry with another new feature of 4.3: allowing to choose the PostgreSQL node for promotion.

 
