Avoiding connection failure while health checking
Hi pgpool-II users,
I've managed to come back to this blog after 3 years.
Last week the pgpool-II developer team released minor versions for every branch from 3.1 through 3.5. These releases include a special gift for users: an enhancement to health checking.
You might have noticed an annoying behavior of pgpool-II.
Example: suppose we have three PostgreSQL backends managed by pgpool-II. If health checking is enabled, pgpool-II periodically checks the health of each backend. If backend #2 goes down, a failover is triggered, and after that users can keep using the DB server cluster without backend #2. This is great. However, if you have configured health check retries, clients cannot connect to pgpool-II while it is retrying the health check. For instance,
health_check_max_retries = 10
health_check_retry_delay = 6
keeps retrying the health check for at least 10*6 = 60 seconds. This is very annoying for users because they cannot open new connections to pgpool-II for a full minute. Lowering these parameters might mitigate the situation a little, but that is not an option if the network is unstable and longer retries are desirable.
These new releases significantly improve the situation. By setting:
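To make the arithmetic concrete, here is a small Python sketch that estimates how long the retry window lasts for a given pair of settings. The function name and the optional attempt_seconds argument are my own illustration, not anything in pgpool-II; attempt_seconds simply models the assumption that each failed attempt itself takes some time, which is why 60 seconds is only a lower bound.

```python
def retry_window_seconds(max_retries, retry_delay, attempt_seconds=0):
    """Estimate how long health check retries keep going, in seconds.

    max_retries     -- value of health_check_max_retries
    retry_delay     -- value of health_check_retry_delay (seconds)
    attempt_seconds -- assumed time each failed attempt takes before the
                       delay starts (hypothetical; 0 gives the lower bound)
    """
    if max_retries < 0 or retry_delay < 0 or attempt_seconds < 0:
        raise ValueError("all arguments must be non-negative")
    return max_retries * (retry_delay + attempt_seconds)


if __name__ == "__main__":
    # Settings from the example above: the lower bound on the retry window.
    print(retry_window_seconds(10, 6))
    # Same settings, assuming each failed attempt also takes 2 seconds.
    print(retry_window_seconds(10, 6, attempt_seconds=2))
```

With the settings above the lower bound is 60 seconds, and any time spent waiting on each failed attempt only makes the window longer.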
fail_over_on_backend_error = off
when a user connects to pgpool-II while it is health checking, pgpool-II starts to connect to all backends, including #2. Before this release, pgpool-II gave up on starting the session if any one of the backends was unavailable (in this case #2). With this release, pgpool-II skips the broken backend and continues connecting to the rest of the backends. Please note, however, that this feature is only available when all of the conditions below are met:
- streaming replication mode
- the broken backend is not the primary server
- fail_over_on_backend_error is off
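The three conditions above can be read as a single boolean test. The Python sketch below is only a way to reason about them; the SessionContext class and the can_skip_broken_backend function are hypothetical names of mine and do not correspond to pgpool-II's actual source code.

```python
from dataclasses import dataclass


@dataclass
class SessionContext:
    """Hypothetical summary of the situation when a client connects."""
    streaming_replication: bool       # running in streaming replication mode
    broken_backend_is_primary: bool   # the unreachable backend is the primary
    fail_over_on_backend_error: bool  # value of fail_over_on_backend_error


def can_skip_broken_backend(ctx: SessionContext) -> bool:
    """Return True only when all three conditions from the list hold."""
    return (
        ctx.streaming_replication
        and not ctx.broken_backend_is_primary
        and not ctx.fail_over_on_backend_error
    )


if __name__ == "__main__":
    # Standby #2 is down, streaming replication, parameter set to off.
    ok = SessionContext(True, False, False)
    print(can_skip_broken_backend(ok))

    # Same situation, but the broken backend is the primary.
    primary_down = SessionContext(True, True, False)
    print(can_skip_broken_backend(primary_down))
```

If any one of the three conditions fails, the new connection cannot skip the broken backend.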
This enhancement is available in all of the new releases: 3.5.3, 3.4.7, 3.3.11, 3.2.16, 3.1.19.