Ultra fast PostgreSQL down detection
If a user connects to Pgpool-II and one of PostgreSQL servers is stopped by admin, Pgpool-II detects it and triggers failover. Since this happens before health check detects the server down, the failover process is pretty fast. This feature looks great at a first glance.
How does Pgpool-II implement it?
How is it possible? When PostgreSQL is shutdown by admin, PostgreSQL sends an error, and it has special error code: 57P01. If Pgpool-II receives the error code, Pgpool-II starts the failover process.
Problem is, there's other case when 57P01 is used. When pg_terminate_backend() is used, or PostgreSQL backend process is killed by signal, exactly the same error code is sent from PostgreSQL. This confuses Pgpool-II and it triggers failover in this case too!
Pgpool-II 4.3 will mitigate the problem
To mitigate the problem upcoming Pgpool-II 4.3, supposed to be released in 2021 fall, will have a new parameter "failover_on_backend_shutdown". If this is on, Pgpool-II will behave exactly as it is now. If it's off, 57P01 does not trigger failover any more. Of course you cannot enjoy the quick failover feature in this case. Sorry but there's no way to have both the quick failover feature and and not to be confused by the error code. So choice is up to you.
I hope someday PostgreSQL becomes clever enough to use distinct error codes for the admin shutdown case and the pg_terminate_backend() case.
This has been committed today.
ReplyDeletehttps://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=10ef9092e2e6a40650fe304270cea79ed20ece17