Tuesday, August 24, 2021

Failover triggered by PostgreSQL shutdown

 

Photo by brisch27

Ultra fast PostgreSQL down detection

If a user connects to Pgpool-II and one of PostgreSQL servers is stopped by admin, Pgpool-II detects  it and triggers failover. Since this happens before health check detects the server down, the failover process is pretty fast. This feature looks great at a first glance.

How does Pgpool-II implement it?

How is it possible? When PostgreSQL is shutdown by admin, PostgreSQL sends an error, and it has special error code: 57P01. If Pgpool-II receives the error code, Pgpool-II starts the failover process.

Problem is, there's other case when 57P01 is used. When pg_terminate_backend() is used, or PostgreSQL backend process is killed by signal, exactly the same error code is sent from PostgreSQL. This confuses Pgpool-II and it triggers failover in this case too!

Pgpool-II 4.3 will mitigate the problem

To mitigate the problem upcoming Pgpool-II 4.3, supposed to be released in 2021 fall, will have a new parameter "failover_on_backend_shutdown". If this is on, Pgpool-II will behave exactly as it is now. If it's off, 57P01 does not trigger failover any more. Of course you cannot enjoy the quick failover feature in this case. Sorry but there's no way to have both the quick failover feature and and not to be confused by the error code. So choice is up to you.

I hope someday PostgreSQL becomes clever enough to use distinct error codes for the admin shutdown case and the pg_terminate_backend() case.

1 comment:

  1. This has been committed today.
    https://git.postgresql.org/gitweb/?p=pgpool2.git;a=commit;h=10ef9092e2e6a40650fe304270cea79ed20ece17

    ReplyDelete

Failover triggered by PostgreSQL shutdown

  Photo by brisch27 Ultra fast PostgreSQL down detection If a user connects to Pgpool-II and one of PostgreSQL servers is stopped by admin, ...