Sunday, July 24, 2016

Avoiding session disconnection while fail over

Your client session to Pgpool-II will be disconnected once fail over or switch over happens. Pretty annoying. This is because Pgpool-II kills all child process that are responsible for each client session. Pgpool-II 3.6 will mitigate this under certain conditions:

  1. Pgpool-II operates in streaming replication mode
  2. The failed DB node is not the primary (master) node
  3. Your "load balance node" is not the failed node
1 & 2 are easy to understand. What about #3?

The load balance node is chosen when you connect to Pgpool-II. Pgpool-II assigns one of the DB nodes to send read only  queries to. It is decided according some of pgpool.conf settings:

  •  load_balance_mode (of course this should be "on")
  • "weight" parameter
  • database_redirect_preference_list
  • app_name_redirect_preference_list

The decision which DB node to choose is done at the early stage of session connection and the assignment will not be changed until you exit the session. From Pgpool-II 3.6, you can check your load balance node by using "show pool_nodes" command.

$  psql -p 11000 test
test=# show pool_nodes;
  node_id | hostname | port  | status | lb_weight |  role   | select_cnt | load_balance_node
---------+----------+-------+--------+-----------+---------+------------+-------------------
 0       | /tmp     | 11002 | 2      | 0.333333  | primary | 0          | false
 1       | /tmp     | 11003 | 2      | 0.333333  | standby | 0          | false
 2       | /tmp     | 11004 | 2      | 0.333333  | standby | 0          | true
(3 rows)


Here "load_balance_node" is the DB node chosen for the "load balance node".

If other than node 2 is going down and the node is not primary, this session will not be disconnected. In this case the session will not be disconnected if node 1 goes down. Let's try that using another terminal:

$ pg_ctl -D data1 -m f stop
waiting for server to shut down.... done
server stopped


Ok, let's input something in the previous psql session:

test=# show pool_nodes;
 node_id | hostname | port  | status | lb_weight |  role   | select_cnt | load_balance_node
---------+----------+-------+--------+-----------+---------+------------+-------------------
 0       | /tmp     | 11002 | 2      | 0.333333  | primary | 0          | false
 1       | /tmp     | 11003 | 3      | 0.333333  | standby | 0          | false
 2       | /tmp     | 11004 | 2      | 0.333333  | standby | 0          | true
(3 rows)



As you can see, the session was not disconnected and you see the "status" column of node 1 is now changed to "3", which means the node is in down status.

Now let's suppose you want to maintain one of DB nodes. In this case, you could apply following procedure:

  1.  Edit pgpool.conf to change "backend_weight1" parameter to 0. This will prevent new sessions to choose the node 1 as the load balance node.
  2.  Wait until all users who are using node 1 as the load balance node exit session

4 comments:

  1. This is why I always suggest putting PgBouncer in front of pgpool2. If pgpool acted as an actual pooling service, it would proxy and hide the back-end connections. Connections to pgpool are separate from connections to Postgres, so if a failover occurs, it should simply re-bind all connections to the new master, or make new connections as necessary without breaking connections to itself. The fact it doesn't do this pretty much requires the use of a secondary proxy like PgBouncer.

    With PgBouncer in place, it acts as a connection aggregator that hides the fact pgpool is throwing away connections during a failover. This makes the process much more transparent. There may be a query delay during a failover, but no disconnection.

    Of course, you can get the same effect with HAProxy, etcd, and Governor or Patroni. Pgpool is perfectly positioned to handle all of these roles, so the fact that it doesn't is somewhat frustrating. :(

    ReplyDelete
    Replies
    1. Hey Shaun,

      Thanks for your insight into this, I'm working on a project right now using pgpool2 for load balancing, failover with watchdog, and connection pooling. And i'm running into this particular issue when testing failover.

      Do you have pgbouncer just pointing to the pgpool2 service/port and then having pgpool connect into postgresql?

      Thanks!

      Delete
    2. I wonder how PgBouncer re-binds connections without breaking some session local properties. I mean, for example, if a user issued "SET seq_page_cost = 2.0" in a session, and PgBouncer re-binds to another session, then seq_page_cost would be changed to the default value, probably 1.0?

      Delete
  2. This comment has been removed by a blog administrator.

    ReplyDelete