Saturday, June 22, 2013

pgpool-II and postgres_fdw

PostgreSQL 9.3 beta1 is out. One of the pgpool users tried pgpool with postgres_fdw and found an interesting thing, so I looked into it to find out what actually happens here.

To test postgres_fdw, I created one pgpool installation in replication mode and two PostgreSQL clusters using pgpool_setup. One of the PostgreSQL clusters is running on port 11001. I executed the following commands on the 11001 PostgreSQL to create a foreign server called "pgpool" and a foreign table called "fr1":

CREATE EXTENSION postgres_fdw;
CREATE SERVER pgpool FOREIGN DATA WRAPPER postgres_fdw OPTIONS (dbname 'test', port '11002');
CREATE USER MAPPING FOR PUBLIC SERVER pgpool;
CREATE FOREIGN TABLE fr1(i INTEGER) SERVER pgpool;

pgpool is running on port 11002, and the other PostgreSQL is running on port 11000. To create a real table corresponding to the foreign table "fr1", I executed "CREATE TABLE fr1(i INTEGER);". So the setup explained above looks like this: psql connects to the PostgreSQL on 11001, the foreign table "fr1" there is handled by postgres_fdw, which connects to pgpool-II on 11002, and pgpool forwards the queries to the real table "fr1" on a backend.
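For completeness, here is a sketch of the real-table side. The port (11000, i.e. node 0, which is where the pgpool log below shows the queries arriving) and the single test row are assumptions for illustration:

$ psql -p 11000 test
test=# CREATE TABLE fr1(i INTEGER);
CREATE TABLE
test=# INSERT INTO fr1 VALUES(1);
INSERT 0 1
test=# \q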
The point here is that the foreign table "fr1" accesses the real table "fr1" via pgpool-II, so from pgpool's point of view postgres_fdw looks like an ordinary client. "client_idle_limit" is a feature to kick out idle clients: if a client sends no query within client_idle_limit seconds, the connection from the client to pgpool is forcibly disconnected. When I set client_idle_limit to, say, 5, I saw an interesting thing. Here is the session log with client_idle_limit set to 5.

$ psql -p 11001 test
-- access foreign table fr1
test=# select * from fr1;
 i
---
 1
(1 row)
-- after 6 seconds passed
test=# select * from fr1;
ERROR:  unknown error
CONTEXT:  Remote SQL command: START TRANSACTION ISOLATION LEVEL REPEATABLE READ
test=# select * from fr1;
ERROR:  unknown error
CONTEXT:  Remote SQL command: START TRANSACTION ISOLATION LEVEL REPEATABLE READ
test=# select * from fr1;
ERROR:  unknown error
CONTEXT:  Remote SQL command: START TRANSACTION ISOLATION LEVEL REPEATABLE READ
test=# \q

This continues until I quit the session and start a new one, presumably because postgres_fdw caches its connection to the remote server for the lifetime of the local session and does not re-establish it after pgpool has forcibly closed it. What happens here? You can see it in the pgpool log (my comments are interleaved with the log lines):
Several commands are issued from postgres_fdw:
2013-06-22 21:01:55 LOG:   pid 11209: DB node id: 0 backend pid: 13109 statement: SET search_path = pg_catalog
2013-06-22 21:01:55 LOG:   pid 11209: DB node id: 0 backend pid: 13109 statement: SET timezone = 'UTC'
2013-06-22 21:01:55 LOG:   pid 11209: DB node id: 0 backend pid: 13109 statement: SET datestyle = ISO
2013-06-22 21:01:55 LOG:   pid 11209: DB node id: 0 backend pid: 13109 statement: SET intervalstyle = postgres
2013-06-22 21:01:55 LOG:   pid 11209: DB node id: 0 backend pid: 13109 statement: SET extra_float_digits = 3
2013-06-22 21:01:55 LOG:   pid 11209: DB node id: 0 backend pid: 13109 statement: START TRANSACTION ISOLATION LEVEL REPEATABLE READ
2013-06-22 21:01:55 LOG:   pid 11209: DB node id: 0 backend pid: 13109 statement: Parse: DECLARE c1 CURSOR FOR
SELECT i FROM public.fr1
Looks like postgres_fdw uses the extended query protocol
2013-06-22 21:01:55 LOG:   pid 11209: DB node id: 0 backend pid: 13109 statement: B message
2013-06-22 21:01:55 LOG:   pid 11209: DB node id: 0 backend pid: 13109 statement: D message
2013-06-22 21:01:55 LOG:   pid 11209: DB node id: 0 backend pid: 13109 statement: Execute: DECLARE c1 CURSOR FOR
SELECT i FROM public.fr1
2013-06-22 21:01:55 LOG:   pid 11209: DB node id: 0 backend pid: 13109 statement: FETCH 100 FROM c1
2013-06-22 21:01:55 LOG:   pid 11209: DB node id: 0 backend pid: 13109 statement: CLOSE c1
2013-06-22 21:01:55 LOG:   pid 11209: DB node id: 0 backend pid: 13109 statement: COMMIT TRANSACTION
After 6 seconds had passed, I executed "SELECT * FROM fr1" again:
2013-06-22 21:02:01 LOG:   pid 11209: pool_process_query: child connection forced to terminate due to client_idle_limit(5) reached
Pgpool reported that client_idle_limit had expired
2013-06-22 21:02:01 LOG:   pid 11209: DB node id: 0 backend pid: 13109 statement:  DISCARD ALL

How to avoid the error?

Because there is no way for pgpool-II to know that the connection from the client comes from postgres_fdw, the only workaround is to disable client_idle_limit.
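In pgpool.conf that means setting the parameter to 0, which is also the default and turns the idle check off entirely, and then reloading or restarting pgpool (in a pgpool_setup environment the generated ./pgpool_reload script does the reload):

# pgpool.conf: 0 (the default) disables the idle client disconnection
client_idle_limit = 0

$ ./pgpool_reload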

Wednesday, June 19, 2013

Creating regression test suite for pgpool-II

Having a regression test suite has been a long-standing TODO item. Now that we have pgpool_setup, I started to think about making it real. Today I think I finally made it.

The idea behind it is:
  • create the required pgpool cluster (pgpool + PostgreSQL instances) by using pgpool_setup
  • each test case is a simple shell script
  • each script returns 0 or 1 according to the test result
Sounds too simple? Yes, I admit. However, it works as expected.

$ sh regress.sh
testing 001.load_balance...ok.
testing 002.native_replication...ok.
testing 003.failover...ok.
testing 050.bug58...ok.
testing 051.bug60...ok.
testing 052.do_query...ok.
testing 053.insert_lock_hangs...ok.
out of 7 ok:7 failed:0
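
The actual regress.sh is not shown here, but a driver following the convention above can stay very small. Below is a minimal sketch; the "tests" directory layout and the per-test script name "test.sh" are assumptions for illustration, and the real suite may differ:

#! /bin/sh
# run every test case found under tests/ and count the results
total=0
ok=0
failed=0
for d in `ls tests`
do
    total=`expr $total + 1`
    echo -n "testing $d..."
    (cd tests/$d && sh test.sh) > /dev/null 2>&1
    if [ $? = 0 ];then
        echo "ok."
        ok=`expr $ok + 1`
    else
        echo "failed."
        failed=`expr $failed + 1`
    fi
done
echo "out of $total ok:$ok failed:$failed"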


For example, "001.load_balance" checks whether load balancing is working against 2 node native replication or streaming replication cluster by using this simple script:

#! /bin/sh
#-------------------------------------------------------------------
# test script for load balancing.
#
source $TESTLIBS
TESTDIR=testdir

for mode in s r
do
    rm -fr $TESTDIR
    mkdir $TESTDIR
    cd $TESTDIR

# create test environment
    echo -n "creating test environment..."
    sh $PGPOOL_SETUP -m $mode -n 2 || exit 1
    echo "done."

    source ./bashrc.ports

    echo "backend_weight0 = 0" >> etc/pgpool.conf
    echo "backend_weight1 = 1" >> etc/pgpool.conf
    echo "black_function_list = 'f1'" >> etc/pgpool.conf

    ./startall

    export PGPORT=$PGPOOL_PORT

    wait_for_pgpool_startup

    psql test <<EOF
CREATE TABLE t1(i INTEGER);
CREATE FUNCTION f1(INTEGER) returns INTEGER AS 'SELECT \$1' LANGUAGE SQL;
SELECT * FROM t1;        -- this load balances
SELECT f1(1);        -- this does not load balance
EOF

# check if simple load balancing worked
    fgrep "SELECT * FROM t1;" log/pgpool.log |grep "DB node id: 1">/dev/null 2>&1
    if [ $? != 0 ];then
    # expected result not found
        ./shutdownall
        exit 1
    fi

# check if black function list worked
    fgrep "SELECT f1(1);" log/pgpool.log |grep "DB node id: 0">/dev/null 2>&1
    if [ $? != 0 ];then
    # expected result not found
        ./shutdownall
        exit 1
    fi

    echo "white_function_list = 'f1'" >> etc/pgpool.conf
    echo "black_function_list = ''" >> etc/pgpool.conf

    ./pgpool_reload

    psql test <<EOF
SELECT f1(1);        -- this does load balance
EOF

# check if white function list worked
    fgrep "SELECT f1(1);" log/pgpool.log |grep "DB node id: 1">/dev/null 2>&1
    if [ $? != 0 ];then
    # expected result not found
        ./shutdownall
        exit 1
    fi

    ./shutdownall

    cd ..

done

exit 0

I hope pgpool developers will add more test cases.
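A new test case only needs to follow the same convention: build a cluster with pgpool_setup, run its checks, and exit 0 on success or 1 on failure. A minimal skeleton (the query being checked is only a placeholder) might look like this:

#! /bin/sh
#-------------------------------------------------------------------
# skeleton for a new test case
#
source $TESTLIBS
TESTDIR=testdir

rm -fr $TESTDIR
mkdir $TESTDIR
cd $TESTDIR

# create a 2-node streaming replication test environment
sh $PGPOOL_SETUP -m s -n 2 || exit 1
source ./bashrc.ports
./startall
export PGPORT=$PGPOOL_PORT
wait_for_pgpool_startup

# run whatever the test is about; exit 1 on failure
psql -c "SELECT 1" test
if [ $? != 0 ];then
    ./shutdownall
    exit 1
fi

./shutdownall
exit 0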

Wednesday, June 5, 2013

Setting up everything at one time

While developing pgpool-II 3.3, I created a small script to make developers' lives a little bit easier. The tool might also be useful for PostgreSQL/pgpool-II beginners, so I decided to introduce it in my blog.

Disclaimer: the environment created by the tool below should never be used for production.

The script is called "pgpool_setup". It creates the following with just one command, under the current directory. The directory must be empty, and the PostgreSQL and pgpool binaries must be installed and in your command search path beforehand.
  • PostgreSQL clusters. You can choose "streaming replication mode" or "native replication mode". The former you already know, I assume. The latter is just a set of identical PostgreSQL clusters which differ only in their port numbers.
  • If you choose streaming replication mode, one primary server and standby servers are ready after the command runs (with WAL archiving enabled).
  • By default the PostgreSQL cluster port numbers start from 11000 and continue with 11001, 11002 and so on.
  • The port numbers of pgpool and pcp follow.
  • One pgpool server, with failover, online recovery and follow master commands all set up.
  • A test database "test" is created.
  • An account and password for the pcp commands (which are for pgpool administrators) are ready for you.
By running pgpool_setup, you are ready to start pgpool and PostgreSQL with complete settings (at least for testing purposes) on your computer. I use this tool to create multiple groups of pgpool+PostgreSQL sets on my laptop.
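Run without arguments, as in the session below, pgpool_setup builds a two-node streaming replication cluster. It also accepts options, which the regression test script above already uses: -m selects the mode (s for streaming replication, r for native replication) and -n the number of PostgreSQL nodes. For example, to get a 2-node native replication cluster in an empty directory:

$ mkdir replication_test
$ cd replication_test
$ pgpool_setup -m r -n 2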

So here's a sample session.

$ pgpool_setup
Satrting set up in streaming replication mode
creating startall and shutdownall
creating failover script
creating database cluster /home/t-ishii/work/git.postgresql.org/tmp2/data0...done.
update postgreql.conf
creating pgpool_remote_start
creating basebackup.sh
creating recovery.conf
creating database cluster /home/t-ishii/work/git.postgresql.org/tmp2/data1...done.
update postgreql.conf
creating pgpool_remote_start
creating basebackup.sh
creating recovery.conf
temporarily start data0 cluster to create extensions
temporarily start pgpool-II to create standby nodes
waiting for failover happens...done.
 node_id | hostname | port  | status | lb_weight |  role   
---------+----------+-------+--------+-----------+---------
 0       | /tmp     | 11000 | 2      | 0.500000  | primary
 1       | /tmp     | 11001 | 3      | 0.500000  | standby
(2 rows)

recovery node 1...done.
creating follow master script
Pager usage is off.
 node_id | hostname | port  | status | lb_weight |  role   
---------+----------+-------+--------+-----------+---------
 0       | /tmp     | 11000 | 2      | 0.500000  | primary
 1       | /tmp     | 11001 | 2      | 0.500000  | standby
(2 rows)

shutdown all

pgpool-II setting for streaming replication mode is done.
To start the whole system, use /home/t-ishii/work/git.postgresql.org/tmp2/startall.
To shutdown the whole system, use /home/t-ishii/work/git.postgresql.org/tmp2/shutdownall.
pcp command user name is "t-ishii", password is "t-ishii".
Each PostgreSQL, pgpool-II and pcp port is as follows:
#1 port is 11000
#2 port is 11001
pgpool port is 11002
pcp port is 11003
The info above is in README.port.
$ ls
README.port  bashrc.ports*  data1/  log/              run/          startall*
archivedir/  data0/         etc/    pgpool_setup.log  shutdownall

# So here you can see PostgreSQL cluster 0 as "data0" and cluster 1 as
# "data1".  You can start everything (pgpool and PostgreSQL) by using
# "startall", which is created by the command.

$ sh startall
waiting for server to start.... done
server started
waiting for server to start.... done
server started

# Make sure primary and standby are running.  "show pool_nodes" shows
# the info. Port 11002 is the port for pgpool.  As you can see
# primary's port is 11000 and standby's port is 11001.
# Status "2" means up and running.

$ psql -p 11002 -c 'show pool_nodes' test
 node_id | hostname | port  | status | lb_weight |  role   
---------+----------+-------+--------+-----------+---------
 0       | /tmp     | 11000 | 2      | 0.500000  | primary
 1       | /tmp     | 11001 | 2      | 0.500000  | standby
(2 rows)

# Now let's make sure that streaming replication is activated.

$ psql -p 11002 test
Pager usage is off.
psql (9.3beta1)
Type "help" for help.

test=# create table t1(i int);
CREATE TABLE
test=# insert into t1 values(1);
INSERT 0 1
test=# select * from t1;
 i 
---
 1
(1 row)

test=# \q

# Connect to standby.
$ psql -p 11001 test
Pager usage is off.
psql (9.3beta1)
Type "help" for help.

test=# select * from t1;
 i 
---
 1
(1 row)

test=# \q

# Ok, "t1" is propagated.

# Next, test failover. So shut down the standby.

$ pg_ctl -D data1 -m f stop
waiting for server to shut down.... done
server stopped
$ psql -p 11002 -c 'show pool_nodes' test
 node_id | hostname | port  | status | lb_weight |  role   
---------+----------+-------+--------+-----------+---------
 0       | /tmp     | 11000 | 2      | 0.500000  | primary
 1       | /tmp     | 11001 | 3      | 0.500000  | standby
(2 rows)

# Note that status "3" means down.

# Let's start the standby and make it online again.
$ pg_ctl -D data1 start
server starting
$ pcp_attach_node 1 localhost 11003 t-ishii t-ishii 1
$ psql -p 11002 -c 'show pool_nodes' test
 node_id | hostname | port  | status | lb_weight |  role   
---------+----------+-------+--------+-----------+---------
 0       | /tmp     | 11000 | 2      | 0.500000  | primary
 1       | /tmp     | 11001 | 2      | 0.500000  | standby
(2 rows)

# Next test is promotion. Let's stop primary and see if standby promotes.

$ pg_ctl -D data0 -m f stop
waiting for server to shut down.... done
server stopped

# Wait for a while to let our failover script and follow master script do their job.

$ psql -p 11002 -c 'show pool_nodes' test
 node_id | hostname | port  | status | lb_weight |  role   
---------+----------+-------+--------+-----------+---------
 0       | /tmp     | 11000 | 2      | 0.500000  | standby
 1       | /tmp     | 11001 | 2      | 0.500000  | primary
(2 rows)

# Ok, now node 0 is standby, and 1 is primary. 
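
# To bring the old primary (node 0) back as a standby, online recovery
# should work here, since pgpool_setup has set up the recovery scripts.
# The commands below use the pcp port and account reported above; the
# argument order of pcp_recovery_node is the same as pcp_attach_node,
# with the first argument being a timeout in seconds.

$ pcp_recovery_node 10 localhost 11003 t-ishii t-ishii 0
$ psql -p 11002 -c 'show pool_nodes' test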
 
pgpool_setup is available in Pgpool-II 3.4 or later.
 

Tuesday, June 4, 2013

pgpool-II 3.3 alpha 1 is out!

We are going to release the new version of pgpool-II, 3.3, soon. The previous version, 3.2, was released last August. Since then over 12,000 copies have been downloaded. 3.2 was very successful: 3.1 was downloaded only about 5,000 times in almost the same period. So we expect 3.3 to be even more successful than 3.2. So what's new in 3.3 anyway?

Well, 3.3 focuses on maturing 3.2, especially the "watchdog" part. The watchdog, which was new in 3.2, prevents pgpool itself from becoming a single point of failure. I know we already have pgpool-HA, but the watchdog does more: it can make more than two pgpools work together, for example on online recovery.

In 3.3 the watchdog is much more enhanced:
  • New life check method ("heartbeat") added (a configuration sketch follows this list)
  • Redundant heartbeat device support
  • Secure protocol for heartbeat and other messages
  • Interlocking of failover/failback/follow master scripts
  • Monitoring and automatic restart of the watchdog process if it goes down
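As a rough illustration, the heartbeat life check is set up in pgpool.conf. The sketch below is not a complete configuration: the host name and network device are placeholders, and the full parameter list is in the watchdog section of the manual.

# pgpool.conf (excerpt, on one of the pgpool nodes)
use_watchdog = on
wd_lifecheck_method = 'heartbeat'
wd_heartbeat_port = 9694
heartbeat_destination0 = 'other-pgpool-host'    # placeholder host name
heartbeat_destination_port0 = 9694
heartbeat_device0 = 'eth0'                      # add heartbeat_device1, ... for redundancy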
So please come and try 3.3 alpha now!
