Pgpool-II 4.2 is under developmentPgpool-II developers have been working hard for upcoming Pgpool-II 4.2 release, which is supposed to be released around this September or October.
Snapshot isolation mode is comingFor the new release I have just added new clustering mode called "Snapshot isolation mode", which would be not only interesting for Pgpool-II users but PostgreSQL users and I think I decided to write about it in this blog.
Existing clustering modeUntil Pgpool-II 4.1, we have following clustering mode:
- Streaming replication mode. This is most widely used at this point. PostgreSQL's streaming replication is used for the replication task in this mode.
- Native replication mode. Until streaming replication is available, this was the most common setting. In this mode Pgpool-II issues same data modifying statements (DML/DDL) to all the PostgreSQL servers. As a result same data is replicated in all PostgreSQL servers.
- Slony mode. In this mode replication is done by Slony-I instead of streaming replication. This had been popular until streaming replication came up.
- Raw mode. No replication is done in this mode. Rarely used.
Atomic visibilityBefore explaining that I would like to talk about the background a little bit.
PostgreSQL developers has been discussing about technology which can scale to handle large data. One of such a technical directions is extending FDW (Foreign Data Wrapper). It sends query to another PostgreSQL server to replicate or shard data. The idea is similar to the native replication mode in Pgpool-II because it also sends DML/DDL to servers to replicate data.
Suppose we have 2 sessions: session 1 continuously executes UPDATE t1 SET i = i +1; session 2 continuously executes INSERT INTO log SELECT * FROM t1. Table t1 has only 1 row and it's initial value is 0 (actually each UPDATE and INSERT are executed in an explicit transaction).
So after session 1 and session 2 end, table "log" should have exactly same sequential numbers like "0, 1, 2, 3...." on both PostgreSQL servers. But see the example session below:
So after the sessions end, we will see different sequences of numbers in log table on server1 and server2. If we do not have atomic visibility this kind of data inconsistency could happen. Atomic visibility guarantees that value can be seen on server1 and server2 are same as long as they are executed in a same session. In the example above, data logged into log table would be either 0 (t1 has not been updated/committed yet) or 1 (t1 has been already updated/committed).
Snapshot isolation mode guarantees atomic visibilityThe upcoming snapshot isolation mode is different from existing native replication mode in that it guarantees atomic visibility. So we don't need to worry about such that data inconsistency in the example above. In fact we have new regression to test the new snapshot isolation mode by using essentially same SQL used in the example above. Please note that in order to use snapshot isolation mode, the transaction isolation mode must be "REPEATABLE READ". (Postgres FDW also runs foreign transactions in this mode).
How to achieve atomic visibility?PostgreSQL developers already recognize the necessity of atomic visibility. Postgres-XC, a fork of PostgreSQL, invented Global Transaction Manager to achieve atomic visibility. Some PostgreSQL developers are proposing CSN (Commit Sequence Number) approach. However these approch needs major surgery in PostgreSQL code.
Pgpool-II uses completely different approach, which is called "Pangea" proposed in an academic paper. With Pangea, no modification is necessary with PostgreSQL. Even you can use older versions of PostgreSQL like 8.0. So you can enjoy snapshot isolation mode of Pgpool-II today with your favorite version of PostgreSQL.
In Pagea the idea is called Global Snapshot Isolation, which extends snapshot isolation of local server to across servers. That's the reason why we call the new clustering mode as "Snapshot Isolation mode".