Tuesday, October 30, 2012

pgpool-II + now()

While I attended PostgreSQL Conference Europe 2012, I noticed someone say "pgpool cannot handle now()". Of course, this is not a correct statement: pgpool-II has been able to handle now() since 2009 (version 2.3), in any mode.

In master/slave mode (where an external replication tool such as Slony-I or streaming replication is responsible for duplicating the database), now() is handled correctly as long as those replication tools work correctly.

In replication mode, pgpool-II itself is responsible for replication. Since pgpool-II is a statement-based replication tool, all DML/DCL statements are sent to all PostgreSQL servers. If this were done naively for statements using now(), each server would evaluate now() at a slightly different moment and store different values, which would be a problem.

Actually, pgpool-II solves the problem with a "query rewriting" technique. Consider the following INSERT statement:

INSERT INTO t1 VALUES(1, now());

Pgpool-II issues "SELECT now()" to the master server to retrieve a timestamp constant, then rewrites the INSERT using that constant. The actual rewritten query looks something like this:

INSERT INTO "t1" VALUES (1,"pg_catalog"."timestamptz"('2012-10-30 09:35:43.910192+09'::text));

Therefore each PostgreSQL server uses exactly the same timestamp value.
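The rewriting step can be sketched roughly like this. This is a simplified illustration in Python: pgpool-II actually parses the SQL rather than doing text substitution, and fetch_master_now() here merely stands in for issuing "SELECT now()" to the master server.

```python
import re

def fetch_master_now():
    # Stand-in for issuing "SELECT now()" to the master server;
    # a fixed value is used here for illustration.
    return "2012-10-30 09:35:43.910192+09"

def rewrite_now(query):
    # Replace every now() call with the single timestamp constant
    # fetched from the master, so all servers store the same value.
    ts = fetch_master_now()
    return re.sub(r"\bnow\(\)", f"'{ts}'", query, flags=re.IGNORECASE)

print(rewrite_now("INSERT INTO t1 VALUES(1, now());"))
# prints: INSERT INTO t1 VALUES(1, '2012-10-30 09:35:43.910192+09');
```

The key point is that the timestamp is evaluated exactly once, on the master, and every server then receives the same constant.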

What if now() is used as a table default value? I mean:

CREATE TABLE t1 (i int, t timestamp default now());

and "INSERT INTO t1 VALUES(1)" is issued? pgpool-II is smart enough to realize that now() is used as the default value and fills it in:

INSERT INTO "t1" VALUES (1,'2012-10-30 09:51:22.880077+09');

Note that not only now() but other date/time functions as well, including CURRENT_TIMESTAMP, CURRENT_DATE and CURRENT_TIME, are processed in the same way.

BTW, at the conference I also heard that "random() cannot be handled by pgpool-II". This is true. I think we could handle random() in exactly the same way as now(). The question is whether it is worth the trouble. If there is enough demand for handling random(), the pgpool-II developers will gladly attack the problem.

Wednesday, August 29, 2012

Larger large objects

Large objects (BLOBs) have been there since PostgreSQL was born. The size limit of a large object has been 2GB (assuming the default block size). Now I have decided to expand the limit for PostgreSQL 9.3: 4TB is the target. Actually, the PostgreSQL backend can already hold large objects of up to 4TB; the limit has simply been in the API: for example, lo_lseek() and lo_tell() cannot return an offset over 2GB, because those functions' return type is "int". So you might wonder: why can't they just return a larger value? Well, the secret is in the frontend/backend protocol for large objects.

The underlying protocol is called the "fast-path interface". It is similar to RPC (Remote Procedure Call): the client sends a "Function call" message along with the OID (object ID) of the target function; the function is executed within the backend, and the result is returned through a "Function call response" message.

The functions called in large object interface are:
  • lo_open
  • lo_close
  • lo_creat
  • lo_create
  • lo_unlink
  • lo_lseek
  • lo_tell
  • lo_truncate
  • loread
  • lowrite
Those functions' OIDs are retrieved from the backend the first time large objects are accessed, and they are cached in the connection handle (I'm talking about libpq here; other interfaces, such as the JDBC driver, might be implemented differently).

The problem is lo_lseek and lo_tell, as I said earlier. First, the offset parameter is defined as 4 bytes long. Second, the result is defined as 4 bytes long. So we can handle offsets only up to 2^31-1 bytes = 2GB. What shall we do? Well, we will add new functions to the backend, namely lo_lseek64 and lo_tell64. Libpq will check whether those 64-bit functions exist: if yes, it uses them; otherwise (which means the backend is likely a pre-9.3 version), it falls back to the plain old 32-bit-limited lo_lseek and lo_tell. This way we do not break backward compatibility. Of course, you need the 9.3 libpq to enjoy "larger large objects".
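The fallback logic can be sketched like this. This is a Python mock-up, not the real libpq code: the real implementation looks up function OIDs through the fast-path interface, while here a simple set of function names stands in for the cached OID table.

```python
INT32_MAX = 2**31 - 1  # largest offset a 32-bit result can express

class LargeObjectClient:
    """Mock of libpq choosing between 32-bit and 64-bit LO functions."""

    def __init__(self, server_functions):
        # Function OIDs are fetched once and cached in the connection
        # handle; a set of names stands in for that cache here.
        self.fns = set(server_functions)
        self.have_64bit = {"lo_lseek64", "lo_tell64"} <= self.fns

    def lo_tell(self, server_offset):
        if self.have_64bit:
            # 9.3+ backend: the 64-bit variant returns the full offset.
            return server_offset
        if server_offset > INT32_MAX:
            # pre-9.3 backend: a 32-bit result cannot express > 2GB.
            raise OverflowError("offset beyond 2GB on a pre-9.3 server")
        return server_offset

new = LargeObjectClient(["lo_tell", "lo_lseek", "lo_tell64", "lo_lseek64"])
old = LargeObjectClient(["lo_tell", "lo_lseek"])
print(new.lo_tell(4 * 2**40 - 1))  # near the 4TB target: fine on 9.3+
```

A pre-9.3 "old" client works unchanged for offsets under 2GB and only fails beyond that, which is exactly the backward-compatibility property described above.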

I hope to post the first cut of the patch in September.

Thursday, August 16, 2012

Pgpool-II talk at PostgreSQL Conference Europe 2012

I'm going to give a pgpool-II talk at the upcoming PostgreSQL Conference Europe 2012. The talk is titled "Boosting performance and reliability by using pgpool-II", and I will explain how to enhance DB performance using pgpool-II 3.2's "on memory query cache". I will also explain how to set up the "watchdog", another new feature of pgpool-II 3.2. By using it, you can avoid the SPOF (single point of failure) problem of pgpool-II itself without extra HA software.

The conference will be held in Prague, Czech Republic, October 23-26. I've never been to Prague, and I am excited about this opportunity to visit the beautiful old city!

Saturday, August 4, 2012

pgpool-II 3.2.0 is out!

After 11 months of development, pgpool-II 3.2.0 is finally out! pgpool-Admin 3.2.0, a GUI management tool for pgpool-II 3.2.0 (and earlier), is released as well.

See http://www.pgpool.net/pgpool-web/NEWS.txt for more details. I would like to thank everyone who worked to make the release a reality.

Also, next week (Aug 6, 2012) we are going to release the following stable releases:
  • 3.1.4
  • 3.0.8
  • 2.3.4
  • 2.2.8

Sunday, July 22, 2012

pgpool-II 3.2 RC1 is out!

Yesterday the first release candidate of pgpool-II 3.2 came out! Visit http://www.pgpool.net/mediawiki/index.php/Main_Page#pgpool-II_3.2 to find the major features of pgpool-II 3.2. Tutorials are also available; see my previous blog post for more details.

Today I would like to discuss the cache hit ratio of the on memory query cache. The query cache is not a free lunch: creating a cache entry requires extra processing. Also, when a table related to a cache entry gets updated, all cache entries related to that table are removed. In summary, to enjoy the benefit of the query cache, the cache hit ratio must be high enough. The cache hit ratio can be obtained with the "show pool_cache" query.

test=# \x
Expanded display is on.
test=# show pool_cache;
-[ RECORD 1 ]---------------+---------
num_cache_hits              | 891703
num_selects                 | 99995
cache_hit_ratio             | 0.90
num_hash_entries            | 131072
used_hash_entries           | 99992
num_cache_entries           | 99992
used_cache_entries_size     | 12482600
free_cache_entries_size     | 54626264
fragment_cache_entries_size | 0
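The ratio shown above comes from the first two counters, assuming (as the figures imply) that num_selects counts cache misses, i.e. SELECTs actually sent to PostgreSQL. A quick check in Python:

```python
num_cache_hits = 891703  # SELECTs answered from the cache
num_selects = 99995      # SELECTs that went to PostgreSQL (cache misses)

cache_hit_ratio = num_cache_hits / (num_cache_hits + num_selects)
print(round(cache_hit_ratio, 2))  # prints: 0.9
```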

As you can see, the cache hit ratio is 90% here. The question is: is 90% high enough? Do we need a higher number, or would a lower one do? So I decided to do some benchmarking to find the right number. I used pgbench -S to generate read queries. I started the system from a "cold" state, meaning a cache hit ratio of 0, ran 1,000 read queries, and checked the cache hit ratio with the show pool_cache command. I repeated this cycle 100 times. The query cache configuration parameters were all left at their defaults. The system used was my laptop, which has a Core i5 2.6 GHz dual-core CPU, 8GB of memory and SSD storage, running Linux kernel 2.6.35.
I also did the same pgbench run against a no-query-cache configuration and got 5979.27 TPS on average. My intention was to find where TPS exceeds that number and then look at the cache hit ratio at that point, which is the number I am looking for.

Shared memory case

In the first test case, shared memory is used as the cache storage. Here is the graph.
The 1,000-query run was repeated 100 times. The blue line is the cache hit ratio; it reached about 0.9 (90%) after the 100 runs. The green line is the no-cache configuration. The red line is the TPS with the query cache enabled; its peak is 52577.62 TPS. The red line moves up and down heavily, but after the 16th run it consistently exceeds the green line, and the cache hit ratio at that point was 0.52.

Memcached case


In the next test case the cache storage is memcached (running on the same laptop).
The green line is exactly the same as in the shared memory case. This time the peak TPS of the red line was 19393.83. After the 24th run, it consistently exceeds the green line, and the cache hit ratio at that point was 0.62.

The shared memory query cache exceeds the no-cache performance at a cache hit ratio of 52% or higher, while the memcached query cache does so at 62% or higher. Of course these numbers will vary significantly depending on the workload. In the benchmark I used pgbench -S, which is just an index lookup returning one row, thus one of the most lightweight queries in the real world. If the query is heavier, the break-even cache hit ratio will be lower, because the relative benefit of a cache hit is larger.
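A simple cost model makes the break-even point concrete. If a hit takes t_hit, a miss takes t_miss (the no-cache query time plus some caching overhead), and the no-cache time is t_base, the cache wins once h*t_hit + (1-h)*t_miss < t_base. A sketch using the peak TPS figures from above; note the 20% miss overhead is an assumption, not a measurement:

```python
def break_even_hit_ratio(tps_hit, tps_base, miss_overhead=0.2):
    # Per-query times for a cache hit, a no-cache query, and a cache
    # miss (no-cache time plus the assumed cost of storing the result).
    t_hit = 1.0 / tps_hit
    t_base = 1.0 / tps_base
    t_miss = t_base * (1.0 + miss_overhead)
    # Solve h*t_hit + (1-h)*t_miss = t_base for the hit ratio h.
    return (t_miss - t_base) / (t_miss - t_hit)

h = break_even_hit_ratio(tps_hit=52577.62, tps_base=5979.27)
print(round(h, 2))
```

With these assumptions the model predicts a much lower break-even ratio than the measured 52%, which suggests that real cache hits are slower than the peak TPS implies (warm-up, invalidation and contention all add cost), so the model is useful only for intuition, not as a substitute for the benchmark.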

In our manual we state: "If the cache_hit_ratio is lower than 70%, you might want to disable on memory cache". 70% is a little too conservative, but I think it is not far from the numbers I got in this benchmark.

Friday, July 20, 2012

pgpool-II 3.2 tutorials are available

pgpool-II 3.2 is not out yet, but tutorials are already available. These early releases are intended to help people who are interested in pgpool-II 3.2 and/or want to participate in the pgpool-II 3.2 beta testing program. There are two tutorials:
These tutorials were written by Nozomi Anazai, one of our leading PostgreSQL engineers. She is also a board member of the JPUG (Japan PostgreSQL Users Group). Enjoy!

Wednesday, July 18, 2012

Pgpool-II 3.2 release is getting closer

I'm back!

After almost one year of development, the pgpool-II 3.2 release is getting closer. This version has two major new features:
  • On memory query cache
  • Built in HA
The on memory query cache is similar to MySQL's query cache: it caches SELECT query results, matched by the literal query string. Unlike MySQL's query cache, you have a choice of two cache storage types: shared memory or memcached. If your server has enough memory, the shared memory query cache is the best choice because it is faster. If you have a separate server with a large amount of memory, memcached might be your choice. If you plan to run multiple pgpool instances, memcached is the only choice. I will explain this feature in another blog entry.
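For reference, enabling the feature takes only a few pgpool.conf settings. This is a minimal sketch; the parameter names are taken from the pgpool-II 3.2 documentation, and the memcached lines matter only when memqcache_method is 'memcached':

```ini
memory_cache_enabled = on        # turn on the on memory query cache
memqcache_method = 'shmem'       # 'shmem' (shared memory) or 'memcached'
memqcache_total_size = 67108864  # cache size in bytes (shmem case)
memqcache_memcached_host = 'localhost'
memqcache_memcached_port = 11211
```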

The built-in HA is called the "watchdog" in pgpool's documentation. As you might know, pgpool itself can be a SPOF (single point of failure): if pgpool goes down, there is no way for applications to access the database unless they connect to PostgreSQL directly. Traditionally we have recommended "pgpool-HA" to solve this problem. Why do we need built-in HA then? Well, it is easier for users to set up, and it brings another bonus: better coordination of multiple pgpool instances. I will talk about this in another blog entry.
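As a taste of the setup, the watchdog is also driven by pgpool.conf. This is a sketch based on the pgpool-II 3.2 parameter names; the host names and ports are placeholders:

```ini
use_watchdog = on                    # enable the watchdog
wd_hostname = 'pgpool1.example.com'  # this pgpool instance's host
wd_port = 9000                       # watchdog communication port
# the peer pgpool instance to monitor
other_pgpool_hostname0 = 'pgpool2.example.com'
other_pgpool_port0 = 9999
other_wd_port0 = 9000
```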

When will 3.2 be released? We have already released beta2 and plan to release RC1 this week, so we expect the official release by the end of July 2012.

Please help us in testing pgpool-II 3.2 beta! You can download it from:


Full feature list of 3.2 is here:
