Sunday, July 22, 2012

pgpool-II 3.2 RC1 is out!

Yesterday the first release candidate of pgpool-II 3.2 was out! Visit and find major featires of pgpool-II 3.2: http://www.pgpool.net/mediawiki/index.php/Main_Page#pgpool-II_3.2 Also tutorials are available. See my previous blog for more details.

Today I would like to discuss on cache hit ratio of on memory query cache. Query cache is not a free lunch: to create a cache entry, extra processing is required. Also, when a table related to the cache entry gets updated, all cache entries related to the table are removed. In summary, to enjoy the benefit of query cache, the cache hit ratio should be high enough. The cache hit ratio can be obtained by show pool_cache query.

test=# \x
\x
Expanded display is on.
test=# show pool_cache;
 show pool_cache;
-[ RECORD 1 ]---------------+---------
num_cache_hits              | 891703
num_selects                 | 99995
cache_hit_ratio             | 0.90
num_hash_entries            | 131072
used_hash_entries           | 99992
num_cache_entries           | 99992
used_cache_enrties_size     | 12482600
free_cache_entries_size     | 54626264
fragment_cache_entries_size | 0

As you can see the cache hit ratio is 90% here. Question is 90% is high enough? Do we need more number? Or less number is enough? So I decided to do some benchmarking to find the right number. I used pgbench -S to generate read query. I started the system from "cold" state, which means cache hit ration = 0, then run read query 1,000 times and see cache hit rate by using show pool_cache command. I repeated this cycle 100 times. Configuration parameters of query cache is all default. The system used is my latop, which has CORE i5 2.6 GHz dual core, 8GB of memory and SSD storage, running Linux kernel 2.6.35.
I also did same pgbench run against no query cache configuration. I got 5979.27TPS in average. My intention is, finding a TPS number exceeding the number, then look into the cache ration, which is the number I am looking for.

Shared memory case


In the first test case , shared memory is used for the cache storage. Here is the graph.
1,000 query run was repeated for 100 times. Blue line is the cache hit ratio. The ratio reached about 0.9(90%) after the 100 runs. Green line is no cache configuration. Red line is the TPS for query cache used. The peak TPS of red line is 52577.62. The red line moved up and down heavily but after 16 times run, it consistently exceeds green line. The cache rate for the 16 times run was 0.52.

Memcached case

 

In the next test case cache storage is on memcached (memcached is running on the same laptop).
The green line is exactly same as shared memory case. This time the peak TPS of red line was 19393.83. After 24 times run, it consistently exceeds green line. The cache rate for the 24 times run was 0.62.

Conclusion

 

Shared memory query cache exceeds non cache case performance at the cache hit ratio 52% or higher, while memcached query cache exceeds at 62% or higher. Of course this number will significantly vary depending on the work load. In the benchmark I used pgbench -S, which is just an index lookup and returning 1 row thus one of the most light weight query in rthe eal world. If query is heavier, the cache hit ratio will become lower because the performance of cache hit case will be relatively higher.


In our manual we stat "If the cache_hit_ratio is lower than 70%, you might want to disable on memory cache". 70% is a little bit too conservative, but I think it's not too far from the number I got in the benchmark.

Friday, July 20, 2012

pgpool-II 3.2 tutorials are avilable

pgpool-II 3.2 has not been out yet, but tutorials are already avilable. These early releases are intended to help people who are interested in pgpool-II 3.2 and/or want to participate pgpool-II 3.2 beta testing program. There are two tutorials:
These tutorials are written by Nozomi Anazai who is one of our leading PostgreSQL engineers. She is also one of board members of JPUG(Japan PostgreSQL Users Group). Enjoy!

Wednesday, July 18, 2012

Pgpool-II 3.2 release is getting closer

 I'm back!

Almost one year worth of development, pgpool-II 3.2 release is getting closer. This version has two major new features:
  • On memory query cache
  • Built in HA
On memory query cache is similar to MySQL's query cache.  Caching query SELECT query results by matching literal query string. Unlike MySQL's query cache, you have choice of two types of cache storage: shared memory or memcached. If your server has enough memory, share memory query cache is the best choice because it's faster. If you have separate server which has large memory, memcached might be your choice. If you plan to have multiple pgpool instances, memcached is the only choice. I will explain this feature in another blog entry.

Built in HA is called "watch dog" in pgpool's document. As you might know, pgpool itself could be a SPOF(Single Point Of Failure) because if pgpool goes down, there's no way for applications to access database unless they directly access PostgreSQL. Traditionally we recommend to use "pgpool-HA" to solve the problem. Why we need to have builtin HA then? Well, it is easier for users to use, and this brings another "bonus": better coordination of multiple pgpool instances. I will talk about this in another blog entry.

When will be the 3.2 released? We already released beta2 and plan to release RC1 this week. So we expect to have official release by the end of July 2012.

Please help us in testing pgpool-II 3.2 beta! You can download it from:

http://www.pgpool.net/mediawiki/index.php/Developer_releases

Full feature list of 3.2 is here:
 http://git.postgresql.org/gitweb/?p=pgpool2.git;a=blob;f=NEWS;h=d106507915454a03f3046b419c245d3934392701;hb=1fbd5d7566726c9bb91de138a7c160c7e5f52de0

Row pattern recognition feature for PostgreSQL

What is row pattern recognition feature? Row pattern recognition (RPR) is a feature defined in the SQL standard. It allows to search for a s...