Playing with PostgreSQL and Pgpool

Thursday, August 16, 2012

Pgpool-II talk at PostgreSQL Conference Europe 2012

I'm going to give a pgpool-II talk at upcoming PostgreSQL Conference Europe 2012. The talk is titled "Boosting performance and reliability by using pgpool-II" and I will explain how to enhance DB performance by using pgpool-II 3.2's "on memory query cache". Also I will explain how to set up "watchdog", which is also a new feature of pgpool-II 3.2. By using this, you can avoid SPOF(single point of failure) problem of pgpool-II itself without using extra HA software.

The conference will be held in Prague, the Czech Republic, October 23-26. I've never been to Prague, and am excited at this opportunity to visit the old beautiful city!

Saturday, August 4, 2012

pgpool-II 3.2.0 is out!

After 11 month of development, finally pgpool-II 3.2.0 is out! Also a GUI management tool for pgpool-II 3.2.0(and before), pgpool-Admin 3.2.0 is released.

See http://www.pgpool.net/pgpool-web/NEWS.txt for more details. I would like to thank everyone who have worked for making the release reality.

Also next week(Aug 6, 2012) we are going to release following stable releases:

3.1.4
3.0.8
2.3.4
2.2.8

Sunday, July 22, 2012

pgpool-II 3.2 RC1 is out!

Yesterday the first release candidate of pgpool-II 3.2 was out! Visit and find major featires of pgpool-II 3.2: http://www.pgpool.net/mediawiki/index.php/Main_Page#pgpool-II_3.2 Also tutorials are available. See my previous blog for more details.

Today I would like to discuss on cache hit ratio of on memory query cache. Query cache is not a free lunch: to create a cache entry, extra processing is required. Also, when a table related to the cache entry gets updated, all cache entries related to the table are removed. In summary, to enjoy the benefit of query cache, the cache hit ratio should be high enough. The cache hit ratio can be obtained by show pool_cache query.

test=# \x
\x
Expanded display is on.
test=# show pool_cache;

 show pool_cache;
-[ RECORD 1 ]---------------+---------
num_cache_hits              | 891703
num_selects                 | 99995
cache_hit_ratio             | 0.90
num_hash_entries            | 131072
used_hash_entries           | 99992
num_cache_entries           | 99992
used_cache_enrties_size     | 12482600
free_cache_entries_size     | 54626264
fragment_cache_entries_size | 0

As you can see the cache hit ratio is 90% here. Question is 90% is high enough? Do we need more number? Or less number is enough? So I decided to do some benchmarking to find the right number. I used pgbench -S to generate read query. I started the system from "cold" state, which means cache hit ration = 0, then run read query 1,000 times and see cache hit rate by using show pool_cache command. I repeated this cycle 100 times. Configuration parameters of query cache is all default. The system used is my latop, which has CORE i5 2.6 GHz dual core, 8GB of memory and SSD storage, running Linux kernel 2.6.35.
I also did same pgbench run against no query cache configuration. I got 5979.27TPS in average. My intention is, finding a TPS number exceeding the number, then look into the cache ration, which is the number I am looking for.

Shared memory case

In the first test case , shared memory is used for the cache storage. Here is the graph.

1,000 query run was repeated for 100 times. Blue line is the cache hit ratio. The ratio reached about 0.9(90%) after the 100 runs. Green line is no cache configuration. Red line is the TPS for query cache used. The peak TPS of red line is 52577.62. The red line moved up and down heavily but after 16 times run, it consistently exceeds green line. The cache rate for the 16 times run was 0.52.

Memcached case

In the next test case cache storage is on memcached (memcached is running on the same laptop).

The green line is exactly same as shared memory case. This time the peak TPS of red line was 19393.83. After 24 times run, it consistently exceeds green line. The cache rate for the 24 times run was 0.62.

Conclusion

Shared memory query cache exceeds non cache case performance at the cache hit ratio 52% or higher, while memcached query cache exceeds at 62% or higher. Of course this number will significantly vary depending on the work load. In the benchmark I used pgbench -S, which is just an index lookup and returning 1 row thus one of the most light weight query in rthe eal world. If query is heavier, the cache hit ratio will become lower because the performance of cache hit case will be relatively higher.

In our manual we stat "If the cache_hit_ratio is lower than 70%, you might want to disable on memory cache". 70% is a little bit too conservative, but I think it's not too far from the number I got in the benchmark.

Friday, July 20, 2012

pgpool-II 3.2 tutorials are avilable

pgpool-II 3.2 has not been out yet, but tutorials are already avilable. These early releases are intended to help people who are interested in pgpool-II 3.2 and/or want to participate pgpool-II 3.2 beta testing program. There are two tutorials:

http://www.pgpool.net/pgpool-web/contrib_docs/memqcache/en.html You can easily try out pgpool-II 3.2's on memory query cache functionality by using this tutorial. What you need is a desktop/laptop with Linux installed.
http://www.pgpool.net/pgpool-web/contrib_docs/watchdog/en.html In this tutorial you can learn how to use pgpool-II 3.2's built in HA(High Availability) functionality. You need to prepare two laptops/desktops with Linux installed.

These tutorials are written by Nozomi Anazai who is one of our leading PostgreSQL engineers. She is also one of board members of JPUG(Japan PostgreSQL Users Group). Enjoy!

Wednesday, July 18, 2012

Pgpool-II 3.2 release is getting closer

I'm back!

Almost one year worth of development, pgpool-II 3.2 release is getting closer. This version has two major new features:

On memory query cache
Built in HA

On memory query cache is similar to MySQL's query cache. Caching query SELECT query results by matching literal query string. Unlike MySQL's query cache, you have choice of two types of cache storage: shared memory or memcached. If your server has enough memory, share memory query cache is the best choice because it's faster. If you have separate server which has large memory, memcached might be your choice. If you plan to have multiple pgpool instances, memcached is the only choice. I will explain this feature in another blog entry.

Built in HA is called "watch dog" in pgpool's document. As you might know, pgpool itself could be a SPOF(Single Point Of Failure) because if pgpool goes down, there's no way for applications to access database unless they directly access PostgreSQL. Traditionally we recommend to use "pgpool-HA" to solve the problem. Why we need to have builtin HA then? Well, it is easier for users to use, and this brings another "bonus": better coordination of multiple pgpool instances. I will talk about this in another blog entry.

When will be the 3.2 released? We already released beta2 and plan to release RC1 this week. So we expect to have official release by the end of July 2012.

Please help us in testing pgpool-II 3.2 beta! You can download it from:

http://www.pgpool.net/mediawiki/index.php/Developer_releases

Full feature list of 3.2 is here:
http://git.postgresql.org/gitweb/?p=pgpool2.git;a=blob;f=NEWS;h=d106507915454a03f3046b419c245d3934392701;hb=1fbd5d7566726c9bb91de138a7c160c7e5f52de0

Wednesday, June 15, 2011

LPI-Japan to start PostgreSQL certfication

LPI-Japan, a non-profit distributor of LPIC(Linux Professional Institute Certification) in Japan will start "OSS-DB" exam from July 1st, 2011. LPI-Japan is known as one of the largest distributor of LPIC in the world(according to LPI-Japan they have distributed 164k LPIC so far).

http://www.oss-db.jp/news/press/20110608_04.shtml

According to LPI-Japan, OSS-DB will be ready for several open source databases in the future. However the initial version will only support PostgreSQL(!)

Representatives from Fujitsu, Hitach, Miracle Linux, NEC, NEC Soft., NTT, and SRA OSS attended the press announcement event.

I hope OSS-DB will significantly contribute to making PostgreSQL more popular in Japan.

Tuesday, March 15, 2011

The 2011 off the Pacific coast of Tohoku Earthquake

Thanks to everyone who sent me emails regarding the 2011 off the Pacific coast of Tohoku Earthquake in Japan.

Fortunately I, my family and employees are all fine. My home town is in Tohoku area and my mother, brother and sister, who are living there, are still sufferings from lack of water, gasoline and gas etc. even though they did not have the tsunami.

Needless to say about those who had the tsunami. According to the goverment it is likely more than 10,000 people died. I would like to pray for the repose of their soul.

We also worry about the Fukushima nuclear powerplant which has serious problems caused by the tsunami and the earthquake. This also heavily affects the electricity in other area including Tokyo. Due to lack of electiricity we are forced to have a power cut. Trains/metros in Tokyo are revolving only 50% comparing before.

The most important thing we could do here is keeping on our businesses to make money and resources to help Tohoku and Japan. I'm sure we can rehabilitate Japan.

Again I would like to thank to everyone who care about us.

Sunday, March 6, 2011

Visiting Brussels

After visiting Paris, I moved to Brussels to join FOSDEM, the largest open source conference in Europe.

I gave a 45-minute talk abut PostgreSQL and pgpool-II.

After finishing the talk we enjoyed a small tour of the beautifull city.

This is the famous "Grand Place".

There are nice Art Nouveau caffes in the city.

I bought some laces in a old lace shop. They are hand-made and very nice.

There is a great chuch which has a big pipe organ. I was so impressed and bought a CD, recorded the acuall play of the organ!

Sunday, February 13, 2011

Visiting Paris

On February 3rd 2011, I gave my first ta lk in Paris.
The conference was called "PostgreSQL session", sponsored and organized by a French PostgreSQL company Dalibo. Dalibo started to give a series of PostgreSQL seminar in Paris and the conference I joined was the first one.

Damien Clochard of Dalibo was the organizer and he did an excellent job. Especially the room used for the conference was pretty impressive for me. At a first it looked very old, but actually not, according to Jean-Paul Argudo of Dalibo. The room was designed by an artist so that it looked very old.

The first picture is David Fetter, talking about dblink.

The second picture is Philippe Beaudoin of Bull. He talked about an interesting migration case in French government.

The lunch was served as described in the menu left. The meals were so delicious, I forgot to take pictures of them:-)

I would like to thank to all the people who organized and joined this great conference!

After finishing the session, I enjoyed a small sightseeing of course.

In summary, the conference was great, meal was great and my trip was great. I hope to visit Paris again!

Tuesday, December 28, 2010

Playing with PHP PDO

PHP PDO(PHP Data Object) allows you to access database including PostgreSQL within PHP scripts. Good thing with PDO is, you do not need to heavily modify your PHP script even if you change the database product you want to use. This resembles to JDBC driver or Perl DBI. Moreover you could use database product dependent interface. Which means you can enjoy the full power of PostgreSQL with sacrificing portability of course. There's no free lunch. In summary, PHP PDO is great and I would like to recommend to anyone who is thinking about serious database programming in PHP.

One day I came across a complaint from pgpool-II user. According to him, his PHP client was blocked by pgpool-II while trying to connect to it. Ok, this is a typical situation when user misconfigured num_init_children and I thought he tried to connect pgpool-II concurrently with more than num_initi_children sessions. But his answer is, no. I suspected the PDO internal but I was too lazy to look into PDO source code. Instead I use strace to see what PDO is doing. To make the long story short, PDO releases the connection to pgpool-II long after "$dbh = null;"($dbh is the "handle" of the connection. This should release the connection). So if his script continues and connects to pgoool-II again after he thinks that he releases the connection, the script actually has two concurrent connections to pgpool-II. Of course this will create more connections than he thought.

I think the reason why many people do not come up with similar complaint is, their script usually create just one connection and release it at the end of the script.

Thursday, November 4, 2010

Tutorial to pgpool-II plus Streaming replication

I have written a tutorial to pgpool-II using PostgreSQL 9.0's streaming replication. The tutorial is designed to use minimum resource: a Linux box. On the box, pgpool-II, pgpoolAdmin(a PHP based pgpool-II GUI admin tool) and two PostgreSQL 9.0 instances.

Though the tutorial system is surprisingly simple, you could learn:

- How pgpool-II detects PostgreSQL failure and automatically promote a standby server to primary server

- How pgpool-II re-sync the failed node by using "online recovery" capability

Please enjoy!