Tuesday, December 28, 2010

Playing with PHP PDO

PHP PDO(PHP Data Object) allows you to access database including PostgreSQL within PHP scripts. Good thing with PDO is, you do not need to heavily modify your PHP script even if you change the database product you want to use. This resembles to JDBC driver or Perl DBI. Moreover you could use database product dependent interface. Which means you can enjoy the full power of PostgreSQL with sacrificing portability of course. There's no free lunch. In summary, PHP PDO is great and I would like to recommend to anyone who is thinking about serious database programming in PHP.

One day I came across a complaint from pgpool-II user. According to him, his PHP client was blocked by pgpool-II while trying to connect to it. Ok, this is a typical situation when user misconfigured num_init_children and I thought he tried to connect pgpool-II concurrently with more than num_initi_children sessions. But his answer is, no. I suspected the PDO internal but I was too lazy to look into PDO source code. Instead I use strace to see what PDO is doing. To make the long story short, PDO releases the connection to pgpool-II long after "$dbh = null;"($dbh is the "handle" of the connection. This should release the connection). So if his script continues and connects to pgoool-II again after he thinks that he releases the connection, the script actually has two concurrent connections to pgpool-II. Of course this will create more connections than he thought.

I think the reason why many people do not come up with similar complaint is, their script usually create just one connection and release it at the end of the script.

Thursday, November 4, 2010

Tutorial to pgpool-II plus Streaming replication

I have written a tutorial to pgpool-II using PostgreSQL 9.0's streaming replication. The tutorial is designed to use minimum resource: a Linux box. On the box, pgpool-II, pgpoolAdmin(a PHP based pgpool-II GUI admin tool) and two PostgreSQL 9.0 instances.

Though the tutorial system is surprisingly simple, you could learn:

- How pgpool-II detects PostgreSQL failure and automatically promote a standby server to primary server

- How pgpool-II re-sync the failed node by using "online recovery" capability

Please enjoy!

Sunday, September 12, 2010

pgpool-II 3.0 officially released

After nearly one month testing, we finally sent out pgpool-II 3.0 to the door on September 10, 2010. On that day SRA OSS, Inc. celebrated its 5th Anniversary. We had several talks there including Masao Fujii, the author of Streaming replication of PostgreSQL 9.0. Another talk was given by Toshihiro Kitagwa, the major developer of pgpool-II, explaining how Streaming replication/Hot standby can be work with pgpool-II.
About one hundred people joined the session. I was very happy to report "pgpool-II is out today" to them.

The details of pgpool-II 3.0 can be found at: http://lists.pgfoundry.org/pipermail/pgpool-general/2010-September/002946.html

From now on I would like to explain new features of pgpool-II 3.0, probably one feature per one blog entry.

Today's topic is the "white function list" and "black function list".

Background: pgpool-II needs to know the usage of functions in SELECT statements which will write to database. If such SELECT is given, pgpool-II should not do load balance. Instead it sends the SELECT to al DB nodes(in replication mode) or sends to the master node (in master/slave mode).

Typical queries are:

SELECT nextval('foo_sequence');
SELECT setval('foo_sequence', 100);

In previous version pgpool-II found nextval and setval in hard coded way. Problem is, user might want to create their own functions which does write. The work around was to use SQL comment. For example:

/*NO LOAD BALANCE*/ SELECT my_writing_function();

This is not only unpolished but forcing users to rewrite SQL statements in existing applications. In pgpool-II 3.0, you can use black function list in such cases:

black_function_list = 'nextval,setval,my_writing_function'

White_function_list works in exactly opposite way. Any functions not listed in the white_function_list are regarded as being write to database.

Pgpool-II 3.0 also enhances the way to search functions in SELECT. Before it only searchs in the top level of SELECT. That is, if those functions are used in subquery or some such, it cannot detect the existence of the functions. Pgpool-II 3.0 does a recursive search against he parse tree and can find those functions anywhere.

Wednesday, August 18, 2010

Pgpool-II 3.0/pgpoolAdmin 3.0 beta1 released

After 3 months of hard work, pgpool-II 3.0/pgpoolAdmin 3.0 beta1 is finally out.
(details)

Please join the beta test program. Comments and suggestions are welcome!

Many people helped us in making this reality. Especially I would like to thank to:

Guillaume Lelarge: enhanced pcp commands, lots of patches, French translation and more...
Masao Fujii: gave us valuable info about streaming replication of PostgreSQL 9.0
Simon Riggs: gave us valuable info about Hot standby of PostgreSQL 9.0
Toshihiro Kitagawa: did hard work on enhancing extended protocol processing
Nozomi Anzai: updated/enhanced internal of pgpoolAdmin

Sunday, August 15, 2010

PostgreSQL 9.0 and pgpool-II endurance test

I have started to run an endurance test of PostgreSQL 9.0beta4 and pgpool-II 3.0-dev on Aug 12 17:47.
The hardware: two dual-core opteron 2.2GHz, 2GB mem.
One is used for the streaming replication primary server and the other is used for the standby server.
The load is provided by pgbench -N -c 10 (10 concurent users) -C (each time the transaction starts, pgbench connects to pgpool), which runs on the primary server.

Up to now the test has lasted for about 88 hours. No errors, no OOM killer run because of memory leak. Totally 291 million transactions have been processed.

Here is some TPS data. "excluding" means excluding connect/disconnect time. "including" means including connect/disconnect time. So it's not supprising that "excluding" is far better than "including". This is the reson why you want connection pooling.

select avg(excluding) as excluding_avg,
stddev(excluding) as excluding_stddev,
avg(including) as including_avg,
stddev(including) as including_stddev
from endurance_test;

-[ RECORD 1 ]----+-----------------
excluding_avg | 1290.55333121045
excluding_stddev | 18.6107550266789
including_avg | 916.442996303828
including_stddev | 10.3926488799762

Tuesday, June 15, 2010

pgpool-II and Hot Standby

PostgreSQL 9.0 will have a great feature called "Hot Standby". Unlike warm standby, which is just waiting for in case of master server goes down, it gives an access to the standby server for lots of useful purposes such as complex reporting query. However only read queries and other several commands are allowed to be sent to the standby server. According to the PostgreSQL 9.0 beta2 document, this is the list of allowed queries:
  • Query access - SELECT, COPY TO including views and SELECT rules
  • Cursor commands - DECLARE, FETCH, CLOSE
  • Parameters - SHOW, SET, RESET
  • Transaction management commands
  • BEGIN, END, ABORT, START TRANSACTION
  • SAVEPOINT, RELEASE, ROLLBACK TO SAVEPOINT
  • EXCEPTION blocks and other internal subtransactions
  • LOCK TABLE, though only when explicitly in one of these modes: ACCESS SHARE, ROW SHARE or ROW EXCLUSIVE.
  • Plans and resources - PREPARE, EXECUTE, DEALLOCATE, DISCARD
  • Plugins and extensions - LOAD
No problem for PostgreSQL experts of course. But I doubt it is super easy for ordinary PostgreSQL users to know which query is allowed and which is not just looking at the list.

pgpool-II 3.0 (yes, that will be the new version number) will give you small help. Pgpool parses the query and automatically redirect it to the standby server if needed. If you have more than one standby servers, pgpool will choose one of them and send the query to it. The queries not allowed for the standby server, will be sent to the primary server. In short, you do not need to worry about the complex rule above if you use pgpool-II 3.0 + Hot Standby.

Please note that there is a limitation here. Pgpool parses the query by using "raw parser" which understands the syntax of the query but does not understand the "depth psychology" aspect of the query. For example, if a SELECT calls a custom C function which modifies the database, pgpool will send it to the standby server because it does not know that the SELECT actually does write.

It will be great that PostgeSQL's function does have a signature that confess it is going to write...

Thursday, June 10, 2010

Talk with the author of Streaming Replication

Yesterday I talked with Masao Fujii, the author of Streaming Replication. Streaming Replication(SR) is a long awaited feature, "built in replication" of PostgreSQL. The great feature will come with PostgreSQL 9.0, which is supposed to be released this Summer. With SR, pgpool-II will provide two important features: load balancer and failover control. Since SR is a master/slave type replication system, the standby server (slave) do not accept write queries such as INSERT, DELETE, UPDATE, CREATE TABLE and so on. Any write query sent to pgpool will be redirected to the primary server (master) and users do not need to worry about it. On the other hand, as SELECT can be executed both on the primary and the standby, pgpool will choose one of them and send the SELECT to it. This technique is known as "load balancing" and this will significantly improve the over all performance.

One thing we were worried about was, what if the connection between the primary server and the standby server goes down? There's no easy way for pgpool to know it. If pgpool does not know that, SELECT can be sent to the standby whose data is far behind the primary. Apparently this is what users want to avoid. To overcome the problem, pgpool checks the transaction log number (LSN) on both the primary and the standby. If standby's LSN is too small comparing with the one on the primary, we might have a problem. So pgpool stops the load balancing and send SELECT only to the primary. How behind is "behind" should be defined by new directive in pgpool.conf. Good thing with this is, the technique will also detects the delay because of the heavy load of the standby.

We will continue to discuss how to improve the usability of SR and pgpool-II.

Monday, June 7, 2010

A mail client using PostgreSQL

Do you know sylpheed? It's an open source mail client and has been
used by many people, who are using more than 30 different languages,
all over the world for ten years now.

Sylpheed was born in Japan and has been developed by Mr. Yamamoto, who
has been working for our company since 2006. Recently he has made a
commercial version of Sylpheed called "Sylpheed Pro".

The most interesting feature with Sylpheed Pro is, it comes with
PostgreSQL's nice full text search engine. If you install Sylpheed
Pro, you have PostgreSQL running on your computer. Any mail received
by Sylpheed Pro is automatically stored into PostgreSQL database so
that it can be used later to execute full text search on your
mails. How fast is it? Usually you can find the mail what you are
looking for within 1 second out of 10,000 mails. Sylpheed Pro even
can take care of attached documents including PDF, Word, Excel,
OpenOffice files. It extracts the text part from those files.

At present, Sylpheed Pro is sold in Japan only. However you could
download a trial version of Sylpheed Pro and test it since Sylpheed
Pro is a multi-lingual software. So if you do not care that the
download page is written in Japanese, please try it from here.

Hint:

There are six text boxes you need to fill in.

[Last name] [Fist name]
[Last name] [Fist name](please repeat)
[email address]
[company name]
[you can leave it blank]

then click *left* button (right button is "clear"!)

Saturday, June 5, 2010

Refactoring pgpool-II

Pgpool was born in 2003. When it was reborn as pgpool-II in 2007, we had made a big change of its architecture. For example, we gave a capability to pgpool to use more than 2 DB nodes.

In 2010, we are refactoring pgpool-II again: disentangling the spaghetti code(there are tones of global variables, which is a bad sign), moving some codes from a file to more appropriate one and so on. I believe the work will make pgpool-II more solid, easier to understand by new developers.

Of course we are not just refactoring pgpool-II. We will add nice features including adaption for Streaming replication/Hot Standby, which is one of the greatest features of upcoming PostgreSQL 9.0.

We hope the work will be finished by August.

Stay tuned.

Dynamic spare process management in Pgpool-II

Pre-fork architecture in Pgpool-II Pgpool-II uses fixed number of pre-forked child process which is responsible for accepting and handling e...