Tuesday, June 15, 2010

pgpool-II and Hot Standby

PostgreSQL 9.0 will have a great feature called "Hot Standby". Unlike warm standby, which is just waiting for in case of master server goes down, it gives an access to the standby server for lots of useful purposes such as complex reporting query. However only read queries and other several commands are allowed to be sent to the standby server. According to the PostgreSQL 9.0 beta2 document, this is the list of allowed queries:
  • Query access - SELECT, COPY TO including views and SELECT rules
  • Cursor commands - DECLARE, FETCH, CLOSE
  • Parameters - SHOW, SET, RESET
  • Transaction management commands
  • BEGIN, END, ABORT, START TRANSACTION
  • SAVEPOINT, RELEASE, ROLLBACK TO SAVEPOINT
  • EXCEPTION blocks and other internal subtransactions
  • LOCK TABLE, though only when explicitly in one of these modes: ACCESS SHARE, ROW SHARE or ROW EXCLUSIVE.
  • Plans and resources - PREPARE, EXECUTE, DEALLOCATE, DISCARD
  • Plugins and extensions - LOAD
No problem for PostgreSQL experts of course. But I doubt it is super easy for ordinary PostgreSQL users to know which query is allowed and which is not just looking at the list.

pgpool-II 3.0 (yes, that will be the new version number) will give you small help. Pgpool parses the query and automatically redirect it to the standby server if needed. If you have more than one standby servers, pgpool will choose one of them and send the query to it. The queries not allowed for the standby server, will be sent to the primary server. In short, you do not need to worry about the complex rule above if you use pgpool-II 3.0 + Hot Standby.

Please note that there is a limitation here. Pgpool parses the query by using "raw parser" which understands the syntax of the query but does not understand the "depth psychology" aspect of the query. For example, if a SELECT calls a custom C function which modifies the database, pgpool will send it to the standby server because it does not know that the SELECT actually does write.

It will be great that PostgeSQL's function does have a signature that confess it is going to write...

Thursday, June 10, 2010

Talk with the author of Streaming Replication

Yesterday I talked with Masao Fujii, the author of Streaming Replication. Streaming Replication(SR) is a long awaited feature, "built in replication" of PostgreSQL. The great feature will come with PostgreSQL 9.0, which is supposed to be released this Summer. With SR, pgpool-II will provide two important features: load balancer and failover control. Since SR is a master/slave type replication system, the standby server (slave) do not accept write queries such as INSERT, DELETE, UPDATE, CREATE TABLE and so on. Any write query sent to pgpool will be redirected to the primary server (master) and users do not need to worry about it. On the other hand, as SELECT can be executed both on the primary and the standby, pgpool will choose one of them and send the SELECT to it. This technique is known as "load balancing" and this will significantly improve the over all performance.

One thing we were worried about was, what if the connection between the primary server and the standby server goes down? There's no easy way for pgpool to know it. If pgpool does not know that, SELECT can be sent to the standby whose data is far behind the primary. Apparently this is what users want to avoid. To overcome the problem, pgpool checks the transaction log number (LSN) on both the primary and the standby. If standby's LSN is too small comparing with the one on the primary, we might have a problem. So pgpool stops the load balancing and send SELECT only to the primary. How behind is "behind" should be defined by new directive in pgpool.conf. Good thing with this is, the technique will also detects the delay because of the heavy load of the standby.

We will continue to discuss how to improve the usability of SR and pgpool-II.

Monday, June 7, 2010

A mail client using PostgreSQL

Do you know sylpheed? It's an open source mail client and has been
used by many people, who are using more than 30 different languages,
all over the world for ten years now.

Sylpheed was born in Japan and has been developed by Mr. Yamamoto, who
has been working for our company since 2006. Recently he has made a
commercial version of Sylpheed called "Sylpheed Pro".

The most interesting feature with Sylpheed Pro is, it comes with
PostgreSQL's nice full text search engine. If you install Sylpheed
Pro, you have PostgreSQL running on your computer. Any mail received
by Sylpheed Pro is automatically stored into PostgreSQL database so
that it can be used later to execute full text search on your
mails. How fast is it? Usually you can find the mail what you are
looking for within 1 second out of 10,000 mails. Sylpheed Pro even
can take care of attached documents including PDF, Word, Excel,
OpenOffice files. It extracts the text part from those files.

At present, Sylpheed Pro is sold in Japan only. However you could
download a trial version of Sylpheed Pro and test it since Sylpheed
Pro is a multi-lingual software. So if you do not care that the
download page is written in Japanese, please try it from here.

Hint:

There are six text boxes you need to fill in.

[Last name] [Fist name]
[Last name] [Fist name](please repeat)
[email address]
[company name]
[you can leave it blank]

then click *left* button (right button is "clear"!)

Saturday, June 5, 2010

Refactoring pgpool-II

Pgpool was born in 2003. When it was reborn as pgpool-II in 2007, we had made a big change of its architecture. For example, we gave a capability to pgpool to use more than 2 DB nodes.

In 2010, we are refactoring pgpool-II again: disentangling the spaghetti code(there are tones of global variables, which is a bad sign), moving some codes from a file to more appropriate one and so on. I believe the work will make pgpool-II more solid, easier to understand by new developers.

Of course we are not just refactoring pgpool-II. We will add nice features including adaption for Streaming replication/Hot Standby, which is one of the greatest features of upcoming PostgreSQL 9.0.

We hope the work will be finished by August.

Stay tuned.

Dynamic spare process management in Pgpool-II

Pre-fork architecture in Pgpool-II Pgpool-II uses fixed number of pre-forked child process which is responsible for accepting and handling e...