Friday, July 19, 2024

Row pattern recognition feature for PostgreSQL

What is the row pattern recognition feature?

Row pattern recognition (RPR) is a feature defined in the SQL standard. It allows you to search for a sequence of rows that matches a pattern. Since I am working on this feature for PostgreSQL, I would like to give a brief introduction to RPR.
 
Consider a table holding date and daily stock price of a company. 
 company  |   tdate    | price
----------+------------+-------
 company1 | 2024-07-01 |   100
 company1 | 2024-07-02 |   200
 company1 | 2024-07-03 |   150
 company1 | 2024-07-04 |   140
 company1 | 2024-07-05 |   150
 company1 | 2024-07-06 |    90
 company1 | 2024-07-07 |   110
 company1 | 2024-07-08 |   130
 company1 | 2024-07-09 |   120
 company1 | 2024-07-10 |   130
(10 rows)
Suppose you want to find a sequence of rows in which the stock price rises once or more, then falls once or more. For example, the stock price on July 1 is 100, rises to 200 on July 2, and then falls to 150 on July 3 and 140 on July 4. RPR allows users to write this kind of query in an intuitive way.
 

RPR syntax

To express this query with RPR, you first define row pattern variables.
 
DEFINE
START AS TRUE,
UP AS price > PREV(price),
DOWN AS price < PREV(price)

Here DEFINE is a keyword to start the definition. START, UP and DOWN are row pattern variable names. The right-hand side of AS is a logical expression that the pattern variable must satisfy. The logical expression "TRUE" means always true. PREV() is a special function that can only be used in RPR (it cannot be used elsewhere, for example in a SELECT target list or a WHERE clause). PREV takes a column name as an argument and returns the previous row's value of that column. So if today's price is higher than yesterday's price, "price > PREV(price)" returns TRUE. DOWN has the opposite meaning.

Once the row pattern variables are defined, you can define patterns to search for by using the row pattern variables.

PATTERN (START UP+ DOWN+)

Here PATTERN is a keyword to start the pattern definition. START matches any row and is the starting point of the pattern matching. If the next row satisfies UP, the following rows are also tested against UP, because '+' is appended to UP; this is a concept similar to regular expressions. When a row no longer satisfies UP, DOWN is tested against it. If the row satisfies DOWN, the following rows are examined in the same way. When rows stop matching DOWN, the search finishes. Note that because of the '+' quantifier, at least one UP row and one DOWN row are needed for the whole pattern to match.
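
For example, reordering the same variables searches for a falling-then-rising ("V-shaped") price movement instead. This is just a sketch based on the syntax above; the DEFINE clause stays the same, only the PATTERN changes:

PATTERN (START DOWN+ UP+)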

The RPR syntax

The explanation above covers only part of the RPR syntax. Now let's look at the whole syntax.
 
 WINDOW window_name AS (
[ PARTITION BY ... ]
[ ORDER BY... ]
[ MEASURES ... ]
ROWS BETWEEN CURRENT ROW AND ...
[ AFTER MATCH SKIP ... ]
[ INITIAL|SEEK ]
PATTERN (...)
[ SUBSET ... ]
DEFINE ...
)

As you may notice, this is an extension to the WINDOW clause. Actually there are two types of RPR defined in the SQL standard: R010 and R020. R020 is the syntax above. Since I am working on implementing R020 in PostgreSQL, I focus on R020 in this article. Also, since the current implementation does not support MEASURES and SUBSET, I will not explain them in the rest of this article. If you are interested in R010, MEASURES or SUBSET, please look into the SQL standard.
 

Let's run RPR

If I write the example above in the R020 syntax, the whole query and its result look like this:

SELECT company, tdate, price,
 count(*) OVER w
 FROM stock
 WINDOW w AS (
 PARTITION BY company
 ORDER BY tdate
 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 AFTER MATCH SKIP PAST LAST ROW
 INITIAL
 PATTERN (START UP+ DOWN+)
 DEFINE
  START AS TRUE,
  UP AS price > PREV(price),
  DOWN AS price < PREV(price)
);
 
 company  |   tdate    | price | count
----------+------------+-------+-------
 company1 | 2024-07-01 |   100 |     4
 company1 | 2024-07-02 |   200 |     0
 company1 | 2024-07-03 |   150 |     0
 company1 | 2024-07-04 |   140 |     0

 company1 | 2024-07-05 |   150 |     0
 company1 | 2024-07-06 |    90 |     4
 company1 | 2024-07-07 |   110 |     0
 company1 | 2024-07-08 |   130 |     0
 company1 | 2024-07-09 |   120 |     0

 company1 | 2024-07-10 |   130 |     0
(10 rows)


As we have already seen, the stock price on July 1 is 100, rises to 200 on July 2, and then falls to 150 on July 3 and 140 on July 4. The count column shows that the query found 4 matching rows starting at July 1. The reason the count column for July 2 to 4 is 0 is that those rows are skipped and not evaluated. This behavior comes from the "AFTER MATCH SKIP PAST LAST ROW" clause: as 4 rows were matched, we skip those 4 rows and restart the next pattern matching from July 5. Since the stock price on July 6 (90) is lower than on July 5 (150), the pattern matching starting at July 5 fails, and its count column is 0.
 
The next pattern matching starts at July 6. This time we again find 4 matching rows (July 6 to July 9).
 
The next pattern matching starts at July 10. The pattern needs at least 3 rows to match, but only 1 row remains in the partition, so the pattern matching fails.
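
Other window aggregates can be applied over the matched rows in the same way. The following variant is only a sketch: I assume here that aggregates other than count(*) are accepted over the reduced frame (see [2] for what the patch actually supports). It would report the peak price within each match, and NULL for rows where no match starts:

SELECT company, tdate, price,
 count(*) OVER w,
 max(price) OVER w AS peak
 FROM stock
 WINDOW w AS (
 PARTITION BY company
 ORDER BY tdate
 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 AFTER MATCH SKIP PAST LAST ROW
 INITIAL
 PATTERN (START UP+ DOWN+)
 DEFINE
  START AS TRUE,
  UP AS price > PREV(price),
  DOWN AS price < PREV(price)
);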
 

Variation of pattern

The pattern matching can be easily extended. For example, you can add a constraint to the pattern such as "the stock price not only rises, but rises by more than 50". In this case you can change the DEFINE clause as follows:
 
 DEFINE
  START AS TRUE,
  UP AS price > PREV(price) + 50,
  DOWN AS price < PREV(price)
 
 With this change we get this:

 company  |   tdate    | price | count
----------+------------+-------+-------
 company1 | 2024-07-01 |   100 |     4
 company1 | 2024-07-02 |   200 |     0
 company1 | 2024-07-03 |   150 |     0
 company1 | 2024-07-04 |   140 |     0

 company1 | 2024-07-05 |   150 |     0
 company1 | 2024-07-06 |    90 |     0
 company1 | 2024-07-07 |   110 |     0
 company1 | 2024-07-08 |   130 |     0
 company1 | 2024-07-09 |   120 |     0
 company1 | 2024-07-10 |   130 |     0
(10 rows)

Please note that the row for July 6 no longer starts a match, because the rise from 90 (July 6) to 110 (July 7) is not more than 50.

What's in the current implementation?

I have been working on this feature and publishing patches on the pgsql-hackers mailing list. The whole discussion can be found at [1]. The latest patch can be found near the end of the thread. The features currently supported by the patch are explained in [2]. If you are interested in more examples, the regression test expected output included in the patch is probably what you want to check (src/test/regress/expected/rpr.out).

How to test the patch?

Since the patch is against the master branch of the PostgreSQL git repository, you need to clone the repository first.

git clone https://git.postgresql.org/git/postgresql.git
 
Save the patch files somewhere; suppose you saved them in /tmp/rpr.

git apply /tmp/rpr/*.patch

Then build PostgreSQL as usual. Since the patch changes some system catalogs, you need to create a fresh database cluster using initdb.
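
A typical build sequence, run inside the cloned postgresql directory, would look roughly like this (a sketch; the install prefix and data directory are placeholders, adjust them to your environment):

./configure --prefix=/usr/local/pgsql-rpr
make
make install
/usr/local/pgsql-rpr/bin/initdb -D /usr/local/pgsql-rpr/data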

If you find any issues with the patch or have comments/suggestions, please post to [1].

You can also watch the talk I gave at PGConf.dev 2024 [3] and see the slides [4].

 
 
 



Wednesday, November 16, 2022

Dynamic spare process management in Pgpool-II

Pre-fork architecture in Pgpool-II

Pgpool-II uses a fixed number of pre-forked child processes, each of which is responsible for accepting and handling a client connection to Pgpool-II. When a client tries to connect to Pgpool-II, one of the pre-forked child processes (their number is defined by the num_init_children parameter) accepts the request and takes care of the connection.

One of the advantages of this method is avoiding the overhead of spawning a process each time a new connection is requested. However, when the number of pre-forked processes is large (say over 1,000) and the number of concurrent connections to Pgpool-II is relatively small, most of the pre-forked processes just wake up whenever a new connection arrives and wastefully consume CPU resources.

Dynamic spare process management in Pgpool-II 4.4.

To overcome the issue, "dynamic spare process management" has been implemented for the upcoming Pgpool-II 4.4 (thanks to Jianshen Zho and Muhammad Usama for implementing this feature). The following new parameters are introduced:

  • process_management_mode
  • process_management_strategy
  • min_spare_children
  • max_spare_children

 If process_management_mode is 'dynamic', the new feature is enabled. If it is set to 'static', the process management is compatible with pre-4.4 style. The default is 'static'.

When process_management_mode is 'dynamic', Pgpool-II dynamically increases or decreases the number of child processes according to demand. The demand is judged by counting the number of 'spare' child processes, that is, processes that are just waiting for connections from clients. The idea is to dynamically adjust the number of child processes according to the number of spare processes.

process_management_strategy controls how aggressively the number of spare processes is decreased; it can be one of 'lazy', 'gentle' and 'aggressive'. With 'lazy', it takes 5 minutes before spare processes are reduced. With 'gentle', it takes 2 minutes. Finally, with 'aggressive', it takes 50 seconds. The default is 'gentle'.

min_spare_children and max_spare_children control when Pgpool-II increases the number of spare processes. If the number of spare processes falls below min_spare_children, Pgpool-II spawns new processes until it reaches min_spare_children + 25% of max_spare_children. The defaults for min_spare_children and max_spare_children are 5 and 10 respectively.
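
Putting it together, a pgpool.conf fragment enabling the feature might look like this (a sketch using the default strategy and the spare-process values mentioned above):

process_management_mode = 'dynamic'
process_management_strategy = 'gentle'
min_spare_children = 5
max_spare_children = 10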

The benchmark

OK, so how much does the feature improve performance?

I did benchmarking using standard pgbench on a laptop running Ubuntu 20 with 16GB of memory and SSD storage. I used the "-C" option of pgbench because the feature should improve performance while clients are connecting to Pgpool-II; if the option is not used, pgbench keeps its connections to Pgpool-II open, which does not reveal the effect of the feature. The transaction type was read-only SELECT (-S) and the number of concurrent connections was 10, which should be small enough compared with num_init_children = 900. The max_connections parameter was 1,000 for each PostgreSQL 15 instance (I created the environment using the pgpool_setup tool to build a 2-node streaming replication cluster). I changed only the process_management_mode parameter between 'dynamic' and 'static'; other parameters were left at their defaults. I ran pgbench 3 times for each setting with a duration of 30 seconds and took the median.
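
The pgbench invocation was along these lines (reconstructed from the description above; the host, port and database name are placeholders for the Pgpool-II endpoint created by pgpool_setup):

pgbench -C -S -c 10 -T 30 -h localhost -p 11000 test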

The result

The result was dramatic: I saw nearly 5 times the speed with process_management_mode set to 'dynamic' compared with 'static'.

When process_management_mode is 'dynamic', the TPS (transactions per second) is nearly 5 times higher than when it is 'static'.

Conclusion

The new feature "dynamic spare process management" improves connection performance when the number of concurrent connections to Pgpool-II is small compared with num_init_children. We have to set num_init_children high enough for the peak connection demand, but this used to sacrifice performance when the concurrent connection demand was low. With the new feature, we no longer need to make that compromise.

 

 

Monday, September 26, 2022

Query cache improvement in Pgpool-II 4.4

 Query cache in Pgpool-II

Pgpool-II has a built-in in-memory query result cache which may dramatically enhance performance in certain use cases. If your workload is mostly read-only SELECTs and certain queries are issued frequently, you might be able to benefit from the feature. When a SELECT comes to Pgpool-II, it saves the result of the query in shared memory or memcached (you can configure which one is used). If the same query arrives again, meaning the query string is absolutely identical to the previous one, Pgpool-II does not send the query to PostgreSQL. Instead, it fetches the query result from the shared memory (or memcached) and returns it to the client. Note that for this to happen, the user and database must also be identical. In this process the following tasks needed to generate a query result are eliminated, so it takes less time compared with standard query processing:

  • Parsing query
  • Generating query plan
  • Executing the query plan
  • Network round-trip between Pgpool-II and PostgreSQL

Note, however, that when a table is modified or dropped, the query cache entries corresponding to the table are removed. If this happens frequently, the query cache may be slower than ordinary query processing because of its overhead. You can monitor the query cache hit ratio by using the "SHOW pool_cache" command to decide whether you should use the query cache feature or not.
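
For reference, the feature is enabled with roughly the following pgpool.conf settings (a sketch; see the manual for the full list of memqcache_* tuning parameters):

memory_cache_enabled = on
memqcache_method = 'shmem'   # or 'memcached'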

Bottleneck in the query cache

The query cache has had one weakness up to Pgpool-II 4.3. As you can easily guess, the shared memory is heavily accessed by concurrent users. To avoid corrupting it, Pgpool-II uses a strong lock called an "exclusive" lock. Once a Pgpool-II process acquires the lock, other processes have to wait until that process releases it. As a result, when there are many concurrent users, the query cache performance drops rapidly.

Pgpool-II 4.4 will mitigate the problem

Upcoming Pgpool-II 4.4, which is expected to be released by the end of this year, will use "shared lock" in certain cases.

There are two types of query cache access: one just fetches cached data, the other modifies existing cache data. The latter needs the exclusive lock anyway, but the former only needs a shared lock, which guarantees that the contents of the cache are not changed while the lock is held. Since shared lock holders can run concurrently, performance is enhanced if most clients just read the query cache data.

The result

Below is a graph comparing the benchmark results. The results were generated using PostgreSQL's standard benchmark tool "pgbench". The transaction type is read-only ("-S"). The database size is the default (100k rows). Each pgbench run is 10 seconds. The hardware used was an Ubuntu 20 laptop with 16GB of memory and 6 physical CPU cores. Two PostgreSQL 15 (master head) instances were used, configured as a two-node streaming replication cluster.


The red line is the master branch head, which will be released as Pgpool-II 4.4. The blue line is Pgpool-II 4.3. The yellow line is 4.3 with the query cache disabled, shown for reference.

Up to 4 concurrent users there's no big difference between 4.4 and 4.3. But as the number of concurrent users grows, the performance of 4.3 rapidly decreases from 160k to 100k TPS at 32 concurrent clients, while 4.4 keeps 140k TPS at 32 concurrent users. In other words, 4.4 delivers about 1.4 times the throughput of 4.3 at that point.

Conclusion

The query cache is great for mostly read-only workloads, but it has had a weakness when there are many concurrent clients. Upcoming Pgpool-II 4.4 will mitigate the weakness by using a shared lock. A benchmark result shows the enhancement is up to 40% at 32 concurrent clients.


Wednesday, May 11, 2022

PostgreSQL supports COPY FREEZE in pgbench but it will not appear in the release note

PostgreSQL 15 will include COPY FREEZE support

In a previous blog post I reported that a patch allowing the use of COPY FREEZE in pgbench was accepted for upcoming PostgreSQL 15 (supposed to be released by the end of 2022). It seems the patch has survived without being reverted!

 But it will not be in the release note

Now that PostgreSQL 15 beta1 is coming soon, I checked the proposed release notes for PostgreSQL 15. Unfortunately they do not mention anything about COPY FREEZE support in pgbench. Don't worry: the code supporting COPY FREEZE in pgbench is definitely in PostgreSQL 15. Please try PostgreSQL 15 beta1 once it is out!

 


Friday, February 11, 2022

What's new in Pgpool-II 4.3 (part3)

 


In the previous blog entries (part 1 and part 2) I explained the following new features of Pgpool-II 4.3.

  • Configuration file format has been changed for easier configuration.
  • More information is added to utility commands to make admin's life easier.
  • Allow to keep quorum even if the watchdog cluster loses some of its watchdog nodes.

I will explain the rest of the features in this blog entry.

 Allow to choose PostgreSQL node for promotion

Before 4.3 it was difficult to promote a specified standby PostgreSQL node. Certainly we had the pcp_promote_node command, but it just changes the node status inside Pgpool-II: it does not actually promote PostgreSQL. Now pcp_promote_node has a new option, --switchover, which actually promotes the specified PostgreSQL standby node. Details have already been written up in a previous blog entry; please visit it for more information.

Allow to choose the least replication delay node in load balancing

Pgpool-II can distribute read queries among PostgreSQL servers. Pgpool-II can also avoid sending read queries to a standby server when the replication delay between the primary PostgreSQL server and that standby exceeds the delay_threshold parameter.

The new parameter prefer_lower_delay_standby provides even more intelligent read query load balancing. This parameter is only useful if you have two or more standby servers. If the parameter is enabled and the replication delay of the load balance node (that is, the node chosen for load balancing when the session started) exceeds delay_threshold, Pgpool-II routes read queries to the standby node with the least replication delay. This gives more chances to distribute load across standby servers while minimizing the effect of replication delay.
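
In pgpool.conf this combination would look something like the following (a sketch; the delay_threshold value is just an illustration):

delay_threshold = 10000000        # WAL lag in bytes tolerated on the load balance node
prefer_lower_delay_standby = on   # beyond that, route reads to the least-delayed standby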

Catch up the latest SQL parser of PostgreSQL 14 as usual

Every major Pgpool-II release imports the latest PostgreSQL SQL parser available at that time. When Pgpool-II 4.3 came out, PostgreSQL 14 had already been released, so Pgpool-II 4.3 imports the SQL parser of PostgreSQL 14. With this, users can enjoy the latest SQL features, including:

  • Allow an alias to be attached to JOIN ... USING
  • Allow DISTINCT to be added to GROUP BY
  • New SEARCH and CYCLE clauses for common table expression

Other new features

  • Allow the use of time stamps with milliseconds.
  • Add a new config parameter failover_on_backend_shutdown. See the past blog entry for more details.
  • Add a new parameter representing the recovery source hostname to recovery_1st_stage_command and recovery_2nd_stage_command. Why you need this is explained in the manual:
"Before the hostname of the main (primary) node was obtained by using hostname command. This is mostly ok since the script runs on the main (primary) node anyway. However in some systems the hostname obtained by hostname command is different from the hostname defined in backend_hostname configuration parameter. This could cause a trouble in detach_false_primary because it checks connectivity between primary and standby node by using host parameter in primary_conninfo parameter, which is generated by recovery_1st_stage_command. Thus it is strongly recommended to use this parameter instead of using hostname command to obtain the hostname of the primary node in recovery_1st_stage_command."

Conclusion

This is the last entry in the series of blog posts focusing on Pgpool-II 4.3. While Pgpool-II 4.3 has more features, it is also easier to configure and use than previous releases. Please try it out!

Sunday, February 6, 2022

What's new in Pgpool-II 4.3 ? (part2)


In the last blog entry I explained some of what's new in Pgpool-II 4.3. This is part 2.

Allow to keep quorum even if the watchdog cluster loses some of its watchdog nodes

Watchdog is a feature to prevent the single point of failure (SPOF) problem of Pgpool-II itself. Since Pgpool-II runs as a proxy between clients and PostgreSQL servers, a failure of Pgpool-II makes the entire database system unusable.

 


Multiple Pgpool-II servers can be deployed to prevent the SPOF, so that when the primary Pgpool-II server goes down, a standby Pgpool-II server takes over the role, for example by providing an access point as a virtual IP address (VIP).

For this purpose the Pgpool-II servers monitor each other to check whether their peers are alive. This means that if the communication between the Pgpool-II servers goes down, each Pgpool-II thinks the peer Pgpool-II servers are down and that only itself is alive. This situation is called split brain. To avoid the split brain problem, an odd number (3 or more) of Pgpool-II servers can be deployed.

In the figure above, Pgpool-II [A] is disconnected from [B] and [C], while [B] and [C] can communicate with each other. [A] thinks that [B] and [C] are down, "votes" for itself and gets 1 vote. [B] and [C] vote for themselves and get 2 votes. To hold the majority, 2 or more votes are required. If someone holds the majority we say "the quorum exists". From [B] and [C]'s point of view the quorum exists, so the leader Pgpool-II is elected from [B] or [C] and provides the VIP. From [A]'s point of view the quorum does not exist, and the VIP is released if [A] holds it. Thus we avoid the split brain problem.

OK, but what if [A] and [B] go down? Of course [C] thinks that the quorum does not exist and the VIP is released, although one Pgpool-II server is still running.

New concept "membership"

In Pgpool-II 4.3, a new concept, membership, is introduced to mitigate the issue above. Initially the number of members is equal to the number of Pgpool-II servers. If the new parameter wd_remove_shutdown_nodes is enabled, the number of members decreases when a Pgpool-II server is shut down by the administrator. For example, if [A] and [B] are shut down by the administrator, the number of members is 1 and the quorum is 1. So [C] becomes the leader of the cluster and the VIP is assigned to [C].

wd_lost_node_removal_timeout is another new parameter, which specifies the timeout in seconds before a LOST watchdog node is marked as a non-member and removed from the cluster. For example, if [A] and [B] are disconnected from [C], then after the timeout, from [C]'s point of view the number of members is 1, so the quorum is 1, and [C] becomes the leader and holds the VIP. But in this case there is a possibility of the split brain problem: [A] and [B] might also think that one of them should be the new leader, because from their point of view the quorum exists as well. Great care is needed when you set the wd_lost_node_removal_timeout parameter.
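
In pgpool.conf the two parameters would be set roughly like this (a sketch; the timeout value is just an example):

wd_remove_shutdown_nodes = on      # nodes shut down by the admin leave the membership
wd_lost_node_removal_timeout = 30  # seconds before a LOST node is removed from the membership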

Next topic will be "Allow to choose PostgreSQL node for promotion"

I will continue in the next blog entry with another new feature of 4.3: allowing you to choose the PostgreSQL node for promotion.

 

Monday, January 31, 2022

What's new in Pgpool-II 4.3?

 
Photo by: pixundfertig

Release note and commit log of Pgpool-II 4.3

In this blog entry I would like to explain some of the new features of the latest Pgpool-II major version, 4.3, which was released in December 2021.

The official release note is here. Also, if you have the source code of Pgpool-II, you can find the commit id for each item in the release note source (written in SGML), so you can look into the commit log for a deeper understanding of each change.

 Executive summary of the changes 

Here is an executive summary of the changes in 4.3.

  • Configuration file format has been changed for easier configuration.
  • More information is added to utility commands to make admin's life easier.
  • Allow to keep quorum even if the watchdog cluster loses some of its watchdog nodes.
  • Allow to choose PostgreSQL node for promotion. 
  • Allow to choose the least replication delay node in load balancing.
  • Catch up the latest SQL parser of PostgreSQL 14 as usual.
  • Others.

I will explain them one by one.

Configuration file format has been changed for easier configuration

Pgpool-II used to provide a specific configuration example file for each clustering mode. For example, you needed to start with pgpool.conf.sample-stream to configure the streaming replication mode. In 4.3, there is only one configuration example file. You just need to set the "clustering_mode" parameter to the desired clustering mode to start configuration (of course you still need to add more information, for example backend node information and assorted account information).

To make this work, all configuration parameters are commented out, and the default value of each parameter has been carefully chosen so that users can leave it as is (commented out). This is similar to the way PostgreSQL's configuration file (postgresql.conf) works.
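
For example, a minimal streaming replication setup might only need to uncomment and set a handful of lines (a sketch; the host names and ports are placeholders):

clustering_mode = 'streaming_replication'
backend_hostname0 = 'server1'
backend_port0 = 5432
backend_hostname1 = 'server2'
backend_port1 = 5432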

We advise users who plan to migrate to 4.3 from an older version to inspect the default value changes. They are explained in the release note.

More information is added to utility commands to make admin's life easier

Some utility commands are enhanced so that more useful information is shown. 

"show pool_nodes" and "pcp_node_info" now show actual PostgreSQL status.

 test=# show pool_nodes;
 node_id | hostname | port  | status | pg_status | lb_weight |  role   | pg_role | select_cnt | load_balance_node | replication_delay | replication_state | replication_sync_state | last_status_change  
---------+----------+-------+--------+-----------+-----------+---------+---------+------------+-------------------+-------------------+-------------------+------------------------+---------------------
 0       | /tmp     | 11002 | up     | up        | 0.333333  | primary | primary | 0          | true              | 0                 |                   |                        | 2022-01-31 10:10:35
 1       | /tmp     | 11003 | up     | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2022-01-31 10:10:35
 2       | /tmp     | 11004 | down   | up        | 0.333333  | standby | standby | 0          | false             | 0                 | streaming         | async                  | 2022-01-31 10:18:47
(3 rows)

In the example, "pg_status" and "pg_role" are the new fields. You might notice that the "status" field of node 2 is "down" while the "pg_status" field is "up". This is because the node was detached by the pcp_detach_node command. The command just changes the status recognized by Pgpool-II, but does not touch the PostgreSQL backend on node 2. Before 4.3, admins had to check PostgreSQL's status separately to find out whether the backend node was actually down, because the command just showed the node status as "down" in such cases.

These fields are added to pcp_node_info as well.


$ pcp_node_info -p 11001 -w -a -v
Hostname               : /tmp
Port                   : 11002
Status                 : 1
Weight                 : 0.333333
Status Name            : waiting
Backend Status Name    : up

Role                   : primary
Backend Role           : primary
Replication Delay      : 0
Replication State      : none
Replication Sync State : none
Last Status Change     : 2022-01-31 10:54:26

Hostname               : /tmp
Port                   : 11003
Status                 : 1
Weight                 : 0.333333
Status Name            : waiting
Backend Status Name    : up

Role                   : standby
Backend Role           : standby
Replication Delay      : 0
Replication State      : streaming
Replication Sync State : async
Last Status Change     : 2022-01-31 10:54:26

Hostname               : /tmp
Port                   : 11004
Status                 : 3
Weight                 : 0.333333
Status Name            : down
Backend Status Name    : up

Role                   : standby
Backend Role           : standby
Replication Delay      : 0
Replication State      : streaming
Replication Sync State : async
Last Status Change     : 2022-01-31 10:54:38

pcp_proc_info also has some new fields and new information.

"Start time"  now has additional information like "(2:52 before process restarting)". This is only shown when "status" field is "Wait for connection", which means the pgpool process is completely idle and can accept new connection. In this case if child_life_time is non 0, the process will restart. The new information indicates how much time remains before the process restarts.

 "Backend connection time" is the timestamp when the connection to backend is created. "Client connection time" is the timestamp when the last client connected. "Client idle duration" is the duration in seconds while the client is idle or idle in transaction. Remaining duration before the client gets disconnected due to client_idle_limit (in seconds) . "Client disconnection time" is the timestamp when the last client disconnected to the process. If the client still connects to the process (thus "status" is "Idle" or "Idle in transaction"), the field will be empty.

$ pcp_proc_info -p 11001 --verbose
Database                  : test
Username                  : postgres
Start time                : 2021-09-28 04:16:00 (2:52 before process restarting)
Client connection count   : 1
Major                     : 3
Minor                     : 0
Backend connection time   : 2021-09-28 04:16:16
Client connection time    : 2021-09-28 04:16:16
Client idle duration      : 0
Client disconnection time : 2021-09-28 04:16:33
Pool Counter              : 1
Backend PID               : 30795
Connected                 : 0
PID                       : 30750
Backend ID                : 0
Status                    : Wait for connection

Database                  : test
Username                  : postgres
Start time                : 2021-09-28 04:16:00 (2:52 before process restarting)
Client connection count   : 1
Major                     : 3
Minor                     : 0
Backend connection time   : 2021-09-28 04:16:16
Client connection time    : 2021-09-28 04:16:16
Client idle duration      : 0
Client disconnection time : 2021-09-28 04:16:33
Pool Counter              : 1
Backend PID               : 30796
Connected                 : 0
PID                       : 30750
Backend ID                : 1
Status                    : Wait for connection

Database                  : test
Username                  : kawamoto
Start time                : 2021-09-28 04:16:00
Client connection count   : 0
Major                     : 3
Minor                     : 0
Backend connection time   : 2021-09-28 04:16:03
Client connection time    : 2021-09-28 04:16:03
Client idle duration      : 158 (2:22 before client disconnected)
Client disconnection time :
Pool Counter              : 1
Backend PID               : 30763
Connected                 : 1
PID                       : 30751
Backend ID                : 0
Status                    : Idle

Database                  : test
Username                  : kawamoto
Start time                : 2021-09-28 04:16:00
Client connection count   : 0
Major                     : 3
Minor                     : 0
Backend connection time   : 2021-09-28 04:16:03
Client connection time    : 2021-09-28 04:16:03
Client idle duration      : 158 (2:22 before client disconnected)
Client disconnection time :
Pool Counter              : 1
Backend PID               : 30764
Connected                 : 1
PID                       : 30751
Backend ID                : 1
Status                    : Idle

To be continued...

This blog entry is already too long. I will continue from "Allow to keep quorum even if the watchdog cluster loses some of its watchdog nodes" in my next blog entry.

Tuesday, August 24, 2021

Failover triggered by PostgreSQL shutdown

 

Photo by brisch27

Ultra fast PostgreSQL down detection

If a user is connected to Pgpool-II and one of the PostgreSQL servers is stopped by the admin, Pgpool-II detects it and triggers failover. Since this happens before the health check detects that the server is down, the failover process is pretty fast. This feature looks great at first glance.

How does Pgpool-II implement it?

How is this possible? When PostgreSQL is shut down by the admin, PostgreSQL sends an error with a special error code: 57P01. If Pgpool-II receives this error code, it starts the failover process.

The problem is, there are other cases when 57P01 is used. When pg_terminate_backend() is called, or a PostgreSQL backend process is killed by a signal, exactly the same error code is sent by PostgreSQL. This confuses Pgpool-II, and it triggers failover in these cases too!

Pgpool-II 4.3 will mitigate the problem

To mitigate the problem, upcoming Pgpool-II 4.3, supposed to be released in fall 2021, will have a new parameter "failover_on_backend_shutdown". If it is on, Pgpool-II will behave exactly as it does now. If it is off, 57P01 no longer triggers failover; of course you cannot enjoy the quick failover feature in that case. Sorry, but there is no way to both have the quick failover feature and not be confused by the error code, so the choice is up to you.
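
So if you prefer to rely on the health check alone and avoid failovers caused by pg_terminate_backend() or killed backends, the pgpool.conf setting would simply be (a sketch):

failover_on_backend_shutdown = off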

I hope someday PostgreSQL becomes clever enough to use distinct error codes for the admin shutdown case and the pg_terminate_backend() case.

Friday, June 18, 2021

Promoting specified node in Pgpool-II

 

image by Gerd Altmann from Pixabay

Promoting a standby node

Pgpool-II manages a streaming replication primary node and multiple standby nodes. Suppose we shut down the primary node. Pgpool-II detects the event and runs a command called the "failover command". The failover command is a user-supplied script (usually written in shell or another scripting language), and it chooses one of the standby nodes to be promoted. A typical script chooses the "next main node", which is the next live node after the former primary node: e.g. if the former primary is node 0 and nodes 1 and 2 are standbys, node 1 will be chosen.

Another way to promote a standby node is to use "pcp_detach_node", one of the control commands for Pgpool-II. pcp_detach_node "detaches" a node: Pgpool-II changes the internal status of the specified node to down (but the node is actually still up and running). In this case the failover command is also called, and the next node - node 1 in the example above - will be chosen.

Promoting specified node

What if we want to promote node 2 instead of node 1? Actually there has been no way to achieve this until now. The next version, Pgpool-II 4.3, supposed to be released this winter, will allow you to do it.

The existing pcp_promote_node command now has an extra argument: --switchover or -s.

pcp_promote_node --switchover 2

will do the following:

  1. Detach the current primary node. This puts the current primary node into "down" status (but the node is actually still up and running).
  2. The failover command is triggered. Node 2 is passed to the failover command as the "new main node".
  3. The failover command promotes node 2.
  4. The "follow primary command" runs against node 0 and node 1. The follow primary command is a user-supplied script that turns a node into a standby "following" the current primary.
  5. As a result, nodes 0 and 1 are standbys following the new primary, node 2.

Conclusion

Pgpool-II 4.3 will have one of the long-awaited features: promoting a specified node. I hope this gives admins a more flexible way to manage PostgreSQL clusters.



Sunday, May 30, 2021

Dropping support for version 2 protocol

Photo by Peter H

Upcoming PostgreSQL 14, supposed to be released by the end of 2021, will remove support for the version 2 frontend/backend protocol (in this article I refer to it as "v2" or the "v2 protocol").

What is the version 2 frontend/backend protocol?

It's one of the communication protocols used between clients and PostgreSQL. For example, to send a query to PostgreSQL, a client first sends the letter 'Q', then the actual SQL query string. The letter 'Q' indicates that the client wants to send a query to PostgreSQL. In 2003 PostgreSQL 7.4 was released and started to support the newer protocol, v3. From PostgreSQL 7.4 to PostgreSQL 13, both the older protocol version 2 and the newer protocol v3 are supported. Since the v3 protocol is better than v2, especially in that v3 supports the extended query protocol which allows the use of prepared statements, the support for v2 remained just for backward compatibility's sake. Since PostgreSQL 7.4 was released in 2003, PostgreSQL has kept v2 support for 18 years!

Any trouble caused by dropping v2 protocol?

Good question. If you are using an older driver and the driver uses v2 with your application, you may get into trouble because PostgreSQL 14 (and newer) will not accept your application any more. For example, if you are using an older version of the JDBC driver (older than version 9.4.1210 released on 2016-09-07; "fix: support cases when user-provided queries have 'returning' PR#488 c3d8571" is the commit message which drops the v2 support), chances are some code paths of the application may choose the v2 protocol. If you are using such an older driver, I recommend testing that the application can connect to PostgreSQL 14 with that driver. As of today, PostgreSQL 14 beta1 is available.

What about Pgpool-II?

Pgpool-II supports both the v2 and v3 protocols as of Pgpool-II 4.2. For the upcoming Pgpool-II 4.3, which is supposed to be released by the end of 2021, developers have not yet decided whether the support for v2 will be dropped in 4.3 or not. Dropping the v2 support would remove a non-negligible amount of code from Pgpool-II and would make developers' lives a little bit easier. If nobody is using the v2 protocol with Pgpool-II, we can happily remove the v2 code. If you are still using the v2 protocol with Pgpool-II, please reply to the mailing list thread.

 


Monday, April 19, 2021

Visibility with query cache

 

 Photo by Free-Photos

Pgpool-II provides a "query cache" feature which speeds up SELECTs to the database. It stores query results in the specified storage (either shared memory or memcached). If the same query (more precisely, a SELECT query with identical text) arrives, Pgpool-II returns the cached result without accessing the database. This is very fast because:

  • Query results are fetched from memory, rather than from the database's (typically on-disk) storage
  • No CPU or memory is used on the database server

Is there any pitfall?

Whether a SELECT result is fetched from the database or from the cache is transparent to the client. That is, you don't notice the difference except for the response speed. OK, sounds great. But are there any pitfalls in using the query cache? One is the cache hit rate. Since any modification to an underlying table clears the corresponding cache entries, systems with frequent updates are not suitable for the feature. According to the manual, using the query cache is not recommended on systems whose cache hit ratio would be lower than 70%.

Row visibility rules

Another pitfall you may need to care about is the row visibility rules. These rules define how table rows are visible to other sessions (so you don't need to worry about this if you are the only user of the database). PostgreSQL provides several transaction isolation levels, and the row visibility rules vary depending on the isolation level used in the transaction. For example, with the read committed isolation level (which is the default), other users will not see the new data of rows updated by your transaction until it gets committed.

 

However, with the repeatable read isolation level, other users will not see the new data of rows updated by your transaction even after the transaction gets committed. It becomes visible to them only after they start a new transaction.


Query cache could break the visibility rule

Pgpool-II's query cache does not follow the visibility rules. A cache entry is created when a transaction gets committed. The difference is that once a cache entry is created, it can be seen by any transaction. This is fine with the read committed isolation level, but with the repeatable read isolation level it breaks the visibility rules.

Suppose the query cache feature is used at the repeatable read transaction isolation level. This time, after an UPDATE, a SELECT is issued to create a cache entry.


 

Unlike in the previous figure, the other user's transaction will see the new data once your transaction gets committed, because the cache entry is registered at commit time and can then be seen by anyone. This can happen not only at the repeatable read isolation level, but also at the serializable isolation level.
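
Here is a minimal sketch of the anomaly in SQL, assuming the query cache is enabled in Pgpool-II and a table t with a single integer column i containing the value 1 (the table and value are made up for illustration; the SELECT text must be identical both times, and the user and database must match, for the cache to be used):

-- session A (repeatable read)
BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT i FROM t;   -- returns 1

-- session B
BEGIN;
UPDATE t SET i = 2;
SELECT i FROM t;   -- returns 2 inside B's own transaction
COMMIT;            -- the SELECT result is now registered in the query cache

-- session A, still inside the same transaction
SELECT i FROM t;   -- served from the cache: returns 2,
                   -- although repeatable read alone would still show 1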

This is somewhat similar to the phenomenon you may see while using COPY FREEZE. Suppose the transaction isolation level is repeatable read and you are copying data into a table using COPY FREEZE in a transaction. Another transaction which started before yours will see the copied data once your transaction gets committed. This is because "frozen" rows can be seen by anyone.

Conclusion

Pgpool-II's query cache is a useful feature for read-intensive, busy systems. However, it may cause certain read anomalies at the repeatable read and serializable transaction isolation levels, so please use the feature carefully with those isolation levels.
