Pgpool-II: Delegated IP is not available when disconnected Primary or Standby Node Failed

Question

I'm trying to set up a PostgreSQL cluster of two nodes (primary and standby). To enable automatic failover, I'm using Pgpool-II.

I followed this article: https://www.pgpool.net/docs/41/en/html/example-cluster.html and the only thing I did differently was to install PostgreSQL version 12 instead of version 11.

Note that I'm trying this using two CentOS 7 images on Proxmox. I ran into the following issue:

When I run systemctl status pgpool.service on both nodes, it reports success. I can also access PostgreSQL through the watchdog delegate IP.

But when testing failover, everything goes wrong.

As soon as I stop one of the servers, the delegate IP stops responding. As a result, the databases are unavailable. As soon as I start the stopped node again, the delegate IP becomes available again.

##############Log node 1

STOP

db0 pgpool[44615]: [1-1] 2020-05-11 23:31:55: pid 44615: LOG:  stop request sent to pgpool. waiting for termination...
db0 pgpool[44104]: [27-1] 2020-05-11 23:31:55: pid 44104: LOG:  Watchdog is shutting down
db0 pgpool[44616]: [28-1] 2020-05-11 23:31:55: pid 44616: LOG:  watchdog: de-escalation started
db0 pgpool[44616]: [29-1] 2020-05-11 23:31:55: pid 44616: LOG:  successfully released the delegate IP:"172.16.0.151"
db0 pgpool[44616]: [29-2] 2020-05-11 23:31:55: pid 44616: DETAIL:  'if_down_cmd' returned with success

###############Log node 2

STOP NODE1

db0 pgpool[44615]: [1-1] 2020-05-11 23:31:55: pid 44615: LOG:  stop request sent to pgpool. waiting for termination...
db0 pgpool[44104]: [27-1] 2020-05-11 23:31:55: pid 44104: LOG:  Watchdog is shutting down
db0 pgpool[44616]: [28-1] 2020-05-11 23:31:55: pid 44616: LOG:  watchdog: de-escalation started
db0 pgpool[44616]: [29-1] 2020-05-11 23:31:55: pid 44616: LOG:  successfully released the delegate IP:"172.16.0.151"
db0 pgpool[44616]: [29-2] 2020-05-11 23:31:55: pid 44616: DETAIL:  'if_down_cmd' returned with success

##############Log node 1

START

db0 pgpool[44687]: [1-1] 2020-05-11 23:36:17: pid 44687: LOG:  memory cache initialized
db0 pgpool[44687]: [1-2] 2020-05-11 23:36:17: pid 44687: DETAIL:  memcache blocks :64
db0 pgpool[44687]: [2-1] 2020-05-11 23:36:17: pid 44687: LOG:  pool_discard_oid_maps: discarded memqcache oid maps
db0 pgpool[44687]: [3-1] 2020-05-11 23:36:17: pid 44687: LOG:  waiting for watchdog to initialize
db0 pgpool[44689]: [3-1] 2020-05-11 23:36:17: pid 44689: LOG:  setting the local watchdog node name to "db0:9999 Linux db0"
db0 pgpool[44689]: [4-1] 2020-05-11 23:36:17: pid 44689: LOG:  watchdog cluster is configured with 1 remote nodes
db0 pgpool[44689]: [5-1] 2020-05-11 23:36:17: pid 44689: LOG:  watchdog remote node:0 on db1:9000
db0 pgpool[44689]: [6-1] 2020-05-11 23:36:17: pid 44689: LOG:  interface monitoring is disabled in watchdog
db0 pgpool[44689]: [7-1] 2020-05-11 23:36:17: pid 44689: LOG:  watchdog node state changed from [DEAD] to [LOADING]
db0 pgpool[44689]: [8-1] 2020-05-11 23:36:17: pid 44689: LOG:  new outbound connection to db1:9000
db0 pgpool[44689]: [9-1] 2020-05-11 23:36:17: pid 44689: LOG:  setting the remote node "db1:9999 Linux db1" as watchdog cluster master
db0 pgpool[44689]: [10-1] 2020-05-11 23:36:17: pid 44689: LOG:  watchdog node state changed from [LOADING] to [INITIALIZING]
db0 pgpool[44689]: [11-1] 2020-05-11 23:36:17: pid 44689: LOG:  new watchdog node connection is received from "172.16.0.152:30404"
db0 pgpool[44689]: [12-1] 2020-05-11 23:36:17: pid 44689: LOG:  new node joined the cluster hostname:"db1" port:9000 pgpool_port:9999
db0 pgpool[44689]: [12-2] 2020-05-11 23:36:17: pid 44689: DETAIL:  Pgpool-II version:"4.1.1" watchdog messaging version: 1.1
db0 pgpool[44689]: [13-1] 2020-05-11 23:36:18: pid 44689: LOG:  watchdog node state changed from [INITIALIZING] to [STANDBY]
db0 pgpool[44689]: [14-1] 2020-05-11 23:36:18: pid 44689: LOG:  successfully joined the watchdog cluster as standby node
db0 pgpool[44689]: [14-2] 2020-05-11 23:36:18: pid 44689: DETAIL:  our join coordinator request is accepted by cluster leader node "db1:9999 Linux db1"
db0 pgpool[44687]: [4-1] 2020-05-11 23:36:18: pid 44687: LOG:  watchdog process is initialized
db0 pgpool[44687]: [4-2] 2020-05-11 23:36:18: pid 44687: DETAIL:  watchdog messaging data version: 1.1
db0 pgpool[44689]: [15-1] 2020-05-11 23:36:18: pid 44689: LOG:  new IPC connection received
db0 pgpool[44689]: [16-1] 2020-05-11 23:36:18: pid 44689: LOG:  new IPC connection received
db0 pgpool[44687]: [5-1] 2020-05-11 23:36:18: pid 44687: LOG:  we have joined the watchdog cluster as STANDBY node
db0 pgpool[44687]: [5-2] 2020-05-11 23:36:18: pid 44687: DETAIL:  syncing the backend states from the MASTER watchdog node
db0 pgpool[44690]: [5-1] 2020-05-11 23:36:18: pid 44690: LOG:  2 watchdog nodes are configured for lifecheck
db0 pgpool[44689]: [17-1] 2020-05-11 23:36:18: pid 44689: LOG:  new IPC connection received
db0 pgpool[44690]: [6-1] 2020-05-11 23:36:18: pid 44690: LOG:  watchdog nodes ID:0 Name:"db0:9999 Linux db0"
db0 pgpool[44690]: [6-2] 2020-05-11 23:36:18: pid 44690: DETAIL:  Host:"db0" WD Port:9000 pgpool-II port:9999
db0 pgpool[44690]: [7-1] 2020-05-11 23:36:18: pid 44690: LOG:  watchdog nodes ID:1 Name:"db1:9999 Linux db1"
db0 pgpool[44690]: [7-2] 2020-05-11 23:36:18: pid 44690: DETAIL:  Host:"db1" WD Port:9000 pgpool-II port:9999
db0 pgpool[44689]: [18-1] 2020-05-11 23:36:18: pid 44689: LOG:  received the get data request from local pgpool-II on IPC interface
db0 pgpool[44689]: [19-1] 2020-05-11 23:36:18: pid 44689: LOG:  get data request from local pgpool-II node received on IPC interface is forwarded to master watchdog node "db1:9999 Linux db1"
db0 pgpool[44689]: [19-2] 2020-05-11 23:36:18: pid 44689: DETAIL:  waiting for the reply...
db0 pgpool[44687]: [6-1] 2020-05-11 23:36:18: pid 44687: LOG:  master watchdog node "db1:9999 Linux db1" returned status for 2 backend nodes
db0 pgpool[44687]: [7-1] 2020-05-11 23:36:18: pid 44687: LOG:  backend:0 is set to UP status
db0 pgpool[44687]: [7-2] 2020-05-11 23:36:18: pid 44687: DETAIL:  backend:0 is UP on cluster master "db1:9999 Linux db1"
db0 pgpool[44687]: [8-1] 2020-05-11 23:36:18: pid 44687: LOG:  backend:1 is set to UP status
db0 pgpool[44687]: [8-2] 2020-05-11 23:36:18: pid 44687: DETAIL:  backend:1 is UP on cluster master "db1:9999 Linux db1"
db0 pgpool[44687]: [9-1] 2020-05-11 23:36:18: pid 44687: LOG:  Setting up socket for 0.0.0.0:9999
db0 pgpool[44687]: [10-1] 2020-05-11 23:36:18: pid 44687: LOG:  Setting up socket for :::9999
db0 pgpool[44725]: [11-1] 2020-05-11 23:36:18: pid 44725: LOG:  PCP process: 44725 started
db0 pgpool[44687]: [11-1] 2020-05-11 23:36:18: pid 44687: LOG:  pgpool-II successfully started. version 4.1.1 (karasukiboshi)

###############Log node 2

START NODE1

db1 pgpool[30154]: [39-1] 2020-05-11 23:36:17: pid 30154: LOG:  new watchdog node connection is received from "172.16.0.153:61085"
db1 pgpool[30154]: [40-1] 2020-05-11 23:36:17: pid 30154: LOG:  new node joined the cluster hostname:"db0" port:9000 pgpool_port:9999
db1 pgpool[30154]: [40-2] 2020-05-11 23:36:17: pid 30154: DETAIL:  Pgpool-II version:"4.1.1" watchdog messaging version: 1.1
db1 pgpool[30154]: [41-1] 2020-05-11 23:36:17: pid 30154: LOG:  The newly joined node:"db0:9999 Linux db0" had left the cluster because it was shutdown
db1 pgpool[30154]: [42-1] 2020-05-11 23:36:17: pid 30154: LOG:  new outbound connection to db0:9000
db1 pgpool[30154]: [43-1] 2020-05-11 23:36:18: pid 30154: LOG:  adding watchdog node "db0:9999 Linux db0" to the standby list
db1 pgpool[30154]: [44-1] 2020-05-11 23:36:18: pid 30154: LOG:  quorum found
db1 pgpool[30154]: [44-2] 2020-05-11 23:36:18: pid 30154: DETAIL:  starting escalation process
db1 pgpool[30154]: [45-1] 2020-05-11 23:36:18: pid 30154: LOG:  escalation process started with PID:30601
db1 pgpool[30601]: [45-1] 2020-05-11 23:36:18: pid 30601: LOG:  watchdog: escalation started
db1 pgpool[30152]: [14-1] 2020-05-11 23:36:18: pid 30152: LOG:  Pgpool-II parent process received watchdog quorum change signal from watchdog
db1 pgpool[30154]: [46-1] 2020-05-11 23:36:18: pid 30154: LOG:  new IPC connection received
db1 pgpool[30152]: [15-1] 2020-05-11 23:36:18: pid 30152: LOG:  watchdog cluster now holds the quorum
db1 pgpool[30152]: [15-2] 2020-05-11 23:36:18: pid 30152: DETAIL:  updating the state of quarantine backend nodes
db1 pgpool[30154]: [47-1] 2020-05-11 23:36:18: pid 30154: LOG:  new IPC connection received
db1 pgpool[30601]: [46-1] 2020-05-11 23:36:20: pid 30601: WARNING:  watchdog failed to ping host"172.16.0.151"
db1 pgpool[30601]: [46-2] 2020-05-11 23:36:20: pid 30601: DETAIL:  ping process exits with code: 2
db1 pgpool[30601]: [47-1] 2020-05-11 23:36:20: pid 30601: LOG:  waiting for the delegate IP address to become active
db1 pgpool[30601]: [47-2] 2020-05-11 23:36:20: pid 30601: DETAIL:  waiting... count: 1
db1 pgpool[30601]: [48-1] 2020-05-11 23:36:20: pid 30601: WARNING:  watchdog failed to ping host"172.16.0.151"
db1 pgpool[30601]: [48-2] 2020-05-11 23:36:20: pid 30601: DETAIL:  ping process exits with code: 2
db1 pgpool[30601]: [49-1] 2020-05-11 23:36:20: pid 30601: LOG:  waiting for the delegate IP address to become active
db1 pgpool[30601]: [49-2] 2020-05-11 23:36:20: pid 30601: DETAIL:  waiting... count: 2
db1 pgpool[30601]: [50-1] 2020-05-11 23:36:20: pid 30601: WARNING:  watchdog failed to ping host"172.16.0.151"
db1 pgpool[30601]: [50-2] 2020-05-11 23:36:20: pid 30601: DETAIL:  ping process exits with code: 2
db1 pgpool[30601]: [51-1] 2020-05-11 23:36:20: pid 30601: LOG:  waiting for the delegate IP address to become active
db1 pgpool[30601]: [51-2] 2020-05-11 23:36:20: pid 30601: DETAIL:  waiting... count: 3
db1 pgpool[30601]: [52-1] 2020-05-11 23:36:20: pid 30601: LOG:  failed to acquire the delegate IP address
db1 pgpool[30601]: [52-2] 2020-05-11 23:36:20: pid 30601: DETAIL:  'if_up_cmd' failed
db1 pgpool[30601]: [53-1] 2020-05-11 23:36:20: pid 30601: WARNING:  watchdog escalation failed to acquire delegate IP
db1 pgpool[30154]: [48-1] 2020-05-11 23:36:20: pid 30154: LOG:  watchdog escalation process with pid: 30601 exit with SUCCESS.
db1 pgpool[30157]: [11-1] 2020-05-11 23:36:29: pid 30157: LOG:  informing the node status change to watchdog
db1 pgpool[30157]: [11-2] 2020-05-11 23:36:29: pid 30157: DETAIL:  node id :1 status = "NODE ALIVE" message:"Heartbeat signal found"
db1 pgpool[30154]: [49-1] 2020-05-11 23:36:29: pid 30154: LOG:  new IPC connection received
db1 pgpool[30154]: [50-1] 2020-05-11 23:36:29: pid 30154: LOG:  received node status change ipc message
db1 pgpool[30154]: [50-2] 2020-05-11 23:36:29: pid 30154: DETAIL:  Heartbeat signal found
db1 pgpool[30154]: [51-1] 2020-05-11 23:36:29: pid 30154: LOG:  remote node "db0:9999 Linux db0" became reachable again
db1 pgpool[30154]: [51-2] 2020-05-11 23:36:29: pid 30154: DETAIL:  requesting the node info
db1 pgpool[30154]: [52-1] 2020-05-11 23:36:29: pid 30154: LOG:  remote node "db0:9999 Linux db0" is reachable again
db1 pgpool[30154]: [52-2] 2020-05-11 23:36:29: pid 30154: DETAIL:  trying to add it back as a standby

Answer 1:

I guess you need to enable this flag: enable_consensus_with_half_votes=on. See more details here: https://www.pgpool.net/docs/latest/en/html/runtime-watchdog-config.html#GUC-ENABLE-CONSENSUS-WITH-HALF-VOTES
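
To illustrate why this flag matters in a two-node setup, here is a minimal sketch of the majority rule as the linked documentation describes it. It is a simplified model, not Pgpool-II's actual code. The function name has_quorum and the parameter half_votes_ok are made up for illustration; half_votes_ok stands in for enable_consensus_with_half_votes. With two watchdog nodes and one of them stopped, the surviving node holds exactly half of the votes, which is not a strict majority. That seems consistent with the db1 log in the question, where "quorum found" and "starting escalation process" appear only after db0 rejoins.

```python
def has_quorum(total_nodes: int, alive_nodes: int, half_votes_ok: bool = False) -> bool:
    """Simplified model of the watchdog majority rule (illustrative only).

    A strict majority of the configured nodes always counts as a quorum.
    Exactly half of an even-sized cluster counts only when half_votes_ok
    is True (assumed here to mirror enable_consensus_with_half_votes = on).
    """
    if alive_nodes <= 0:
        return False
    if alive_nodes * 2 > total_nodes:
        return True
    return half_votes_ok and alive_nodes * 2 == total_nodes


# Two watchdog nodes configured (db0 and db1), one of them stopped.
for half_votes_ok in (False, True):
    result = has_quorum(total_nodes=2, alive_nodes=1, half_votes_ok=half_votes_ok)
    print(f"half_votes_ok={half_votes_ok}: 1 of 2 nodes alive, quorum={result}")
```

If this model matches your case, setting enable_consensus_with_half_votes = on in pgpool.conf on both nodes and restarting pgpool should let the surviving node keep the quorum with one of two nodes alive. For an odd number of nodes the flag makes no difference in this model, since exactly half is never reached.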

Source: https://stackoverflow.com/questions/61739146/pgpool-ii-delegated-ip-is-not-available-when-disconnected-primary-or-standby-no

Tags

postgresql

high-availability

PGpool

failovercluster

virtual-ip-address