Building a Highly Available Database Cluster with PostgreSQL 11.3 + etcd + Patroni

Posted by 你离开我真会死。 on 2019-11-29 23:49:21

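1 Summary

A highly available PostgreSQL cluster can be built with etcd and Patroni.

etcd is used to share information among the Patroni nodes.

Patroni monitors the state of the local PostgreSQL instance. If the primary fails, Patroni promotes a standby to become the new primary. Once a failed PostgreSQL instance has been recovered, it can rejoin the cluster, either automatically or manually.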

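1.1 About etcd

etcd is a strongly consistent, distributed key-value store built on the Raft consensus algorithm. It gives distributed systems a reliable way to store and access data.

At any time only one etcd node is elected leader; the other etcd nodes are followers.

Data in etcd is identified by key. For example, the entry

        key   = /service/postgresql/leader
        value = postgresql1

states that the primary of a PostgreSQL cluster is 'postgresql1'.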

figure1: an etcd cluster including three etcd nodes 
===================================================

     |---------------|                          |-------------|
     |etcd1<follower>| <----------+-----------> |etcd2<leader>|
     |---------------|            |             |-------------|
                                  |
                                  |
                                  |
                          |-------V-------|                            
                          |etcd3<follower>|                             
                          |---------------|                             
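
Raft accepts a write only after a majority of the members has acknowledged it, so the cluster size decides how many etcd nodes may fail. The following Python sketch only illustrates that arithmetic; the function names are made up for this example and are not part of etcd.

```python
def quorum_size(members: int) -> int:
    """Smallest number of members that forms a majority."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority is still reachable."""
    return members - quorum_size(members)


if __name__ == "__main__":
    for n in (1, 2, 3, 5):
        print(f"members={n} quorum={quorum_size(n)} "
              f"tolerated failures={tolerated_failures(n)}")
```

For the three-node cluster in figure1 the majority is two nodes, so one etcd node may be down. A two-node cluster tolerates no failure at all, which is why an odd number of nodes is the usual choice.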

1.2 How etcd, Patroni, and PostgreSQL work together

In the diagram below (figure2), three hosts (host1, host2, host3) form a PostgreSQL cluster. etcd, Patroni, and PostgreSQL are installed on every host.

figure2: a PostgreSQL cluster with 3 hosts, each host having etcd, Patroni and PostgreSQL
=====================================================================================


          .........................................................................
          . <host1>                                                               .
          .     |------------|         |-------------|           |-------------|  .  
   +--<-------->|    etcd1   |<------->|  patroni1   +---------->| postgresql1 |  .
   |      .     |------------|         |-------------|           |-------------|  .
   |      .                                                                       .
   |      .........................................................................
   |
   |
   |
   |      .........................................................................
   |      . <host2>                                                               .
   |      .     |------------|         |-------------|           |-------------|  .  
   +-<--------->|    etcd2   |<------->|  patroni2   +---------->| postgresql2 |  .
   |      .     |------------|         |-------------|           |-------------|  .
   |      .                                                                       .
   |      .........................................................................
   |
   |
   |
   |      ..........................................................................
   |      . <host3>                                                                .
   |      .     |------------|         |-------------|           |-------------|   . 
   +--<-------->|    etcd3   |<------->|  patroni3   +---------->| postgresql3 |   .
          .     |------------|         |-------------|           |-------------|   .
          .                                                                        .
          ..........................................................................

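1.2.1 etcd: a distributed key-value store

etcd1, etcd2, and etcd3 form a distributed key-value store that patroni1, patroni2, and patroni3 read and write in order to share and pass information. Every Patroni instance can read and write the data in etcd.

1.2.2 Patroni: controls and monitors the local PostgreSQL and writes its state to etcd

Each Patroni instance monitors and controls its local PostgreSQL and writes the local PostgreSQL's information and state to etcd.

A Patroni instance learns the information and state of the PostgreSQL instances on the other hosts by reading etcd.

1.2.3 Electing the PostgreSQL primary

Patroni decides whether its local PostgreSQL is eligible to be the primary. If it is, Patroni tries to get the local PostgreSQL elected as primary (leader) by updating a key in etcd (e.g. /service/postgresql/leader) to the name of the local PostgreSQL (e.g. postgresql1).

If several Patroni instances try to change the same key at the same time, only one of them succeeds, and its PostgreSQL becomes the primary (leader).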
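
The "only one writer succeeds" rule can be sketched in Python. The KeyValueStore class below is an in-memory stand-in for etcd invented for this example; it models nothing but an atomic create-if-absent operation, and it is not Patroni's actual code.

```python
import threading


class KeyValueStore:
    """In-memory stand-in for etcd; only models an atomic create-if-absent."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def create_if_absent(self, key, value):
        """Set key to value only if it does not exist yet; True on success."""
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def get(self, key):
        with self._lock:
            return self._data.get(key)


LEADER_KEY = "/service/postgresql/leader"


def run_election(store, candidates):
    """Let every candidate race for the leader key; return the names that won."""
    winners = []
    winners_lock = threading.Lock()

    def campaign(name):
        if store.create_if_absent(LEADER_KEY, name):
            with winners_lock:
                winners.append(name)

    threads = [threading.Thread(target=campaign, args=(n,)) for n in candidates]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners


if __name__ == "__main__":
    store = KeyValueStore()
    winners = run_election(store, ["postgresql1", "postgresql2", "postgresql3"])
    print("winners:", winners, "leader key:", store.get(LEADER_KEY))
```

Whichever thread reaches the store first wins. The others see that the key already exists and stay standbys, so the returned list always holds exactly one name.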

2 Test plan

2.1 OS, software, and versions

  • CentOS 7.6
  • PostgreSQL 11.3
  • etcd: 3.3.11
  • python: Python 3.7.4
  • Patroni: 1.6.0
  • database user:
    • superuser:postgres
    • replication user: replicator

2.2 Hostnames and IP addresses

NO.  |     IP          | HOSTNAME
-----+-----------------+---------------
 1   |  192.168.56.10  |  host1
 2   |  192.168.56.11  |  host2
 3   |  192.168.56.12  |  host3
[rudi@host1 ~]$ more /etc/hosts
192.168.56.10    host1
192.168.56.11    host2
192.168.56.12    host3

2.3 OS user, directories, and files

  • Linux user: rudi
  • main folder: /home/rudi
    • Postgres data folder: /home/rudi/pgdata/
    • folder for scripts: /home/rudi/scripts/

3 Install software and modules on all hosts (host1/host2/host3)

3.1 Install etcd

yum -y install etcd libyaml

3.2 Install PostgreSQL 11.3 from source

yum -y install flex bison readline-devel zlib-devel
wget https://ftp.postgresql.org/pub/source/v11.3/postgresql-11.3.tar.gz
tar zxvf postgresql-11.3.tar.gz
cd postgresql-11.3/
./configure
make
su
make install

3.3 Install Python 3 from source and install the required Python modules

yum install libffi-devel openssl-devel

wget https://www.python.org/ftp/python/3.7.4/Python-3.7.4.tgz
tar zxvf Python-3.7.4.tgz 
cd Python-3.7.4/
./configure
make
su 
make install
python3 -m pip install --upgrade pip
python3 -m pip install psycopg2_binary
python3 -m pip install patroni[etcd]

4 Prepare directories and files

4.1 On all hosts (host1/host2/host3), create the database directory and the directory for configuration scripts

cd ~
mkdir ~/scripts
mkdir ~/pgdata

4.2 Create the etcd startup scripts

4.2.1 host1: etcd1.sh

cd ~/scripts
vim etcd1.sh
etcd --name etcdnode1 --initial-advertise-peer-urls http://192.168.56.10:2380 \
  --listen-peer-urls http://192.168.56.10:2380 \
  --listen-client-urls http://192.168.56.10:2379,http://127.0.0.1:2379 \
  --advertise-client-urls http://192.168.56.10:2379 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-cluster etcdnode1=http://192.168.56.10:2380,etcdnode2=http://192.168.56.11:2380,etcdnode3=http://192.168.56.12:2380 \
  --initial-cluster-state new

4.2.2 host2: etcd2.sh

cd ~/scripts
vim etcd2.sh
etcd --name etcdnode2 --initial-advertise-peer-urls http://192.168.56.11:2380 \
--listen-peer-urls http://192.168.56.11:2380 \
--listen-client-urls http://192.168.56.11:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.56.11:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster etcdnode1=http://192.168.56.10:2380,etcdnode2=http://192.168.56.11:2380,etcdnode3=http://192.168.56.12:2380 \
--initial-cluster-state new

4.2.3 host3: etcd3.sh

cd ~/scripts
vim etcd3.sh
etcd --name etcdnode3 --initial-advertise-peer-urls http://192.168.56.12:2380 \
--listen-peer-urls http://192.168.56.12:2380 \
--listen-client-urls http://192.168.56.12:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.56.12:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster etcdnode1=http://192.168.56.10:2380,etcdnode2=http://192.168.56.11:2380,etcdnode3=http://192.168.56.12:2380 \
--initial-cluster-state new
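
The three scripts differ only in the node name and the node's own IP address. As an optional illustration, the Python sketch below derives each command line from the host plan in section 2.2. It uses only the etcd flags that appear in the scripts above; the helper name etcd_args is an assumption of this example.

```python
import shlex

# Node name -> IP address, taken from the host plan in section 2.2.
NODES = {
    "etcdnode1": "192.168.56.10",
    "etcdnode2": "192.168.56.11",
    "etcdnode3": "192.168.56.12",
}


def etcd_args(name, nodes=NODES, token="etcd-cluster-1"):
    """Build the etcd argument list for one node of a new static cluster."""
    ip = nodes[name]
    # Every node must be given the same full member list.
    initial_cluster = ",".join(
        f"{node}=http://{addr}:2380" for node, addr in nodes.items()
    )
    return [
        "etcd",
        "--name", name,
        "--initial-advertise-peer-urls", f"http://{ip}:2380",
        "--listen-peer-urls", f"http://{ip}:2380",
        "--listen-client-urls", f"http://{ip}:2379,http://127.0.0.1:2379",
        "--advertise-client-urls", f"http://{ip}:2379",
        "--initial-cluster-token", token,
        "--initial-cluster", initial_cluster,
        "--initial-cluster-state", "new",
    ]


if __name__ == "__main__":
    for node in NODES:
        print(" ".join(shlex.quote(arg) for arg in etcd_args(node)))
```

Generating the commands from one table avoids the copy-and-paste mistake of leaving another node's IP address in a script.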

4.3 Create the Patroni configuration files

4.3.1 host1: postgresql1.yml

cd ~/scripts
vim postgresql1.yml
scope: postgresql
namespace: /service/
name: postgresql1

restapi:
    listen: 192.168.56.10:8008
    connect_address: 192.168.56.10:8008

etcd:
    host: 192.168.56.10:2379

bootstrap:
    dcs:
        ttl: 30
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576
        postgresql:
            use_pg_rewind: true

    initdb:
    - encoding: UTF8
    - data-checksums

    pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 192.168.56.10/32 md5
    - host replication replicator 192.168.56.11/32 md5
    - host replication replicator 192.168.56.12/32 md5
    - host all all 0.0.0.0/0 md5

    users:
        admin:
            password: admin
            options:
                - createrole
                - createdb
postgresql:
    listen: 192.168.56.10:15432
    connect_address: 192.168.56.10:15432
    bin_dir: /usr/local/pgsql/bin
    data_dir: /home/rudi/pgdata
    pgpass: /tmp/pgpass1
    authentication:
        replication:
            username: replicator
            password: rep-pass
        superuser:
            username: postgres
            password: secretpassword
    parameters:
        unix_socket_directories: '.'
        synchronous_commit: "on"
        synchronous_standby_names: "*"

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false

4.3.2 host2: postgresql2.yml

cd ~/scripts
vim postgresql2.yml
scope: postgresql
namespace: /service/
name: postgresql2

restapi:
    listen: 192.168.56.11:8008
    connect_address: 192.168.56.11:8008

etcd:
    host: 192.168.56.11:2379

bootstrap:
    dcs:
        ttl: 30
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576
        postgresql:
            use_pg_rewind: true

    initdb:
    - encoding: UTF8
    - data-checksums

    pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 192.168.56.10/32 md5
    - host replication replicator 192.168.56.11/32 md5
    - host replication replicator 192.168.56.12/32 md5
    - host all all 0.0.0.0/0 md5

    users:
        admin:
            password: admin
            options:
                - createrole
                - createdb

postgresql:
    listen: 192.168.56.11:15432
    connect_address: 192.168.56.11:15432
    bin_dir: /usr/local/pgsql/bin
    data_dir: /home/rudi/pgdata
    pgpass: /tmp/pgpass1
    authentication:
        replication:
            username: replicator
            password: rep-pass
        superuser:
            username: postgres
            password: secretpassword
    parameters:
        unix_socket_directories: '.'
        synchronous_commit: "on"
        synchronous_standby_names: "*"

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false

4.3.3 host3: postgresql3.yml

cd ~/scripts
vim postgresql3.yml
scope: postgresql
namespace: /service/
name: postgresql3

restapi:
    listen: 192.168.56.12:8008
    connect_address: 192.168.56.12:8008

etcd:
    host: 192.168.56.12:2379

bootstrap:
    dcs:
        ttl: 30
        loop_wait: 10
        retry_timeout: 10
        maximum_lag_on_failover: 1048576
        postgresql:
            use_pg_rewind: true

    initdb:
    - encoding: UTF8
    - data-checksums

    pg_hba:
    - host replication replicator 127.0.0.1/32 md5
    - host replication replicator 192.168.56.10/32 md5
    - host replication replicator 192.168.56.11/32 md5
    - host replication replicator 192.168.56.12/32 md5
    - host all all 0.0.0.0/0 md5

    users:
        admin:
            password: admin
            options:
                - createrole
                - createdb

postgresql:
    listen: 192.168.56.12:15432
    connect_address: 192.168.56.12:15432
    bin_dir: /usr/local/pgsql/bin
    data_dir: /home/rudi/pgdata
    pgpass: /tmp/pgpass1
    authentication:
        replication:
            username: replicator
            password: rep-pass
        superuser:
            username: postgres
            password: secretpassword
    parameters:
        unix_socket_directories: '.'
        synchronous_commit: "on"
        synchronous_standby_names: "*"

tags:
    nofailover: false
    noloadbalance: false
    clonefrom: false
    nosync: false
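
Each pg_hba entry in the configuration files above admits a client only if its address falls inside the entry's CIDR range, so the prefix length after the slash matters. The sketch below uses Python's standard ipaddress module to show the effect of different prefix lengths. It illustrates CIDR matching in general and is not how PostgreSQL evaluates pg_hba.conf internally; the helper name matches is chosen for this example.

```python
import ipaddress


def matches(client_ip, cidr):
    """True if client_ip falls inside the network described by cidr."""
    # strict=False accepts entries whose host bits are set, e.g. 192.168.56.10/24.
    network = ipaddress.ip_network(cidr, strict=False)
    return ipaddress.ip_address(client_ip) in network


if __name__ == "__main__":
    clients = ("192.168.56.11", "192.168.56.12", "10.0.0.5")
    for cidr in ("127.0.0.1/32", "192.168.56.11/32", "192.168.56.0/24", "0.0.0.0/0"):
        for client in clients:
            print(f"{client:15} in {cidr:18} -> {matches(client, cidr)}")
```

A /32 prefix matches exactly one host, while a prefix length of 0 matches every address no matter which address is written in front of it.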

5 Start the cluster

5.1 Stop and disable the firewall on all hosts (host1/host2/host3)

systemctl stop firewalld
systemctl disable firewalld

5.2 Start etcd on all hosts (host1/host2/host3), one after another

host1:

cd ~
source ./scripts/etcd1.sh

host2:

cd ~
source ./scripts/etcd2.sh

host3:

cd ~
source ./scripts/etcd3.sh

Check the etcd status:

[rudi@host1 ~]$ curl -L http://host2:2379/version
{"etcdserver":"3.3.11","etcdcluster":"3.3.0"}
[rudi@host1 ~]$ curl -L http://host3:2379/version
{"etcdserver":"3.3.11","etcdcluster":"3.3.0"}

[rudi@host1 ~]$ curl -L http://host1:2379/version
{"etcdserver":"3.3.11","etcdcluster":"3.3.0"}

[rudi@host1 scripts]$ etcdctl member list
51b08bf82e03e049: name=etcdnode1 peerURLs=http://192.168.56.10:2380 clientURLs=http://192.168.56.10:2379 isLeader=true
6d36a224cc993604: name=etcdnode2 peerURLs=http://192.168.56.11:2380 clientURLs=http://192.168.56.11:2379 isLeader=false
bb961ca5e3abf011: name=etcdnode3 peerURLs=http://192.168.56.12:2380 clientURLs=http://192.168.56.12:2379 isLeader=false
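
To read the member list from a script rather than by eye, the lines can be split into fields. The following Python sketch assumes the exact output format shown above, which other etcdctl versions may not produce; the function names are invented for this example.

```python
SAMPLE = """\
51b08bf82e03e049: name=etcdnode1 peerURLs=http://192.168.56.10:2380 clientURLs=http://192.168.56.10:2379 isLeader=true
6d36a224cc993604: name=etcdnode2 peerURLs=http://192.168.56.11:2380 clientURLs=http://192.168.56.11:2379 isLeader=false
bb961ca5e3abf011: name=etcdnode3 peerURLs=http://192.168.56.12:2380 clientURLs=http://192.168.56.12:2379 isLeader=false
"""


def parse_member_list(text):
    """Turn each 'id: key=value key=value ...' line into a dict."""
    members = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        member_id, _, rest = line.partition(": ")
        # Split on the first '=' only, because the URLs do not contain '=' here.
        fields = dict(item.split("=", 1) for item in rest.split())
        fields["id"] = member_id
        fields["isLeader"] = fields.get("isLeader") == "true"
        members.append(fields)
    return members


def leader_name(members):
    """Name of the member flagged as leader, or None if there is none."""
    leaders = [m["name"] for m in members if m["isLeader"]]
    return leaders[0] if leaders else None


if __name__ == "__main__":
    members = parse_member_list(SAMPLE)
    print("members:", [m["name"] for m in members])
    print("leader:", leader_name(members))
```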

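5.3 Start Patroni on all hosts (host1/host2/host3)

5.3.1 Start Patroni on host1

  • Patroni1 writes the information about the local PostgreSQL (postgresql1) to etcd.
  • Patroni1 finds that the data directory (/home/rudi/pgdata/) is empty and initializes the database (initdb -D /home/rudi/pgdata).
  • Patroni1 sets up the configuration files of the local database, e.g. postgresql.conf and pg_hba.conf.
  • Patroni1 starts the local database (postgresql1): pg_ctl -D /home/rudi/pgdata start
  • Patroni1 makes the local database (postgresql1) the primary.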
cd ~
patroni ./scripts/postgresql1.yml

5.3.2 Start Patroni on host2/host3

  • Patroni2/Patroni3 take a base backup of postgresql1 (pg_basebackup) to create their own local databases.
  • Patroni2/Patroni3 set up the configuration files of the local database, e.g. postgresql.conf and pg_hba.conf.
  • Patroni2 starts postgresql2 as a standby.
  • Patroni3 starts postgresql3 as a standby.

host2:

cd ~
patroni ./scripts/postgresql2.yml

host3:

cd ~
patroni ./scripts/postgresql3.yml

5.3.3 Check the cluster status through the Patroni interfaces:

[rudi@host1 ~]$ curl -L http://host1:8008/
{"state": "running", "postmaster_start_time": "2019-09-17 04:15:30.959 EDT", "role": "master", "server_version": 110003, "cluster_unlocked": false, "xlog": {"location": 83886400}, "timeline": 1, "replication": [{"usename": "replicator", "application_name": "postgresql2", "client_addr": "192.168.56.11", "state": "streaming", "sync_state": "sync", "sync_priority": 1}, {"usename": "replicator", "application_name": "postgresql3", "client_addr": "192.168.56.12", "state": "streaming", "sync_state": "potential", "sync_priority": 1}], "database_system_identifier": "6737550116166691859", "patroni": {"version": "1.6.0", "scope": "postgresql"}}
[rudi@host1 ~]$ patronictl version
patronictl version 1.6.0
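
The JSON document returned by the Patroni REST endpoint above is easier to digest once reduced to the fields of interest. The Python sketch below works on an abbreviated copy of that response; the summarize helper is an assumption of this example. Against a live cluster the text could presumably be fetched with urllib.request.urlopen("http://host1:8008/") instead.

```python
import json

# Abbreviated copy of the response shown above.
STATUS_JSON = """
{"state": "running", "role": "master", "server_version": 110003, "timeline": 1,
 "replication": [
   {"application_name": "postgresql2", "client_addr": "192.168.56.11",
    "state": "streaming", "sync_state": "sync"},
   {"application_name": "postgresql3", "client_addr": "192.168.56.12",
    "state": "streaming", "sync_state": "potential"}],
 "patroni": {"version": "1.6.0", "scope": "postgresql"}}
"""


def summarize(status):
    """Keep the role, state and timeline plus each replica's sync state."""
    replicas = {
        r["application_name"]: r["sync_state"]
        for r in status.get("replication", [])
    }
    return {
        "role": status["role"],
        "state": status["state"],
        "timeline": status["timeline"],
        "replicas": replicas,
    }


if __name__ == "__main__":
    print(summarize(json.loads(STATUS_JSON)))
```

In the sample, postgresql2 is the synchronous standby ("sync") and postgresql3 is a candidate to take over that role ("potential"), which follows from synchronous_standby_names being set to "*" in the configuration.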

[rudi@host1 ~]$ patronictl -c ./scripts/postgresql1.yml list postgresql
+------------+---------+---------------------+--------+---------+----+-----------+
|  Cluster   |  Member |         Host        |  Role  |  State  | TL | Lag in MB |
+------------+---------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running |  1 |         0 |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running |  1 |         0 |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | running |  1 |         0 |
+------------+---------+---------------------+--------+---------+----+-----------+
[rudi@host1 ~]$ 
[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+---------+---------------------+--------+---------+----+-----------+
|  Cluster   |  Member |         Host        |  Role  |  State  | TL | Lag in MB |
+------------+---------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running |  1 |         0 |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running |  1 |         0 |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | running |  1 |         0 |
+------------+---------+---------------------+--------+---------+----+-----------+
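
The table printed by patronictl list can also be turned into records, for example to find the current leader from a script. The Python sketch below parses the text layout shown above; the function name parse_table is invented here, and the parser assumes that cells never contain a "|" character.

```python
SAMPLE = """\
+------------+---------+---------------------+--------+---------+----+-----------+
|  Cluster   |  Member |         Host        |  Role  |  State  | TL | Lag in MB |
+------------+---------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running |  1 |         0 |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running |  1 |         0 |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | running |  1 |         0 |
+------------+---------+---------------------+--------+---------+----+-----------+
"""


def parse_table(text):
    """Return one dict per data row, keyed by the header cells."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip the +----+ border lines
        # Drop the outer pipes, then split the remaining text into cells.
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if header is None:
            header = cells
        else:
            rows.append(dict(zip(header, cells)))
    return rows


if __name__ == "__main__":
    rows = parse_table(SAMPLE)
    leaders = [row["Member"] for row in rows if row["Role"] == "Leader"]
    print("leader:", leaders)
```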

5.3.4 Check the cluster status through the etcd interface:

[rudi@host1 scripts]$ etcdctl ls --recursive --sort -p /service
/service/postgresql/
/service/postgresql/config
/service/postgresql/initialize
/service/postgresql/leader
/service/postgresql/members/
/service/postgresql/members/postgresql1
/service/postgresql/members/postgresql2
/service/postgresql/members/postgresql3
/service/postgresql/optime/
/service/postgresql/optime/leader

[rudi@host1 ~]$ etcdctl get /service/postgresql/leader
postgresql1

[rudi@host1 ~]$ etcdctl get /service/postgresql/members/postgresql1
{"conn_url":"postgres://192.168.56.10:15432/postgres","api_url":"http://192.168.56.10:8008/patroni","state":"running","role":"master","version":"1.6.0","xlog_location":83888056,"timeline":1}
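
The keys listed above follow from the namespace (/service/) and scope (postgresql) values in the Patroni configuration files. The Python sketch below rebuilds those key paths and decodes the member value shown above. The helper dcs_path is an assumption of this example and is not a Patroni function.

```python
import json


def dcs_path(namespace, scope, *parts):
    """Join namespace, scope and sub-keys into an etcd key path."""
    pieces = [namespace, scope] + list(parts)
    # Strip stray slashes so that '/service/' and 'service' behave the same.
    cleaned = [piece.strip("/") for piece in pieces]
    return "/" + "/".join(piece for piece in cleaned if piece)


# Value of /service/postgresql/members/postgresql1 as shown above.
MEMBER_VALUE = (
    '{"conn_url":"postgres://192.168.56.10:15432/postgres",'
    '"api_url":"http://192.168.56.10:8008/patroni","state":"running",'
    '"role":"master","version":"1.6.0","xlog_location":83888056,"timeline":1}'
)


if __name__ == "__main__":
    print(dcs_path("/service/", "postgresql", "leader"))
    print(dcs_path("/service/", "postgresql", "members", "postgresql1"))
    member = json.loads(MEMBER_VALUE)
    print(member["role"], member["conn_url"], member["api_url"])
```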

6 Read/write test

The database can be reached from any of the hosts (host1, host2, host3).

6.1 Write to the primary and read the data back:

export PATH=$PATH:/usr/local/pgsql/bin
psql -U postgres -d postgres -p 15432 -h host1 
postgres=# create table test (id int, name varchar(100));
CREATE TABLE
postgres=# insert into test values ( 1,'1');
INSERT 0 1
postgres=# select * from test;
 id | name 
----+------
  1 | 1
(1 row)

6.2 Try to write to a standby

psql -U postgres -d postgres -p 15432 -h  host2
postgres=# insert into test values ( 1,'1');
ERROR:  cannot execute INSERT in a read-only transaction

6.3 Try to read from a standby

psql -U postgres -d postgres -p 15432 -h  host3
postgres=# select * from test;
 id | name 
----+------
  1 | 1
(1 row)

7 Kill the postmaster process on the primary

7.1 State before the kill:

  • The primary is postgresql1/host1
[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+---------+---------------------+--------+---------+----+-----------+
|  Cluster   |  Member |         Host        |  Role  |  State  | TL | Lag in MB |
+------------+---------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running |  1 |         0 |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running |  1 |         0 |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | running |  1 |         0 |
+------------+---------+---------------------+--------+---------+----+-----------+

7.2 Run the kill on host1:

[rudi@host1 ~]$ ps -ef|grep postgres
rudi      3908  3759  0 11:35 pts/5    00:00:01 /usr/local/bin/python3 /usr/local/bin/patroni ./scripts/postgresql1.yml
rudi      3929     1  0 11:35 ?        00:00:00 /usr/local/pgsql/bin/postgres -D /home/rudi/pgdata --config-file=/home/rudi/pgdata/postgresql.conf --listen_addresses=192.168.56.10 --port=15432 --cluster_name=postgresql --wal_level=replica --hot_standby=on --max_connections=100 --max_wal_senders=10 --max_prepared_transactions=0 --max_locks_per_transaction=64 --track_commit_timestamp=off --max_replication_slots=10 --max_worker_processes=8 --wal_log_hints=on
rudi      3935  3929  0 11:35 ?        00:00:00 postgres: postgresql: checkpointer
rudi      3936  3929  0 11:35 ?        00:00:00 postgres: postgresql: background writer
rudi      3937  3929  0 11:35 ?        00:00:00 postgres: postgresql: walwriter
rudi      3938  3929  0 11:35 ?        00:00:00 postgres: postgresql: autovacuum launcher
rudi      3939  3929  0 11:35 ?        00:00:00 postgres: postgresql: stats collector
rudi      3940  3929  0 11:35 ?        00:00:00 postgres: postgresql: logical replication launcher
rudi      3944  3929  0 11:35 ?        00:00:00 postgres: postgresql: postgres postgres 192.168.56.10(44044) idle
rudi      3954  3929  0 11:35 ?        00:00:00 postgres: postgresql: walsender replicator 192.168.56.11(42620) streaming 0/4019F60
rudi      3958  3929  0 11:35 ?        00:00:00 postgres: postgresql: walsender replicator 192.168.56.12(46540) streaming 0/4019F60

[rudi@host1 ~]$ kill -9 3929
[rudi@host1 ~]$ 

7.3 On host1, Patroni starts postgresql1 again, and postgresql1 remains the primary

  • postgresql1 is starting up
  • postgresql1 is still the primary; no failover took place
[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+---------+---------------------+--------+---------+----+-----------+
|  Cluster   |  Member |         Host        |  Role  |  State  | TL | Lag in MB |
+------------+---------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running |    |   unknown |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running |  2 |         0 |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | running |  2 |         0 |
+------------+---------+---------------------+--------+---------+----+-----------+

7.4 Check the PIDs on the primary; all PostgreSQL processes have new PIDs:

[rudi@host1 ~]$ ps -ef|grep postgres
rudi      3908  3759  0 11:35 pts/5    00:00:01 /usr/local/bin/python3 /usr/local/bin/patroni ./scripts/postgresql1.yml
rudi      4034     1  0 11:46 ?        00:00:00 /usr/local/pgsql/bin/postgres -D /home/rudi/pgdata --config-file=/home/rudi/pgdata/postgresql.conf --listen_addresses=192.168.56.10 --port=15432 --cluster_name=postgresql --wal_level=replica --hot_standby=on --max_connections=100 --max_wal_senders=10 --max_prepared_transactions=0 --max_locks_per_transaction=64 --track_commit_timestamp=off --max_replication_slots=10 --max_worker_processes=8 --wal_log_hints=on
rudi      4037  4034  0 11:46 ?        00:00:00 postgres: postgresql: checkpointer
rudi      4038  4034  0 11:46 ?        00:00:00 postgres: postgresql: background writer   
rudi      4039  4034  0 11:46 ?        00:00:00 postgres: postgresql: stats collector   
rudi      4044  4034  0 11:46 ?        00:00:00 postgres: postgresql: postgres postgres 192.168.56.10(44742) idle
rudi      4049  4034  0 11:46 ?        00:00:00 postgres: postgresql: walwriter   
rudi      4050  4034  0 11:46 ?        00:00:00 postgres: postgresql: autovacuum launcher   
rudi      4051  4034  0 11:46 ?        00:00:00 postgres: postgresql: logical replication launcher   
rudi      4054  4034  0 11:46 ?        00:00:00 postgres: postgresql: walsender replicator 192.168.56.11(43266) streaming 0/50001A8
rudi      4055  4034  0 11:46 ?        00:00:00 postgres: postgresql: walsender replicator 192.168.56.12(47174) streaming 0/50001A8

7.5 Check the cluster information: postgresql1 is the primary and works normally

[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+---------+---------------------+--------+---------+----+-----------+
|  Cluster   |  Member |         Host        |  Role  |  State  | TL | Lag in MB |
+------------+---------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running |  3 |         0 |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running |  3 |         0 |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | running |  3 |         0 |
+------------+---------+---------------------+--------+---------+----+-----------+

8 Manual switchover

8.1 State before the switchover

[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+---------+---------------------+--------+---------+----+-----------+
|  Cluster   |  Member |         Host        |  Role  |  State  | TL | Lag in MB |
+------------+---------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running |  3 |         0 |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running |  3 |         0 |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | running |  3 |         0 |
+------------+---------+---------------------+--------+---------+----+-----------+

8.2 Perform the manual switchover

  • Current primary: postgresql1/host1
  • New primary chosen: postgresql3/host3
[rudi@host1 ~]$ patronictl -d etcd://host1:2379 switchover postgresql
Master [postgresql1]: 
Candidate ['postgresql2', 'postgresql3'] []: postgresql3
When should the switchover take place (e.g. 2019-09-17T12:53 )  [now]: 
Current cluster topology
+------------+---------+---------------------+--------+---------+----+-----------+
|  Cluster   |  Member |         Host        |  Role  |  State  | TL | Lag in MB |
+------------+---------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running |  3 |         0 |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running |  3 |         0 |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | running |  3 |         0 |
+------------+---------+---------------------+--------+---------+----+-----------+
Are you sure you want to switchover cluster postgresql, demoting current master postgresql1? [y/N]: y
2019-09-17 11:53:12.19439 Successfully switched over to "postgresql3"
+------------+---------+---------------------+--------+---------+----+-----------+
|  Cluster   |  Member |         Host        |  Role  |  State  | TL | Lag in MB |
+------------+---------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 |        | stopped |    |   unknown |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running |  3 |         0 |
| postgresql | postgresql3 | 192.168.56.12:15432 | Leader | running |  3 |           |
+------------+---------+---------------------+--------+---------+----+-----------+

8.3 Keep checking the cluster status:

  • The new primary is postgresql3/host3
  • Patroni restarts postgresql1/host1
  • Finally, postgresql1/host1 rejoins the cluster as a standby and works normally
[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+---------+---------------------+--------+---------+----+-----------+
|  Cluster   |  Member |         Host        |  Role  |  State  | TL | Lag in MB |
+------------+---------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 |        | stopped |    |   unknown |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running |  4 |           |
| postgresql | postgresql3 | 192.168.56.12:15432 | Leader | running |  4 |         0 |
+------------+---------+---------------------+--------+---------+----+-----------+

[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+---------+---------------------+--------+---------+----+-----------+
|  Cluster   |  Member |         Host        |  Role  |  State  | TL | Lag in MB |
+------------+---------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 |        | running |  4 |         0 |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running |  4 |         0 |
| postgresql | postgresql3 | 192.168.56.12:15432 | Leader | running |  4 |         0 |
+------------+---------+---------------------+--------+---------+----+-----------+

9 Reboot the host that runs the primary

9.1 Cluster information before the reboot

[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+---------+---------------------+--------+---------+----+-----------+
|  Cluster   |  Member |         Host        |  Role  |  State  | TL | Lag in MB |
+------------+---------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 |        | running |  4 |         0 |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running |  4 |         0 |
| postgresql | postgresql3 | 192.168.56.12:15432 | Leader | running |  4 |         0 |
+------------+---------+---------------------+--------+---------+----+-----------+

9.2 Reboot host3 (primary database)

[root@host3 ~]# reboot
Connection to host3 closed by remote host.
Connection to host3 closed.

9.3 Check the cluster status

  • postgresql3/host3 has stopped
  • postgresql1/host1 has become the primary
[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+---------+---------------------+--------+---------+----+-----------+
|  Cluster   |  Member |         Host        |  Role  |  State  | TL | Lag in MB |
+------------+---------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running |  5 |         0 |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running |  5 |         0 |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | stopped |    |   unknown |
+------------+---------+---------------------+--------+---------+----+-----------+
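
Why does a host reboot trigger a failover while the killed postmaster in section 7 did not? One plausible reading: in section 7 the Patroni process on the primary stayed alive and kept the leader key, whereas here Patroni disappears together with the host, the leader key is no longer refreshed, and it expires after the configured ttl. The Python sketch below is a deliberately simplified model of that behaviour with a manual clock. It uses ttl: 30 and loop_wait: 10 from the configuration files; the class and function names are invented for this example, and real Patroni applies additional checks, such as replication lag, that are left out.

```python
class LeaseStore:
    """In-memory model of one key with a time-to-live, driven by a manual clock."""

    def __init__(self):
        self.now = 0
        self.holder = None
        self.expires_at = 0

    def advance(self, seconds):
        """Move the clock forward and drop the key once its lease has run out."""
        self.now += seconds
        if self.holder is not None and self.now >= self.expires_at:
            self.holder = None

    def acquire_or_refresh(self, name, ttl):
        """Take the key if it is free, or extend it if `name` already holds it."""
        if self.holder in (None, name):
            self.holder = name
            self.expires_at = self.now + ttl
            return True
        return False


def simulate(ttl=30, loop_wait=10, leader_dies_at=20, duration=80):
    """Return (time, key holder) samples, one per loop iteration."""
    store = LeaseStore()
    timeline = []
    while store.now <= duration:
        # The old leader's Patroni refreshes the key only while its host is up.
        if store.now < leader_dies_at:
            store.acquire_or_refresh("postgresql3", ttl)
        # A standby's Patroni tries to take the key on every loop.
        store.acquire_or_refresh("postgresql1", ttl)
        timeline.append((store.now, store.holder))
        store.advance(loop_wait)
    return timeline


if __name__ == "__main__":
    for time, holder in simulate():
        print(f"t={time:3d}s leader key held by {holder}")
```

In this model postgresql3 keeps the key until its last refresh at t=10 has aged by the full ttl; postgresql1 takes over at t=40.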

9.4 After host3 is back up, start etcd3 and postgresql3 manually

[rudi@host3 ~]$ source ./scripts/etcd3.sh
[rudi@host3 ~]$ patroni ./scripts/postgresql3.yml

9.5 After etcd3/postgresql3 have started, check the cluster status

  • postgresql3/host3 becomes a standby and works normally
[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+---------+---------------------+--------+---------+----+-----------+
|  Cluster   |  Member |         Host        |  Role  |  State  | TL | Lag in MB |
+------------+---------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running |  5 |         0 |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running |  5 |         0 |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | running |  5 |         0 |
+------------+---------+---------------------+--------+---------+----+-----------+

10 Reboot a standby host

10.1 Cluster information before the reboot

  • Standbys: postgresql2, postgresql3
  • Primary: postgresql1
[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+---------+---------------------+--------+---------+----+-----------+
|  Cluster   |  Member |         Host        |  Role  |  State  | TL | Lag in MB |
+------------+---------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running |  5 |         0 |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running |  5 |         0 |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | running |  5 |         0 |
+------------+---------+---------------------+--------+---------+----+-----------+

10.2 Reboot host2 (standby)

[root@host2 ~]# reboot
Connection to host2 closed by remote host.
Connection to host2 closed.

10.3 Check the cluster information

  • postgresql2 has stopped
[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+---------+---------------------+--------+---------+----+-----------+
|  Cluster   |  Member |         Host        |  Role  |  State  | TL | Lag in MB |
+------------+---------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running |  5 |         0 |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | stopped |    |   unknown |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | running |  5 |         0 |
+------------+---------+---------------------+--------+---------+----+-----------+

10.4 After host2 is back up, start etcd2 and then postgresql2 manually, in that order

[rudi@host2 ~]$ source ./scripts/etcd2.sh
[rudi@host2 ~]$ patroni ./scripts/postgresql2.yml

10.5 After etcd2 and postgresql2 have started, check the cluster status

  • postgresql2 is still a standby and works normally
[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+---------+---------------------+--------+---------+----+-----------+
|  Cluster   |  Member |         Host        |  Role  |  State  | TL | Lag in MB |
+------------+---------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running |  5 |         0 |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running |  5 |         0 |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | running |  5 |         0 |
+------------+---------+---------------------+--------+---------+----+-----------+