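1 Summary
Etcd and Patroni can be used to build a highly available PostgreSQL cluster.
Etcd is the store through which the Patroni nodes share information.
Each Patroni monitors the state of its local PostgreSQL. If the primary fails, Patroni promotes one of the standbys to be the new primary. Once a failed PostgreSQL instance has been repaired, it can rejoin the cluster, either automatically or manually.
1.1 About etcd
- see https://etcd.io/
Etcd is a strongly consistent, distributed key-value store built on the Raft consensus algorithm. It gives distributed systems a reliable way to store and access shared data.
At any time exactly one etcd node is the elected Leader; the other etcd nodes are Followers.
Data in etcd is identified by key. For example, the following entry
key = /service/postgresql/leader
value = postgresql1
states that the primary of a PostgreSQL cluster is 'postgresql1'.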
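As a rough illustration of how a client might read such a key, the sketch below parses the JSON body that etcd's v2 keys API returns for a GET on /v2/keys/service/postgresql/leader. The sample body and the helper name leader_from_response are assumptions made for this example (the index values are invented), and no etcd server is contacted.

```python
import json

# Hypothetical sample of an etcd v2 "get" response; the index values are made up.
SAMPLE_RESPONSE = """
{"action": "get",
 "node": {"key": "/service/postgresql/leader",
          "value": "postgresql1",
          "modifiedIndex": 42,
          "createdIndex": 42}}
"""


def leader_from_response(body: str) -> str:
    """Return the value stored under the key in an etcd v2 keys response."""
    doc = json.loads(body)
    return doc["node"]["value"]


if __name__ == "__main__":
    print(leader_from_response(SAMPLE_RESPONSE))
```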
figure1: an etcd cluster including three etcd nodes
===================================================
|---------------|                         |-------------|
|etcd1<follower>| <----------+----------> |etcd2<leader>|
|---------------|            |            |-------------|
                             |
                             |
                             |
                     |-------V-------|
                     |etcd3<follower>|
                     |---------------|
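Raft commits a write only after a majority of the members have accepted it, which is one reason to run three etcd nodes as in figure1. A minimal sketch of that arithmetic follows; the function names are illustrative and not part of etcd.

```python
def quorum(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members may fail while a majority is still reachable."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(n, quorum(n), tolerated_failures(n))
```

With three members the quorum is two, so the cluster keeps accepting writes when one etcd node is down.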
1.2 How etcd, Patroni and PostgreSQL work together
In the figure below (figure2), three hosts (host1, host2, host3) form a PostgreSQL cluster. Etcd, Patroni and PostgreSQL are installed on every host.
figure2: a PostgreSQL cluster with 3 hosts, each host having etcd, Patroni and PostgreSQL
=========================================================================================
    ...........................................................................
    .                                 <host1>                                 .
    .        |------------|         |-------------|           |-------------| .
+--<-------->|   etcd1    |<------->|  patroni1   +---------->| postgresql1 | .
|   .        |------------|         |-------------|           |-------------| .
|   .                                                                         .
|   ...........................................................................
|
|
|
|   ...........................................................................
|   .                                 <host2>                                 .
|   .        |------------|         |-------------|           |-------------| .
+--<-------->|   etcd2    |<------->|  patroni2   +---------->| postgresql2 | .
|   .        |------------|         |-------------|           |-------------| .
|   .                                                                         .
|   ...........................................................................
|
|
|
|   ...........................................................................
|   .                                 <host3>                                 .
|   .        |------------|         |-------------|           |-------------| .
+--<-------->|   etcd3    |<------->|  patroni3   +---------->| postgresql3 | .
    .        |------------|         |-------------|           |-------------| .
    .                                                                         .
    ...........................................................................
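1.2.1 Etcd: distributed key-value store
etcd1, etcd2 and etcd3 form a distributed key-value store that patroni1, patroni2 and patroni3 read and write in order to share and exchange information. Every Patroni can read and write the data in etcd.
1.2.2 Patroni: controls and monitors the local PostgreSQL
Each Patroni instance monitors and controls its local PostgreSQL and writes that instance's information and state to etcd.
A Patroni instance learns the information and state of the PostgreSQL instances on the other hosts by reading etcd.
1.2.3 Election of the PostgreSQL primary
Patroni decides whether its local PostgreSQL is eligible to be the primary. If it is, Patroni tries to elect the local PostgreSQL as primary (leader) by updating a key in etcd (e.g. /service/postgresql/leader) to the name of the local PostgreSQL (e.g. postgresql1).
If several Patroni instances try to change the same key at the same time, only one of them succeeds, and its PostgreSQL becomes the primary (leader).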
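The "only one writer succeeds" rule can be sketched with an atomic create-if-absent operation. The code below is a simplified in-memory simulation, not Patroni's implementation and not the etcd client API: FakeKeyValueStore, create_if_absent and run_election are hypothetical names introduced for this example.

```python
import threading
from typing import Dict, List, Optional

LEADER_KEY = "/service/postgresql/leader"


class FakeKeyValueStore:
    """In-memory stand-in for etcd (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._lock = threading.Lock()

    def create_if_absent(self, key: str, value: str) -> bool:
        """Atomically set the key only when it does not exist yet."""
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

    def get(self, key: str) -> Optional[str]:
        with self._lock:
            return self._data.get(key)


def run_election(store: FakeKeyValueStore, candidates: List[str]) -> List[str]:
    """Let every candidate race for the leader key; return those that won."""
    winners: List[str] = []

    def campaign(name: str) -> None:
        if store.create_if_absent(LEADER_KEY, name):
            winners.append(name)

    threads = [threading.Thread(target=campaign, args=(n,)) for n in candidates]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners


if __name__ == "__main__":
    store = FakeKeyValueStore()
    print(run_election(store, ["postgresql1", "postgresql2", "postgresql3"]))
    print(store.get(LEADER_KEY))
```

Whichever candidate wins the race, exactly one name ends up under the leader key. In the real system the leader key also expires after a TTL (the ttl setting in the Patroni configuration), so the leader has to keep renewing it; the simulation leaves that out.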
2 Test plan
2.1 OS / software / versions
- CentOS 7.6
- PostgreSQL 11.3
- etcd: 3.3.11
- python: Python 3.7.4
- Patroni: 1.6.0
- database user:
- superuser:postgres
- replication user: replicator
2.2 Hostname / IP address plan
NO. | IP              | HOSTNAME
----+-----------------+----------
1   | 192.168.56.10   | host1
2   | 192.168.56.11   | host2
3   | 192.168.56.12   | host3
[rudi@host1 ~]$ more /etc/hosts
192.168.56.10 host1
192.168.56.11 host2
192.168.56.12 host3
2.3 OS user / directories / files
- Linux user: rudi
- main folder: /home/rudi
- Postgres data folder: /home/rudi/pgdata/
- folder for scripts: /home/rudi/scripts/
3 Install software/modules on all hosts (host1/host2/host3)
3.1 Install etcd
yum -y install etcd libyaml
3.2 Install PostgreSQL 11.3 from source
yum -y install flex bison readline-devel zlib-devel
wget https://ftp.postgresql.org/pub/source/v11.3/postgresql-11.3.tar.gz
tar zxvf postgresql-11.3.tar.gz
cd postgresql-11.3/
./configure
make
su
make install
3.3 Install Python 3 from source, then install the required Python modules
yum install libffi-devel openssl-devel
wget https://www.python.org/ftp/python/3.7.4/Python-3.7.4.tgz
tar zxvf Python-3.7.4.tgz
cd Python-3.7.4/
./configure
make
su
make install
python3 -m pip install --upgrade pip
python3 -m pip install psycopg2-binary
python3 -m pip install 'patroni[etcd]'
4 Prepare directories and files
4.1 On all hosts (host1/host2/host3), create the database directory and the directory for scripts and configuration
cd ~
mkdir ~/scripts
mkdir ~/pgdata
4.2 Create/edit the etcd startup scripts
4.2.1 host1: etcd1.sh
cd ~/scripts
vim etcd1.sh
etcd --name etcdnode1 --initial-advertise-peer-urls http://192.168.56.10:2380 \
--listen-peer-urls http://192.168.56.10:2380 \
--listen-client-urls http://192.168.56.10:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.56.10:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster etcdnode1=http://192.168.56.10:2380,etcdnode2=http://192.168.56.11:2380,etcdnode3=http://192.168.56.12:2380 \
--initial-cluster-state new
4.2.2 host2: etcd2.sh
cd ~/scripts
vim etcd2.sh
etcd --name etcdnode2 --initial-advertise-peer-urls http://192.168.56.11:2380 \
--listen-peer-urls http://192.168.56.11:2380 \
--listen-client-urls http://192.168.56.11:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.56.11:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster etcdnode1=http://192.168.56.10:2380,etcdnode2=http://192.168.56.11:2380,etcdnode3=http://192.168.56.12:2380 \
--initial-cluster-state new
4.2.3 host3: etcd3.sh
cd ~/scripts
vim etcd3.sh
etcd --name etcdnode3 --initial-advertise-peer-urls http://192.168.56.12:2380 \
--listen-peer-urls http://192.168.56.12:2380 \
--listen-client-urls http://192.168.56.12:2379,http://127.0.0.1:2379 \
--advertise-client-urls http://192.168.56.12:2379 \
--initial-cluster-token etcd-cluster-1 \
--initial-cluster etcdnode1=http://192.168.56.10:2380,etcdnode2=http://192.168.56.11:2380,etcdnode3=http://192.168.56.12:2380 \
--initial-cluster-state new
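The three scripts differ only in the node name and the node's own IP address; the --initial-cluster value is identical everywhere. The sketch below derives the argument list for each node from one table, which may help avoid copy-and-paste mistakes. The helper etcd_args and the NODES mapping are illustrative names, not part of etcd.

```python
from typing import Dict, List

PEER_PORT = 2380
CLIENT_PORT = 2379

# Node name -> IP address, taken from the host plan in section 2.2.
NODES: Dict[str, str] = {
    "etcdnode1": "192.168.56.10",
    "etcdnode2": "192.168.56.11",
    "etcdnode3": "192.168.56.12",
}


def etcd_args(name: str, nodes: Dict[str, str] = NODES) -> List[str]:
    """Build the etcd command line for one node of the cluster."""
    ip = nodes[name]
    initial_cluster = ",".join(
        f"{node}=http://{addr}:{PEER_PORT}" for node, addr in nodes.items()
    )
    return [
        "etcd", "--name", name,
        "--initial-advertise-peer-urls", f"http://{ip}:{PEER_PORT}",
        "--listen-peer-urls", f"http://{ip}:{PEER_PORT}",
        "--listen-client-urls",
        f"http://{ip}:{CLIENT_PORT},http://127.0.0.1:{CLIENT_PORT}",
        "--advertise-client-urls", f"http://{ip}:{CLIENT_PORT}",
        "--initial-cluster-token", "etcd-cluster-1",
        "--initial-cluster", initial_cluster,
        "--initial-cluster-state", "new",
    ]


if __name__ == "__main__":
    for node in NODES:
        print(" ".join(etcd_args(node)))
```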
4.3 Create/edit the Patroni configuration files
4.3.1 host1: postgresql1.yml
cd ~/scripts
vim postgresql1.yml
scope: postgresql
namespace: /service/
name: postgresql1
restapi:
  listen: 192.168.56.10:8008
  connect_address: 192.168.56.10:8008
etcd:
  host: 192.168.56.10:2379
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
  initdb:
  - encoding: UTF8
  - data-checksums
  pg_hba:
  - host replication replicator 127.0.0.1/32 md5
  - host replication replicator 192.168.56.10/32 md5
  - host replication replicator 192.168.56.11/32 md5
  - host replication replicator 192.168.56.12/32 md5
  - host all all 0.0.0.0/0 md5
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb
postgresql:
  listen: 192.168.56.10:15432
  connect_address: 192.168.56.10:15432
  bin_dir: /usr/local/pgsql/bin
  data_dir: /home/rudi/pgdata
  pgpass: /tmp/pgpass1
  authentication:
    replication:
      username: replicator
      password: rep-pass
    superuser:
      username: postgres
      password: secretpassword
  parameters:
    unix_socket_directories: '.'
    synchronous_commit: "on"
    synchronous_standby_names: "*"
tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
4.3.2 host2: postgresql2.yml
cd ~/scripts
vim postgresql2.yml
scope: postgresql
namespace: /service/
name: postgresql2
restapi:
  listen: 192.168.56.11:8008
  connect_address: 192.168.56.11:8008
etcd:
  host: 192.168.56.11:2379
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
  initdb:
  - encoding: UTF8
  - data-checksums
  pg_hba:
  - host replication replicator 127.0.0.1/32 md5
  - host replication replicator 192.168.56.10/32 md5
  - host replication replicator 192.168.56.11/32 md5
  - host replication replicator 192.168.56.12/32 md5
  - host all all 0.0.0.0/0 md5
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb
postgresql:
  listen: 192.168.56.11:15432
  connect_address: 192.168.56.11:15432
  bin_dir: /usr/local/pgsql/bin
  data_dir: /home/rudi/pgdata
  pgpass: /tmp/pgpass1
  authentication:
    replication:
      username: replicator
      password: rep-pass
    superuser:
      username: postgres
      password: secretpassword
  parameters:
    unix_socket_directories: '.'
    synchronous_commit: "on"
    synchronous_standby_names: "*"
tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
4.3.3 host3: postgresql3.yml
cd ~/scripts
vim postgresql3.yml
scope: postgresql
namespace: /service/
name: postgresql3
restapi:
  listen: 192.168.56.12:8008
  connect_address: 192.168.56.12:8008
etcd:
  host: 192.168.56.12:2379
bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
  initdb:
  - encoding: UTF8
  - data-checksums
  pg_hba:
  - host replication replicator 127.0.0.1/32 md5
  - host replication replicator 192.168.56.10/32 md5
  - host replication replicator 192.168.56.11/32 md5
  - host replication replicator 192.168.56.12/32 md5
  - host all all 0.0.0.0/0 md5
  users:
    admin:
      password: admin
      options:
        - createrole
        - createdb
postgresql:
  listen: 192.168.56.12:15432
  connect_address: 192.168.56.12:15432
  bin_dir: /usr/local/pgsql/bin
  data_dir: /home/rudi/pgdata
  pgpass: /tmp/pgpass1
  authentication:
    replication:
      username: replicator
      password: rep-pass
    superuser:
      username: postgres
      password: secretpassword
  parameters:
    unix_socket_directories: '.'
    synchronous_commit: "on"
    synchronous_standby_names: "*"
tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false
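The three Patroni configuration files are identical except for the member name and the addresses bound to the local host. The sketch below lists exactly those per-host values. The dotted keys (for example restapi.listen) are only a shorthand for the nested YAML paths, and host_specific_settings is a hypothetical helper, not a Patroni function.

```python
from typing import Dict

REST_PORT = 8008
ETCD_PORT = 2379
PG_PORT = 15432

# Patroni member name -> IP address of the host it runs on (section 2.2).
HOSTS: Dict[str, str] = {
    "postgresql1": "192.168.56.10",
    "postgresql2": "192.168.56.11",
    "postgresql3": "192.168.56.12",
}


def host_specific_settings(name: str) -> Dict[str, str]:
    """Return the settings that differ between the three config files."""
    ip = HOSTS[name]
    return {
        "name": name,
        "restapi.listen": f"{ip}:{REST_PORT}",
        "restapi.connect_address": f"{ip}:{REST_PORT}",
        "etcd.host": f"{ip}:{ETCD_PORT}",
        "postgresql.listen": f"{ip}:{PG_PORT}",
        "postgresql.connect_address": f"{ip}:{PG_PORT}",
    }


if __name__ == "__main__":
    for member in HOSTS:
        print(host_specific_settings(member))
```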
5 Start the cluster
5.1 Stop and disable the firewall on all hosts (host1/host2/host3)
systemctl stop firewalld
systemctl disable firewalld
5.2 Start etcd on all hosts (host1/host2/host3), in order
host1:
cd ~
source ./scripts/etcd1.sh
host2:
cd ~
source ./scripts/etcd2.sh
host3:
cd ~
source ./scripts/etcd3.sh
Check the etcd status:
[rudi@host1 ~]$ curl -L http://host2:2379/version
{"etcdserver":"3.3.11","etcdcluster":"3.3.0"}
[rudi@host1 ~]$ curl -L http://host3:2379/version
{"etcdserver":"3.3.11","etcdcluster":"3.3.0"}
[rudi@host1 ~]$ curl -L http://host1:2379/version
{"etcdserver":"3.3.11","etcdcluster":"3.3.0"}
[rudi@host1 scripts]$ etcdctl member list
51b08bf82e03e049: name=etcdnode1 peerURLs=http://192.168.56.10:2380 clientURLs=http://192.168.56.10:2379 isLeader=true
6d36a224cc993604: name=etcdnode2 peerURLs=http://192.168.56.11:2380 clientURLs=http://192.168.56.11:2379 isLeader=false
bb961ca5e3abf011: name=etcdnode3 peerURLs=http://192.168.56.12:2380 clientURLs=http://192.168.56.12:2379 isLeader=false
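For scripted health checks, the member list can be parsed to find the current etcd leader. The sketch below works on the text format shown above (the v2-style etcdctl output of etcd 3.3.11); parse_member_list and leader_name are illustrative helper names, and other etcdctl versions may print a different format.

```python
from typing import Dict, List, Optional

MEMBER_LIST_OUTPUT = """\
51b08bf82e03e049: name=etcdnode1 peerURLs=http://192.168.56.10:2380 clientURLs=http://192.168.56.10:2379 isLeader=true
6d36a224cc993604: name=etcdnode2 peerURLs=http://192.168.56.11:2380 clientURLs=http://192.168.56.11:2379 isLeader=false
bb961ca5e3abf011: name=etcdnode3 peerURLs=http://192.168.56.12:2380 clientURLs=http://192.168.56.12:2379 isLeader=false
"""


def parse_member_list(text: str) -> List[Dict[str, str]]:
    """Turn each 'id: key=value key=value ...' line into a dict."""
    members = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        member_id, _, rest = line.partition(": ")
        fields = dict(item.split("=", 1) for item in rest.split())
        fields["id"] = member_id
        members.append(fields)
    return members


def leader_name(members: List[Dict[str, str]]) -> Optional[str]:
    """Return the name of the member flagged isLeader=true, if any."""
    for member in members:
        if member.get("isLeader") == "true":
            return member["name"]
    return None


if __name__ == "__main__":
    print(leader_name(parse_member_list(MEMBER_LIST_OUTPUT)))
```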
5.3 Start Patroni on all hosts (host1/host2/host3)
5.3.1 Start Patroni on host1
- Patroni1 writes the information about the local PostgreSQL (postgresql1) to etcd.
- Patroni1 finds the data directory (/home/rudi/pgdata/) empty and therefore initializes the database (initdb -D /home/rudi/pgdata)
- Patroni1 writes the configuration files of the local database, e.g. postgresql.conf and pg_hba.conf
- Patroni1 starts the local database (postgresql1): pg_ctl -D /home/rudi/pgdata start
- Patroni1 makes the local database (postgresql1) the primary
cd ~
patroni ./scripts/postgresql1.yml
5.3.2 Start Patroni on host2/host3
- Patroni2/Patroni3 take a base backup of postgresql1 (pg_basebackup) to create their own local databases
- Patroni2/Patroni3 write the configuration files of their local databases, e.g. postgresql.conf and pg_hba.conf
- Patroni2 starts postgresql2 as a standby
- Patroni3 starts postgresql3 as a standby
host2:
cd ~
patroni ./scripts/postgresql2.yml
host3:
cd ~
patroni ./scripts/postgresql3.yml
5.3.3 Check the cluster status through the Patroni interfaces:
[rudi@host1 ~]$ curl -L http://host1:8008/
{"state": "running", "postmaster_start_time": "2019-09-17 04:15:30.959 EDT", "role": "master", "server_version": 110003, "cluster_unlocked": false, "xlog": {"location": 83886400}, "timeline": 1, "replication": [{"usename": "replicator", "application_name": "postgresql2", "client_addr": "192.168.56.11", "state": "streaming", "sync_state": "sync", "sync_priority": 1}, {"usename": "replicator", "application_name": "postgresql3", "client_addr": "192.168.56.12", "state": "streaming", "sync_state": "potential", "sync_priority": 1}], "database_system_identifier": "6737550116166691859", "patroni": {"version": "1.6.0", "scope": "postgresql"}}
[rudi@host1 ~]$ patronictl version
patronictl version 1.6.0
[rudi@host1 ~]$ patronictl -c ./scripts/postgresql1.yml list postgresql
+------------+-------------+---------------------+--------+---------+----+-----------+
| Cluster    | Member      | Host                | Role   | State   | TL | Lag in MB |
+------------+-------------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running | 1  | 0         |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running | 1  | 0         |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | running | 1  | 0         |
+------------+-------------+---------------------+--------+---------+----+-----------+
[rudi@host1 ~]$
[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+-------------+---------------------+--------+---------+----+-----------+
| Cluster    | Member      | Host                | Role   | State   | TL | Lag in MB |
+------------+-------------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running | 1  | 0         |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running | 1  | 0         |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | running | 1  | 0         |
+------------+-------------+---------------------+--------+---------+----+-----------+
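The JSON document returned by the Patroni REST endpoint is easy to consume from a script. The sketch below extracts the role, the state and the synchronous-replication status of each standby from an abbreviated copy of the response shown above; summarize_status is an illustrative helper name, and only a subset of the fields is included in the sample.

```python
import json
from typing import Dict, Tuple

# Abbreviated copy of the response from http://host1:8008/ shown above.
STATUS_JSON = """
{"state": "running", "role": "master", "server_version": 110003, "timeline": 1,
 "replication": [
   {"usename": "replicator", "application_name": "postgresql2",
    "client_addr": "192.168.56.11", "state": "streaming", "sync_state": "sync"},
   {"usename": "replicator", "application_name": "postgresql3",
    "client_addr": "192.168.56.12", "state": "streaming", "sync_state": "potential"}],
 "patroni": {"version": "1.6.0", "scope": "postgresql"}}
"""


def summarize_status(text: str) -> Tuple[str, str, Dict[str, str]]:
    """Return (role, state, {standby name: sync_state}) for one member."""
    status = json.loads(text)
    standbys = {
        entry["application_name"]: entry["sync_state"]
        for entry in status.get("replication", [])
    }
    return status["role"], status["state"], standbys


if __name__ == "__main__":
    print(summarize_status(STATUS_JSON))
```

In this sample postgresql2 is the synchronous standby and postgresql3 is a potential one, which matches synchronous_standby_names: "*" in the configuration.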
5.3.4 Check the cluster status through the etcd interface:
[rudi@host1 scripts]$ etcdctl ls --recursive --sort -p /service
/service/postgresql/
/service/postgresql/config
/service/postgresql/initialize
/service/postgresql/leader
/service/postgresql/members/
/service/postgresql/members/postgresql1
/service/postgresql/members/postgresql2
/service/postgresql/members/postgresql3
/service/postgresql/optime/
/service/postgresql/optime/leader
[rudi@host1 ~]$ etcdctl get /service/postgresql/leader
postgresql1
[rudi@host1 ~]$ etcdctl get /service/postgresql/members/postgresql1
{"conn_url":"postgres://192.168.56.10:15432/postgres","api_url":"http://192.168.56.10:8008/patroni","state":"running","role":"master","version":"1.6.0","xlog_location":83888056,"timeline":1}
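Each member key holds a JSON document whose conn_url tells clients where that PostgreSQL instance listens. A small sketch of extracting host, port and database name from the value shown above follows; connection_target is an illustrative helper name.

```python
import json
from typing import Optional, Tuple
from urllib.parse import urlparse

# Value of /service/postgresql/members/postgresql1 as shown above.
MEMBER_VALUE = (
    '{"conn_url":"postgres://192.168.56.10:15432/postgres",'
    '"api_url":"http://192.168.56.10:8008/patroni","state":"running",'
    '"role":"master","version":"1.6.0","xlog_location":83888056,"timeline":1}'
)


def connection_target(member_value: str) -> Tuple[Optional[str], Optional[int], str]:
    """Return (host, port, database) parsed from a member's conn_url."""
    member = json.loads(member_value)
    url = urlparse(member["conn_url"])
    return url.hostname, url.port, url.path.lstrip("/")


if __name__ == "__main__":
    print(connection_target(MEMBER_VALUE))
```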
6 Test reading and writing data
The databases can be reached from any of the hosts (host1, host2, host3)
6.1 Write data to the primary and read it back:
export PATH=$PATH:/usr/local/pgsql/bin
psql -U postgres -d postgres -p 15432 -h host1
postgres=# create table test (id int, name varchar(100));
CREATE TABLE
postgres=# insert into test values ( 1,'1');
INSERT 0 1
postgres=# select * from test;
id | name
----+------
1 | 1
(1 row)
6.2 Try to write data to a standby
psql -U postgres -d postgres -p 15432 -h host2
postgres=# insert into test values ( 1,'1');
ERROR: cannot execute INSERT in a read-only transaction
6.3 Read data from a standby
psql -U postgres -d postgres -p 15432 -h host3
postgres=# select * from test;
id | name
----+------
1 | 1
(1 row)
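The read-only error on host2 occurs because a standby runs in recovery mode. A client can check this up front with PostgreSQL's pg_is_in_recovery() function. The sketch below assumes a DB-API connection object such as the one psycopg2.connect() returns; is_standby is a hypothetical helper, and no connection is opened by the code itself.

```python
def is_standby(conn) -> bool:
    """Return True when the server behind conn is in recovery (a standby).

    conn is expected to be a DB-API connection, e.g. from psycopg2.connect().
    """
    cur = conn.cursor()
    try:
        cur.execute("SELECT pg_is_in_recovery()")
        (in_recovery,) = cur.fetchone()
        return bool(in_recovery)
    finally:
        cur.close()
```

With the cluster from this document, a connection opened with host="host2", port=15432, user="postgres" and dbname="postgres" would be expected to report a standby, and one to host1 a primary.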
7 Kill the postmaster process on the primary
7.1 Status before the kill:
- The primary is postgresql1/host1
[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+-------------+---------------------+--------+---------+----+-----------+
| Cluster    | Member      | Host                | Role   | State   | TL | Lag in MB |
+------------+-------------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running | 1  | 0         |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running | 1  | 0         |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | running | 1  | 0         |
+------------+-------------+---------------------+--------+---------+----+-----------+
7.2 Run the kill on host1:
[rudi@host1 ~]$ ps -ef|grep postgres
rudi 3908 3759 0 11:35 pts/5 00:00:01 /usr/local/bin/python3 /usr/local/bin/patroni ./scripts/postgresql1.yml
rudi 3929 1 0 11:35 ? 00:00:00 /usr/local/pgsql/bin/postgres -D /home/rudi/pgdata --config-file=/home/rudi/pgdata/postgresql.conf --listen_addresses=192.168.56.10 --port=15432 --cluster_name=postgresql --wal_level=replica --hot_standby=on --max_connections=100 --max_wal_senders=10 --max_prepared_transactions=0 --max_locks_per_transaction=64 --track_commit_timestamp=off --max_replication_slots=10 --max_worker_processes=8 --wal_log_hints=on
rudi 3935 3929 0 11:35 ? 00:00:00 postgres: postgresql: checkpointer
rudi 3936 3929 0 11:35 ? 00:00:00 postgres: postgresql: background writer
rudi 3937 3929 0 11:35 ? 00:00:00 postgres: postgresql: walwriter
rudi 3938 3929 0 11:35 ? 00:00:00 postgres: postgresql: autovacuum launcher
rudi 3939 3929 0 11:35 ? 00:00:00 postgres: postgresql: stats collector
rudi 3940 3929 0 11:35 ? 00:00:00 postgres: postgresql: logical replication launcher
rudi 3944 3929 0 11:35 ? 00:00:00 postgres: postgresql: postgres postgres 192.168.56.10(44044) idle
rudi 3954 3929 0 11:35 ? 00:00:00 postgres: postgresql: walsender replicator 192.168.56.11(42620) streaming 0/4019F60
rudi 3958 3929 0 11:35 ? 00:00:00 postgres: postgresql: walsender replicator 192.168.56.12(46540) streaming 0/4019F60
[rudi@host1 ~]$ kill -9 3929
[rudi@host1 ~]$
7.3 On host1, Patroni starts postgresql1 again; postgresql1 remains the primary
- postgresql1 is starting
- postgresql1 is still the primary; no failover took place
[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+-------------+---------------------+--------+---------+----+-----------+
| Cluster    | Member      | Host                | Role   | State   | TL | Lag in MB |
+------------+-------------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running |    | unknown   |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running | 2  | 0         |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | running | 2  | 0         |
+------------+-------------+---------------------+--------+---------+----+-----------+
7.4 Check the PIDs on the primary: every PostgreSQL process has a new PID (the Patroni process, PID 3908, is unchanged):
[rudi@host1 ~]$ ps -ef|grep postgres
rudi 3908 3759 0 11:35 pts/5 00:00:01 /usr/local/bin/python3 /usr/local/bin/patroni ./scripts/postgresql1.yml
rudi 4034 1 0 11:46 ? 00:00:00 /usr/local/pgsql/bin/postgres -D /home/rudi/pgdata --config-file=/home/rudi/pgdata/postgresql.conf --listen_addresses=192.168.56.10 --port=15432 --cluster_name=postgresql --wal_level=replica --hot_standby=on --max_connections=100 --max_wal_senders=10 --max_prepared_transactions=0 --max_locks_per_transaction=64 --track_commit_timestamp=off --max_replication_slots=10 --max_worker_processes=8 --wal_log_hints=on
rudi 4037 4034 0 11:46 ? 00:00:00 postgres: postgresql: checkpointer
rudi 4038 4034 0 11:46 ? 00:00:00 postgres: postgresql: background writer
rudi 4039 4034 0 11:46 ? 00:00:00 postgres: postgresql: stats collector
rudi 4044 4034 0 11:46 ? 00:00:00 postgres: postgresql: postgres postgres 192.168.56.10(44742) idle
rudi 4049 4034 0 11:46 ? 00:00:00 postgres: postgresql: walwriter
rudi 4050 4034 0 11:46 ? 00:00:00 postgres: postgresql: autovacuum launcher
rudi 4051 4034 0 11:46 ? 00:00:00 postgres: postgresql: logical replication launcher
rudi 4054 4034 0 11:46 ? 00:00:00 postgres: postgresql: walsender replicator 192.168.56.11(43266) streaming 0/50001A8
rudi 4055 4034 0 11:46 ? 00:00:00 postgres: postgresql: walsender replicator 192.168.56.12(47174) streaming 0/50001A8
7.5 Check the cluster status: postgresql1 is the primary and works normally
[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+-------------+---------------------+--------+---------+----+-----------+
| Cluster    | Member      | Host                | Role   | State   | TL | Lag in MB |
+------------+-------------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running | 3  | 0         |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running | 3  | 0         |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | running | 3  | 0         |
+------------+-------------+---------------------+--------+---------+----+-----------+
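When tests like this one are repeated, it can be handy to read the patronictl list table programmatically instead of by eye. The sketch below parses the ASCII table into one dict per member and looks up the leader. parse_patronictl_table and find_leader are illustrative helpers, not part of Patroni, and the parser assumes the column layout shown in this document.

```python
from typing import Dict, List, Optional

LIST_OUTPUT = """\
+------------+-------------+---------------------+--------+---------+----+-----------+
| Cluster | Member | Host | Role | State | TL | Lag in MB |
+------------+-------------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running | 3 | 0 |
| postgresql | postgresql2 | 192.168.56.11:15432 | | running | 3 | 0 |
| postgresql | postgresql3 | 192.168.56.12:15432 | | running | 3 | 0 |
+------------+-------------+---------------------+--------+---------+----+-----------+
"""


def parse_patronictl_table(text: str) -> List[Dict[str, str]]:
    """Parse the table into a list of {column name: cell} dicts."""
    header: Optional[List[str]] = None
    rows: List[Dict[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue  # border lines and anything that is not a table row
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if header is None:
            header = cells  # the first row carries the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def find_leader(rows: List[Dict[str, str]]) -> Optional[str]:
    """Return the member whose Role column says Leader, if any."""
    for row in rows:
        if row.get("Role") == "Leader":
            return row["Member"]
    return None


if __name__ == "__main__":
    print(find_leader(parse_patronictl_table(LIST_OUTPUT)))
```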
8 Manual switchover
8.1 Status before the switchover
[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+-------------+---------------------+--------+---------+----+-----------+
| Cluster    | Member      | Host                | Role   | State   | TL | Lag in MB |
+------------+-------------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running | 3  | 0         |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running | 3  | 0         |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | running | 3  | 0         |
+------------+-------------+---------------------+--------+---------+----+-----------+
8.2 Perform the manual switchover
- Current primary: postgresql1/host1
- New primary to select: postgresql3/host3
[rudi@host1 ~]$ patronictl -d etcd://host1:2379 switchover postgresql
Master [postgresql1]:
Candidate ['postgresql2', 'postgresql3'] []: postgresql3
When should the switchover take place (e.g. 2019-09-17T12:53 ) [now]:
Current cluster topology
+------------+-------------+---------------------+--------+---------+----+-----------+
| Cluster    | Member      | Host                | Role   | State   | TL | Lag in MB |
+------------+-------------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running | 3  | 0         |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running | 3  | 0         |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | running | 3  | 0         |
+------------+-------------+---------------------+--------+---------+----+-----------+
Are you sure you want to switchover cluster postgresql, demoting current master postgresql1? [y/N]: y
2019-09-17 11:53:12.19439 Successfully switched over to "postgresql3"
+------------+-------------+---------------------+--------+---------+----+-----------+
| Cluster    | Member      | Host                | Role   | State   | TL | Lag in MB |
+------------+-------------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 |        | stopped |    | unknown   |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running | 3  | 0         |
| postgresql | postgresql3 | 192.168.56.12:15432 | Leader | running | 3  |           |
+------------+-------------+---------------------+--------+---------+----+-----------+
8.3 Keep checking the cluster status:
- The new primary is postgresql3/host3
- Patroni restarts postgresql1/host1
- Finally, postgresql1/host1 rejoins the cluster as a standby and works normally
[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+-------------+---------------------+--------+---------+----+-----------+
| Cluster    | Member      | Host                | Role   | State   | TL | Lag in MB |
+------------+-------------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 |        | stopped |    | unknown   |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running | 4  |           |
| postgresql | postgresql3 | 192.168.56.12:15432 | Leader | running | 4  | 0         |
+------------+-------------+---------------------+--------+---------+----+-----------+
[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+-------------+---------------------+--------+---------+----+-----------+
| Cluster    | Member      | Host                | Role   | State   | TL | Lag in MB |
+------------+-------------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 |        | running | 4  | 0         |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running | 4  | 0         |
| postgresql | postgresql3 | 192.168.56.12:15432 | Leader | running | 4  | 0         |
+------------+-------------+---------------------+--------+---------+----+-----------+
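The interactive prompts of the switchover can probably be avoided by passing the answers as options. The sketch below only builds the command line; it does not run it. It assumes the --master, --candidate and --force options of patronictl switchover as found in Patroni 1.x (newer releases use --leader instead of --master), so check patronictl switchover --help on your installation first. switchover_command is a hypothetical helper.

```python
import shlex
from typing import List


def switchover_command(dcs_url: str, cluster: str,
                       current_leader: str, candidate: str) -> List[str]:
    """Build a non-interactive patronictl switchover command line.

    Assumption: the --master/--candidate/--force options of Patroni 1.x.
    """
    return [
        "patronictl", "-d", dcs_url, "switchover", cluster,
        "--master", current_leader,
        "--candidate", candidate,
        "--force",  # skip the confirmation prompt
    ]


if __name__ == "__main__":
    args = switchover_command("etcd://host1:2379", "postgresql",
                              "postgresql1", "postgresql3")
    print(" ".join(shlex.quote(a) for a in args))
```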
9 Reboot the host that runs the primary
9.1 Cluster status before the reboot
[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+-------------+---------------------+--------+---------+----+-----------+
| Cluster    | Member      | Host                | Role   | State   | TL | Lag in MB |
+------------+-------------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 |        | running | 4  | 0         |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running | 4  | 0         |
| postgresql | postgresql3 | 192.168.56.12:15432 | Leader | running | 4  | 0         |
+------------+-------------+---------------------+--------+---------+----+-----------+
9.2 Reboot host3 (primary database)
[root@host3 ~]# reboot
Connection to host3 closed by remote host.
Connection to host3 closed.
9.3 Check the cluster status
- postgresql3/host3 has stopped
- postgresql1/host1 has become the primary
[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+-------------+---------------------+--------+---------+----+-----------+
| Cluster    | Member      | Host                | Role   | State   | TL | Lag in MB |
+------------+-------------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running | 5  | 0         |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running | 5  | 0         |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | stopped |    | unknown   |
+------------+-------------+---------------------+--------+---------+----+-----------+
9.4 After host3 has booted, manually start etcd3 and then postgresql3 (via Patroni)
[rudi@host3 ~]$ source ./scripts/etcd3.sh
[rudi@host3 ~]$ patroni ./scripts/postgresql3.yml
9.5 After etcd3/postgresql3 have started, check the cluster status
- postgresql3/host3 is now a standby and works normally
[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+-------------+---------------------+--------+---------+----+-----------+
| Cluster    | Member      | Host                | Role   | State   | TL | Lag in MB |
+------------+-------------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running | 5  | 0         |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running | 5  | 0         |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | running | 5  | 0         |
+------------+-------------+---------------------+--------+---------+----+-----------+
10 Reboot a standby host
10.1 Cluster status before the reboot
- Standbys: postgresql2, postgresql3
- Primary: postgresql1
[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+-------------+---------------------+--------+---------+----+-----------+
| Cluster    | Member      | Host                | Role   | State   | TL | Lag in MB |
+------------+-------------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running | 5  | 0         |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running | 5  | 0         |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | running | 5  | 0         |
+------------+-------------+---------------------+--------+---------+----+-----------+
10.2 Reboot host2 (standby)
[root@host2 ~]# reboot
Connection to host2 closed by remote host.
Connection to host2 closed.
10.3 Check the cluster status
- postgresql2 has stopped
[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+-------------+---------------------+--------+---------+----+-----------+
| Cluster    | Member      | Host                | Role   | State   | TL | Lag in MB |
+------------+-------------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running | 5  | 0         |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | stopped |    | unknown   |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | running | 5  | 0         |
+------------+-------------+---------------------+--------+---------+----+-----------+
10.4 After host2 has booted, manually start etcd2 and then postgresql2 (via Patroni), in that order
[rudi@host2 ~]$ source ./scripts/etcd2.sh
[rudi@host2 ~]$ patroni ./scripts/postgresql2.yml
10.5 After etcd2 and postgresql2 have started, check the cluster status
- postgresql2 is still a standby and works normally
[rudi@host1 ~]$ patronictl -d etcd://host1:2379 list postgresql
+------------+-------------+---------------------+--------+---------+----+-----------+
| Cluster    | Member      | Host                | Role   | State   | TL | Lag in MB |
+------------+-------------+---------------------+--------+---------+----+-----------+
| postgresql | postgresql1 | 192.168.56.10:15432 | Leader | running | 5  | 0         |
| postgresql | postgresql2 | 192.168.56.11:15432 |        | running | 5  | 0         |
| postgresql | postgresql3 | 192.168.56.12:15432 |        | running | 5  | 0         |
+------------+-------------+---------------------+--------+---------+----+-----------+
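The reboot tests in sections 9 and 10 end by checking repeatedly until every member is running again. That waiting step could be automated along the lines of the sketch below. fetch_members is a hypothetical callable that returns one dict per member with a State key (for example built from the patronictl list output), and the timeout and interval defaults are arbitrary.

```python
import time
from typing import Callable, Dict, List


def wait_until_all_running(
    fetch_members: Callable[[], List[Dict[str, str]]],
    expected: int,
    timeout: float = 60.0,
    interval: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until `expected` members all report State == "running".

    Returns True on success and False when the timeout expires first.
    """
    deadline = clock() + timeout
    while True:
        members = fetch_members()
        if len(members) == expected and all(
            m.get("State") == "running" for m in members
        ):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The sleep and clock parameters exist only so that the function can be exercised without real waiting.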