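High-Availability Disaster Recovery Deployment

Last updated: 2024-07-01 14:39:42

AntDB uses a replication-group, multi-replica architecture to keep the database highly available. AntDB also supports disaster recovery at the single-machine, data-center, and city level, and provides a dual-center deployment scheme. Choose the deployment that fits your data-center layout and disaster recovery requirements.

Primary Center Deployment

Refer to [Distributed High-Availability Cluster Mode Deployment] to set up a high-availability cluster with one primary and two standbys.

Once setup is complete, the cluster looks like this: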

[antdb@localhost ~]$ adbhamgrctl -c /etc/adbhamgr/adbhamgr_antdbcluster.yaml list
+ Cluster: antdbcluster (7325731413001351384) --+---------+----+-----------+
| Member     | Host              | Role         | State   | TL | Lag in MB |
+------------+-------------------+--------------+---------+----+-----------+
| adbhamgr-1 | 10.21.10.175:4567 | Leader       | running |  2 |           |
| adbhamgr-2 | 10.21.10.176:4567 | Replica      | running |  2 |         0 |
| adbhamgr-3 | 10.21.10.177:4567 | Sync Standby | running |  2 |         0 |
+------------+-------------------+--------------+---------+----+-----------+
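The listing above can also be checked by a script rather than by eye. The following is a minimal sketch, assuming the table layout shown above; the function names `parse_cluster_table` and `check_primary_cluster` are hypothetical helpers and not part of adbhamgr.

```python
def parse_cluster_table(text):
    """Parse the ASCII table printed by `adbhamgrctl ... list` into dicts."""
    rows, header = [], None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders, the cluster title line, and shell prompts
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells  # first "|" line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def check_primary_cluster(rows):
    """Return a list of problems; an empty list means the checks passed."""
    problems = []
    leaders = [r["Member"] for r in rows if r["Role"] == "Leader"]
    if len(leaders) != 1:
        problems.append(f"expected exactly one Leader, found {len(leaders)}")
    for r in rows:
        if r["State"] != "running":
            problems.append(f"{r['Member']} is in state {r['State']!r}")
    return problems


# Sample copied from the listing above.
SAMPLE = """\
+ Cluster: antdbcluster (7325731413001351384) --+---------+----+-----------+
| Member     | Host              | Role         | State   | TL | Lag in MB |
+------------+-------------------+--------------+---------+----+-----------+
| adbhamgr-1 | 10.21.10.175:4567 | Leader       | running |  2 |           |
| adbhamgr-2 | 10.21.10.176:4567 | Replica      | running |  2 |         0 |
| adbhamgr-3 | 10.21.10.177:4567 | Sync Standby | running |  2 |         0 |
+------------+-------------------+--------------+---------+----+-----------+
"""

rows = parse_cluster_table(SAMPLE)
print(check_primary_cluster(rows) or "cluster looks healthy")
```

In practice the text would come from the captured output of the `list` command; the check only covers the role and state columns.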

Log in to the mgr of the primary center and run:

ADD hba gtmcoord all("host all all 0.0.0.0 0 trust");
ADD hba coordinator all("host all all 0.0.0.0 0 trust");
ADD hba datanode all("host all all 0.0.0.0 0 trust");

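Modify the postgresql.conf file

In the primary center's postgresql.conf file, change listen_addresses from its default of localhost to * or to a list of IP addresses, and set wal_level to replica (hot_standby is the older name for the same level and is still accepted as an alias). For example: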

listen_addresses = '*' 
max_wal_senders = 5  
wal_keep_size = 5120 
wal_level = replica 
hot_standby = on 
log_destination = 'csvlog' 
logging_collector = on  
log_directory = 'pg_log' 
mgr_zone ='local'      # set to the actual zone name of the primary center; the default is local
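A quick way to confirm that the replication-related settings are in place is to parse the file and compare it with the values above. This is only a sketch: `parse_conf` and `find_problems` are hypothetical helpers, the parser ignores `include` directives and values that contain `#`, and it expects the settings to be written explicitly, as in the example.

```python
# Settings from the example above that the check insists on.
REQUIRED = {"wal_level": "replica", "hot_standby": "on"}


def parse_conf(text):
    """Parse `name = value` lines, dropping comments and surrounding quotes."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # simplification: no '#' in values
        if not line or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip().strip("'\"")
    return settings


def find_problems(settings):
    """Return human-readable problems; an empty list means all checks passed."""
    problems = []
    # listen_addresses may be '*' or an IP list, but not the default localhost.
    if settings.get("listen_addresses", "localhost") == "localhost":
        problems.append("listen_addresses is still localhost")
    for name, want in REQUIRED.items():
        got = settings.get(name)
        if got != want:
            problems.append(f"{name} is {got!r}, expected {want!r}")
    return problems


SAMPLE = """\
listen_addresses = '*'
max_wal_senders = 5
wal_keep_size = 5120
wal_level = replica
hot_standby = on
mgr_zone ='local'      # zone name of the primary center
"""

print(find_problems(parse_conf(SAMPLE)))
```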

Restart mgr

mgr_ctl restart -D /home/antdb/data/mgr 

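Secondary Center Deployment

One-click deployment

Refer to [Distributed High-Availability Cluster Mode Deployment] to set up a high-availability cluster with one primary and two standbys, following the same procedure as for the primary center.

For example, the cluster looks like the following. At this point the primary and secondary centers are two independent clusters.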

[antdb@localhost mgr]$ adbhamgrctl -c /etc/adbhamgr/adbhamgr_antdbcluster.yaml list
+ Cluster: antdbcluster (7325731413001351384) ----+---------+----+-----------+
| Member     | Host              | Role           | State   | TL | Lag in MB |
+------------+-------------------+----------------+---------+----+-----------+
| adbhamgr-1 | 10.21.10.180:4567 | Leader         | running |  2 |           |
| adbhamgr-2 | 10.21.10.181:4567 | Replica        | running |  2 |         0 |
| adbhamgr-3 | 10.21.10.191:4567 | Sync Standby   | running |  2 |         0 |
+------------+-------------------+----------------+---------+----+-----------+

Stop the secondary center

  • Log in to the secondary center's mgr, stop the existing services, and clean up their data:
stop all;
clean all;
stop agent all;
  • Stop the adbhamgr service (run on every node):
sudo systemctl stop adbhamgr

Use the adbhamgr remove command to clean up the cluster data (running it on one node is enough):

# The argument after remove, "antdbcluster", is the cluster name
[antdb@localhost ~]$ adbhamgrctl -c /etc/adbhamgr/adbhamgr_antdbcluster.yaml remove antdbcluster
+ Cluster: antdbcluster (7326804842864827307) --+
| Member | Host | Role | State | TL | Lag in MB |
+--------+------+------+-------+----+-----------+
+--------+------+------+-------+----+-----------+
Please confirm the cluster name to remove: antdbcluster
You are about to remove all information in DCS for antdbcluster, please type: "Yes I am aware": Yes I am aware

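Rebuild the secondary center

Step 1: Modify the hba file of the primary center

Rebuilding the secondary center requires running the adb_basebackup command, so hba must be configured to allow the standby to connect to the primary node through streaming replication.

  • Modify the primary node's pg_hba.conf file, for example: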
host    replication     all             0.0.0.0/0            trust
  • Reload the primary node so that the configuration takes effect:
mgr_ctl reload -D /home/antdb/data/mgr
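The address field of the hba entry has to cover every secondary-center node that will connect for replication. The sketch below checks that for a given CIDR block using Python's standard `ipaddress` module. The helper names and the `10.21.10.0/24` value are illustrative assumptions based on the sample addresses in this document, not a recommendation from AntDB.

```python
import ipaddress

# Secondary-center node addresses taken from the sample cluster listing.
STANDBY_HOSTS = ["10.21.10.180", "10.21.10.181", "10.21.10.191"]


def uncovered_hosts(cidr, hosts):
    """Return the hosts whose address is not inside the given CIDR block."""
    network = ipaddress.ip_network(cidr)
    return [h for h in hosts if ipaddress.ip_address(h) not in network]


def hba_line(cidr, method="trust"):
    """Format a replication entry in the same layout as the example above."""
    return f"host    replication     all             {cidr}            {method}"


for cidr in ("0.0.0.0/0", "10.21.10.0/24"):  # second value is hypothetical
    missing = uncovered_hosts(cidr, STANDBY_HOSTS)
    print(hba_line(cidr), "| uncovered:", missing)
```

If the second list printed for a candidate block is not empty, the listed nodes would be rejected when they try to start streaming replication.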

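Step 2: Create a replication slot on the primary node of the primary center (optional):

# Note the name of the replication slot you create; adbhamgr is used here (user-defined)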
select pg_create_physical_replication_slot('adbhamgr', true);

Step 3: Rebuild the secondary center (run on every node; the operations are identical)

  • Clean up the data directory:
rm -rf /home/antdb/data
  • Modify the adbhamgr configuration file:
# In the secondary center's adbhamgr yml configuration file, add a standby_cluster block under bootstrap.dcs, as follows:
bootstrap:
  # this section will be written into adbdcs:/<namespace>/<scope>/config after initializing new cluster
  # and all other cluster members will use it as a `global configuration`
  dcs:
    ……
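    standby_cluster:               # add the standby_cluster settings; mind the indentation
      host: 10.21.10.175           # IP of a node in the primary center: the remote node used for pg_basebackup and streaming replication
      port: 4567                   # port of the remote node
      primary_slot_name: adbhamgr  # replication slot used for streaming replication; if used, create the slot manually on the primary center (see Step 2)
      create_replica_methods:      # methods used to take the base backup, normally pg_basebackup
        - basebackup

Note: the node set in host must be the node that holds the primary_slot_name slot.
     The primary node of the primary center is recommended, consistent with Steps 1 and 2, which are both performed on that node.
  • Finally, start adbhamgr on each node:
# Start adbhamgr on each node: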
sudo systemctl start adbhamgr

# After a successful start, check the cluster status: the primary node of the secondary center is Standby Leader, and all other nodes are Replica
[antdb@localhost ~]$ adbhamgrctl -c /etc/adbhamgr/adbhamgr_antdbcluster.yaml list
+ Cluster: antdbcluster (7325731413001351384) ----+---------+----+-----------+
| Member     | Host              | Role           | State   | TL | Lag in MB |
+------------+-------------------+----------------+---------+----+-----------+
| adbhamgr-1 | 10.21.10.180:4567 | Standby Leader | running |  2 |           |
| adbhamgr-2 | 10.21.10.181:4567 | Replica        | running |  2 |         0 |
| adbhamgr-3 | 10.21.10.191:4567 | Replica        | running |  2 |         0 |
+------------+-------------------+----------------+---------+----+-----------+

# Check streaming replication on the secondary center's primary node: its replication source is the primary node of the primary center
[antdb@localhost ~]$ adb -d antdb -p 60103
psql (13.3)
Type "help" for help.
antdb=# \x
Expanded display is on.
antdb=# SELECT * FROM pg_stat_wal_receiver;
-[ RECORD 1 ]---------+-------------------------------
pid                   | 559514
status                | streaming
receive_start_lsn     | 0/30000000
receive_start_tli     | 1
written_lsn           | 0/30000148
flushed_lsn           | 0/30000148
received_tli          | 1
last_msg_send_time    | 2024-01-22 17:57:19.931312+08
last_msg_receipt_time | 2024-01-22 17:57:19.938772+08
latest_end_lsn        | 0/30000148
latest_end_time       | 2024-01-22 16:09:00.814624+08
slot_name             | ds1c
sender_host           | 10.21.10.175
sender_port           | 60104
conninfo              | user=antdb passfile=/home/antdb/.pgpass channel_binding=prefer dbname=replication host=10.21.10.175 port=60104 application_name=ds1c fallback_application_name=walreceiver sslmode=prefer sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any contype=6655


# Check streaming replication on a standby node of the secondary center: its replication source is the secondary center's primary node
antdb=# SELECT * FROM pg_stat_wal_receiver;
-[ RECORD 1 ]---------+-------------------------------
pid                   | 982561
status                | streaming
receive_start_lsn     | 0/30000000
receive_start_tli     | 1
written_lsn           | 0/30000148
flushed_lsn           | 0/30000148
received_tli          | 1
last_msg_send_time    | 2024-01-22 17:59:50.596492+08
last_msg_receipt_time | 2024-01-22 17:59:50.602242+08
latest_end_lsn        | 0/30000148
latest_end_time       | 2024-01-22 16:09:01.068282+08
slot_name             | ds2c
sender_host           | 10.21.10.176
sender_port           | 60105
conninfo              | user=antdb passfile=/home/antdb/.pgpass channel_binding=prefer dbname=replication host=10.21.10.176 port=60105 application_name=ds2c fallback_application_name=walreceiver sslmode=prefer sslcompression=0 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres target_session_attrs=any contype=6655
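A misspelled key in the standby_cluster block of Step 3 is easy to overlook, so it can help to check the key names before starting adbhamgr. The sketch below validates a block that has already been loaded into a Python dict. `KNOWN_KEYS` lists only the keys that appear in this document's example (with the slot key spelled primary_slot_name); adbhamgr may accept others, and treating host and port as mandatory is an assumption.

```python
# Keys used in this document's standby_cluster example (assumed, not exhaustive).
KNOWN_KEYS = {"host", "port", "primary_slot_name", "create_replica_methods"}


def check_standby_cluster(block):
    """Return a list of problems found in a standby_cluster mapping."""
    problems = []
    for key in block:
        if key not in KNOWN_KEYS:
            problems.append(f"unknown key: {key}")
    for key in ("host", "port"):  # assumed mandatory, as in the example
        if key not in block:
            problems.append(f"missing key: {key}")
    return problems


good = {
    "host": "10.21.10.175",
    "port": 4567,
    "primary_slot_name": "adbhamgr",
    "create_replica_methods": ["basebackup"],
}
# Same block with a misspelled slot key.
typo = {k: v for k, v in good.items() if k != "primary_slot_name"}
typo["primart_slot_name"] = "adbhamgr"

print(check_standby_cluster(good))
print(check_standby_cluster(typo))
```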

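Step 4: Modify the postgresql.conf file (run on every node)

In the postgresql.conf file of the secondary center's mgr, change mgr_zone to "zone2", the zone name of the secondary center.

mgr_zone = 'zone2' # set according to your actual environment

Restart the secondary center's mgr: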

mgr_ctl restart -D /home/antdb/data/mgr 

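Deploy the secondary center nodes

Add and deploy the secondary center hosts

Note: the following operations can only be performed on the primary center's mgr.

# Log in to the primary center's mgr and run the following ADD commands:
# Three hosts are added below
# Adjust the parameter values to match your deployment environment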
ADD host host01(port=22,protocol='ssh',adbhome='/home/antdb/app',address="10.21.10.180",agentport=8432,user='antdb');
ADD host host02(port=22,protocol='ssh',adbhome='/home/antdb/app',address="10.21.10.181",agentport=8432,user='antdb');
ADD host host03(port=22,protocol='ssh',adbhome='/home/antdb/app',address="10.21.10.191",agentport=8432,user='antdb');

start agent all password 'XXX';
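When there are many hosts, the ADD host statements can be generated from a list instead of typed by hand. This is a minimal sketch that reproduces the statement format shown above; `add_host_command` is a hypothetical helper, and its defaults are simply the values used in this example.

```python
def add_host_command(name, address, adbhome="/home/antdb/app", user="antdb",
                     ssh_port=22, agent_port=8432):
    """Build one ADD host statement in the format used above."""
    return (
        f"ADD host {name}(port={ssh_port},protocol='ssh',adbhome='{adbhome}',"
        f"address=\"{address}\",agentport={agent_port},user='{user}');"
    )


# Host names and addresses from the example above.
HOSTS = [
    ("host01", "10.21.10.180"),
    ("host02", "10.21.10.181"),
    ("host03", "10.21.10.191"),
]

for name, address in HOSTS:
    print(add_host_command(name, address))
```

The printed statements can then be pasted into the primary center's mgr session.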

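Add the secondary center nodes

Note: the following operations can only be performed on the primary center's mgr.

Note: every master node in the primary center must have a direct slave node in the secondary center; a slave node may in turn have cascaded slave nodes.

-- For example, suppose the primary center has gtmcoord master gc; coordinator masters coord0 and coord1; and datanode masters dm0 and dm1.
-- The secondary center must then deploy at least one corresponding standby node for each of these master nodes.
-- The zone of each secondary center node should be the secondary center zone set in postgresql.conf in the step above.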
ADD gtmcoord slave gc_4 for gc_1(host='host02',port=60103, path='/home/antdb/data/gc_4',zone='zone2');
ADD datanode slave ds1c for dn1(host='host01',port=60104, path='/home/antdb/data/ds1a',zone='zone2');
ADD datanode slave ds2c for dn2(host='host02',port=60104, path='/home/antdb/data/ds2c',zone='zone2');
ADD datanode slave ds3c for dn3(host='host03',port=60104, path='/home/antdb/data/ds3c',zone='zone2');
ADD coordinator slave cna for cn1(path = '/home/antdb/data/cna', host='host01', port=7788,zone='zone2');
ADD coordinator slave cnb for cn2(path = '/home/antdb/data/cnb', host='host02', port=7788,zone='zone2');
ADD coordinator slave cnc for cn3(path = '/home/antdb/data/cnc', host='host03', port=7788,zone='zone2');
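The rule that every master needs a direct slave in the secondary center can be checked before the nodes are initialized. The sketch below works on plain Python data transcribed from the ADD statements above; `masters_without_direct_slave` is a hypothetical helper, and a cascaded slave (whose `master` field names another slave) does not count as direct.

```python
def masters_without_direct_slave(masters, slaves, zone):
    """Return masters that have no slave attached directly to them in `zone`."""
    covered = {s["master"] for s in slaves if s["zone"] == zone}
    return sorted(set(masters) - covered)


# Master names referenced by the ADD statements above.
MASTERS = ["gc_1", "dn1", "dn2", "dn3", "cn1", "cn2", "cn3"]

# Slaves defined by the ADD statements above.
SLAVES = [
    {"name": "gc_4", "master": "gc_1", "zone": "zone2"},
    {"name": "ds1c", "master": "dn1", "zone": "zone2"},
    {"name": "ds2c", "master": "dn2", "zone": "zone2"},
    {"name": "ds3c", "master": "dn3", "zone": "zone2"},
    {"name": "cna", "master": "cn1", "zone": "zone2"},
    {"name": "cnb", "master": "cn2", "zone": "zone2"},
    {"name": "cnc", "master": "cn3", "zone": "zone2"},
]

print(masters_without_direct_slave(MASTERS, SLAVES, "zone2"))
```

An empty result means the requirement is met for that zone.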

List all nodes that belong to a given secondary center:

LIST node zone zone_name; 

For example:

antdb=# LIST NODE ZONE zone2;
 name |  host  |       type        | mastername | port  | sync_state |          path          | initialized | incluster | zone
------+--------+-------------------+------------+-------+------------+------------------------+-------------+-----------+-------
 gc_4 | host01 | gtmcoord slave    | gc_1       | 60103 | async      | /home/antdb/data1/gc_4 | f           | f         | zone2
 cna  | host01 | coordinator slave | cn1        |  7788 | async      | /home/antdb/data1/cna  | f           | f         | zone2
 cnb  | host02 | coordinator slave | cn2        |  7788 | async      | /home/antdb/data1/cnb  | f           | f         | zone2
 cnc  | host03 | coordinator slave | cn3        |  7788 | async      | /home/antdb/data1/cnc  | f           | f         | zone2
 ds1c | host01 | datanode slave    | dn1        | 60104 | async      | /home/antdb/data1/ds1a | f           | f         | zone2
 ds2c | host02 | datanode slave    | dn2        | 60104 | async      | /home/antdb/data1/ds2c | f           | f         | zone2
 ds3c | host03 | datanode slave    | dn3        | 60104 | async      | /home/antdb/data1/ds3c | f           | f         | zone2
(7 rows)

Initialize the secondary center nodes

Method 1: use append to initialize the secondary center nodes one by one. For example:

append gtmcoord slave gc_4; 
append datanode slave ds1c;
append datanode slave ds2c; 
append datanode slave ds3c;
append coordinator slave cna;
append coordinator slave cnb;
append coordinator slave cnc;

Method 2: use the zone init command to add all nodes in zone2 at once. For example:

ZONE INIT zone2; 

Check the status of the secondary center nodes after initialization:

LIST node zone zone_name; 

For example:

antdb=# LIST NODE ZONE zone2;
 name |  host  |       type        | mastername | port  | sync_state |          path          | initialized | incluster | zone
------+--------+-------------------+------------+-------+------------+------------------------+-------------+-----------+-------
 gc_4 | host01 | gtmcoord slave    | gc_1       | 60103 | async      | /home/antdb/data1/gc_4 | t           | t         | zone2
 cna  | host01 | coordinator slave | cn1        |  7788 | async      | /home/antdb/data1/cna  | t           | t         | zone2
 cnb  | host02 | coordinator slave | cn2        |  7788 | async      | /home/antdb/data1/cnb  | t           | t         | zone2
 cnc  | host03 | coordinator slave | cn3        |  7788 | async      | /home/antdb/data1/cnc  | t           | t         | zone2
 ds1c | host01 | datanode slave    | dn1        | 60104 | async      | /home/antdb/data1/ds1a | t           | t         | zone2
 ds2c | host02 | datanode slave    | dn2        | 60104 | async      | /home/antdb/data1/ds2c | t           | t         | zone2
 ds3c | host03 | datanode slave    | dn3        | 60104 | async      | /home/antdb/data1/ds3c | t           | t         | zone2
(7 rows)
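The initialized and incluster columns are what distinguish this listing from the one taken before initialization, so a script can confirm that both are `t` for every node. This is a sketch assuming the psql-style table layout shown above, starting at the header line; `parse_psql_table` and `not_ready` are hypothetical helpers.

```python
def parse_psql_table(text):
    """Parse an aligned psql table (header, dashed separator, rows, footer)."""
    lines = [l for l in text.splitlines() if l.strip()]
    header = [c.strip() for c in lines[0].split("|")]
    rows = []
    for line in lines[2:]:  # lines[1] is the dashed separator
        if line.strip().startswith("("):
            break  # "(N rows)" footer
        cells = [c.strip() for c in line.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def not_ready(rows):
    """Return names of nodes that are not both initialized and in the cluster."""
    return [r["name"] for r in rows
            if r["initialized"] != "t" or r["incluster"] != "t"]


# Three rows copied from the listing above.
SAMPLE = """\
 name |  host  |       type        | mastername | port  | sync_state |          path          | initialized | incluster | zone
------+--------+-------------------+------------+-------+------------+------------------------+-------------+-----------+-------
 gc_4 | host01 | gtmcoord slave    | gc_1       | 60103 | async      | /home/antdb/data1/gc_4 | t           | t         | zone2
 cna  | host01 | coordinator slave | cn1        |  7788 | async      | /home/antdb/data1/cna  | t           | t         | zone2
 ds1c | host01 | datanode slave    | dn1        | 60104 | async      | /home/antdb/data1/ds1a | t           | t         | zone2
(3 rows)
"""

print(not_ready(parse_psql_table(SAMPLE)))
```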

The secondary center is now set up.
