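Background
For historical reasons, our company's caching layer runs on Codis, with one large department sharing a single cluster. We plan to retire Codis and move to Redis's native cluster architecture, for two main reasons. First, Codis has not been updated or maintained upstream for a long time: official Redis has reached 5.x.x while codis-server is still at 3.x.x, so newer Redis features are unavailable. Second, sharing one cluster violates the principle of spreading risk and not putting all your eggs in one basket: if that cluster fails, every service in the department is affected. While surveying the business teams, we also found that Codis is used not only as a cache but, in some scenarios, as storage (counters, for example). We therefore need a real-time data migration solution so that services can move from Codis to Redis without noticing.
Solution Selection
Requirements
1. Support data migration from Codis to Redis Cluster
2. Support data migration from Codis to a Sentinel-managed deployment
3. Support migrating only a subset of keys
4. Support viewing migration progress
Survey
1. redis-migrate-tool
redis-migrate-tool is a tool open-sourced by Vipshop for real-time data migration between heterogeneous Redis clusters. It has not been updated for two years, but I personally consider it a fairly complete tool, particularly its data verification. For a detailed feature description, see GitHub: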
https://github.com/vipshop/redis-migrate-tool
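2. RedisShake
RedisShake is a tool from Alibaba Cloud, built on Wandoujia's open-source redis-port, that supports real-time synchronization between heterogeneous Redis clusters. Compared with redis-migrate-tool, its advantages in my view are that it can sync keys by prefix and can sync multiple DBs. redis-migrate-tool can only do a full sync, and if the source uses multiple DBs, everything lands in db0 on the target Redis, which is unworkable for services that rely on separate DBs. For a detailed feature description of RedisShake, see GitHub: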
https://github.com/alibaba/RedisShake
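3. redis-port
redis-port is a Redis data migration tool that Wandoujia open-sourced to make it easy to migrate from Redis to Codis. It has also gone a long time without updates. For its features and usage, see GitHub: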
https://github.com/CodisLabs/redis-port
Practice
Environment
Codis -> Sentinel
Shard master | Password | Codis version | Sentinel address | Master address | Master password | Sentinel Redis version |
---|---|---|---|---|---|---|
192.168.46.150:10379 | xxx | 3.2.4 | 192.168.9.87:6385 | 192.168.9.87:6384 | 123456 | 5.0.2 |
192.168.47.150:10379 | xxx | 3.2.4 | 192.168.9.88:6385 | 192.168.9.87:6384 | 123456 | 5.0.2 |
 | xxx | 3.2.4 | 192.168.9.89:6385 | 192.168.9.87:6384 | 123456 | 5.0.2 |
Codis -> Redis Cluster
Shard master | Password | Codis version | Master node | Master password | Redis Cluster version |
---|---|---|---|---|---|
192.168.46.150:10379 | xxx | 3.2.4 | 192.168.9.87:6383 | 123456 | 5.0.2 |
192.168.47.150:10379 | xxx | 3.2.4 | 192.168.9.89:6382 | 123456 | 5.0.2 |
 | xxx | 3.2.4 | 192.168.9.88:6381 | 123456 | 5.0.2 |
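Migrating data with redis-migrate-tool
Installing the migration tool
Build and install it by following the official documentation.
Writing the configuration files
Configuration file for migrating to Sentinel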
vim /chj/app/redis-migrate-tool/rmt_sentinel.conf
[source]
type: single
servers :
- 192.168.46.150:10379
- 192.168.47.150:10379
redis_auth: xxx
[target]
type: single
servers:
- 192.168.9.87:6384
redis_auth: 123456
[common]
listen: 0.0.0.0:8888
Configuration file for migrating to Redis Cluster
vim /chj/app/redis-migrate-tool/rmt_cluster.conf
[source]
type: single
servers :
- 192.168.46.150:10379
- 192.168.47.150:10379
redis_auth: xxx
[target]
type: redis cluster
servers:
- 192.168.9.87:6383
- 192.168.9.89:6382
- 192.168.9.88:6381
redis_auth: 123456
[common]
listen: 0.0.0.0:8889
Starting the sync process
cd /chj/app/redis-migrate-tool
# Migrate data from Codis to the Sentinel deployment
./src/redis-migrate-tool -c rmt_sentinel.conf -o rmt.log -d
# Migrate data from Codis to Redis Cluster
./src/redis-migrate-tool -c rmt_cluster.conf -o rmt_cluster.log -d
Data verification
cd /chj/app/redis-migrate-tool
[root@devops-template-test redis-migrate-tool]# ./src/redis-migrate-tool -c rmt_sentinel.conf -C "redis_check 60000"
Check job is running...
[2019-06-25 11:12:09.414] rmt_check.c:848 ERROR: key checked failed: check key's value error, value is inconsistent. key(len:17, type:hash): BigData-IpParse:4
Checked keys: 60000
Inconsistent value keys: 1
Inconsistent expire keys : 0
Other check error keys: 0
Checked OK keys: 59999
Check job finished, used 16.622s
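PS
1. -C "redis_check 60000" runs data verification; 60000 is the number of keys to sample, and the default is 1000.
2. If any inconsistencies are reported, examine the affected keys to confirm what happened.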
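The summary counters in the redis_check output can also be evaluated by a script, which helps when the check is run repeatedly before a cutover. Below is a minimal Python sketch, assuming the output format matches the sample above; the names parse_check_summary and is_clean are hypothetical helpers and not part of redis-migrate-tool.

```python
import re

# Summary counters printed by redis_check, as they appear in the sample output.
SUMMARY_FIELDS = (
    "Checked keys",
    "Inconsistent value keys",
    "Inconsistent expire keys",
    "Other check error keys",
    "Checked OK keys",
)

# Trimmed copy of the sample output shown in this post.
SAMPLE_OUTPUT = """Check job is running...
Checked keys: 60000
Inconsistent value keys: 1
Inconsistent expire keys : 0
Other check error keys: 0
Checked OK keys: 59999
Check job finished, used 16.622s
"""


def parse_check_summary(output):
    """Collect the 'name: count' summary lines into a dict of integers."""
    summary = {}
    for line in output.splitlines():
        match = re.match(r"^\s*([A-Za-z ]+?)\s*:\s*(\d+)\s*$", line)
        if match and match.group(1) in SUMMARY_FIELDS:
            summary[match.group(1)] = int(match.group(2))
    return summary


def is_clean(summary):
    """True only if every sampled key was checked OK and no error counter is set."""
    error_fields = (
        "Inconsistent value keys",
        "Inconsistent expire keys",
        "Other check error keys",
    )
    no_errors = all(summary.get(name, 0) == 0 for name in error_fields)
    all_ok = summary.get("Checked OK keys") == summary.get("Checked keys")
    return no_errors and all_ok


if __name__ == "__main__":
    result = parse_check_summary(SAMPLE_OUTPUT)
    print(result)
    print("clean" if is_clean(result) else "needs investigation")
```

For the sample output, one key has an inconsistent value, so the sketch reports that the run needs investigation.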
Confirming sync status
total_msgs_outqueue shows whether any oplog entries are still waiting in the queue. If total_msgs_outqueue > 0, keep waiting; only switch over once total_msgs_outqueue = 0.
[root@devops-template-test redis-migrate-tool]# redis-cli -h 127.0.0.1 -p 8889 info
Server
version:0.1.0
os:Linux 3.10.0-693.5.2.el7.x86_64 x86_64
multiplexing_api:epoll
gcc_version:4.8.5
process_id:10137
tcp_port:8889
uptime_in_seconds:1201
uptime_in_days:0
config_file:/chj/app/redis-migrate-tool/rmt_cluster.conf
Clients
connected_clients:1
max_clients_limit:100
total_connections_received:1
Memory
mem_allocator:jemalloc-0.0.0
Group
source_nodes_count:2
target_nodes_count:4
Stats
all_rdb_received:1
all_rdb_parsed:1
all_aof_loaded:0
rdb_received_count:2
rdb_parsed_count:2
aof_loaded_count:0
total_msgs_recv:357666
total_msgs_sent:357666
total_net_input_bytes:78804395
total_net_output_bytes:1688068278
total_net_input_bytes_human:75.15M
total_net_output_bytes_human:1.57G
total_mbufs_inqueue:0
total_msgs_outqueue:0
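The cutover condition described above can be checked mechanically. The following Python sketch assumes the info text has the key:value layout shown in the sample output; parse_info and ready_to_switch are hypothetical helper names, and the text itself would come from the redis-cli command shown above.

```python
# Trimmed copy of the info output shown in this post.
SAMPLE_INFO = """Stats
all_rdb_received:1
all_rdb_parsed:1
total_msgs_recv:357666
total_msgs_sent:357666
total_mbufs_inqueue:0
total_msgs_outqueue:0
"""


def parse_info(text):
    """Turn 'key:value' lines into a dict; section titles have no colon and are skipped."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def ready_to_switch(fields):
    """Apply the rule from the text: switch only when total_msgs_outqueue is 0.

    A missing field is treated as "not ready" (an assumption of this sketch).
    """
    value = fields.get("total_msgs_outqueue")
    return value is not None and int(value) == 0


if __name__ == "__main__":
    info = parse_info(SAMPLE_INFO)
    print("ready to switch:", ready_to_switch(info))
```

A wrapper could run this in a loop with a short sleep until it returns true, then proceed with the switch.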
Migrating data with RedisShake
Installing the tool
mkdir /chj/app/redis-shake
cd /chj/app/redis-shake
wget https://github.com/alibaba/RedisShake/releases/download/release-v1.6.9-20190624/redis-shake.tar.gz
tar -zxvf redis-shake.tar.gz
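Writing the configuration file
Start from the bundled configuration file and change only the entries commented below; leave everything else as is.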
id = redis-shake
log.file = ./redis-shake.log
log.level = info
pid_path =
system_profile = 9310
http_profile = 9320
ncpu = 0
parallel = 32
source.type = cluster #source type: choose cluster
source.address = 192.168.46.150:10379;192.168.47.150:10379 #addresses of the Codis shard masters
source.password_raw = xxx #Codis password
source.auth_type = auth
source.tls_enable = false
target.type = sentinel #the target is a Sentinel deployment
#target.type = cluster #the target is Redis Cluster
target.address = sentinel-zhj2-redis-sentinel-dev-6384@192.168.9.87:6385;192.168.9.88:6385;192.168.9.89:6385 #addresses of the target Sentinel deployment
#target.address = 192.168.9.87:6383;192.168.9.89:6382;192.168.9.88:6381 #addresses of the target Redis Cluster
target.password_raw = 123456 #password of the target Redis
target.auth_type = auth
target.db = -1
target.tls_enable = false
rdb.input = local
rdb.output = local_dump
rdb.parallel = 0
rdb.special_cloud =
fake_time =
rewrite = true
filter.db = 0 #sync db0 only
filter.key = mms;vcc #sync only keys that start with mms or vcc
filter.slot =
filter.lua = false
big_key_threshold = 524288000
psync = false
metric = true
metric.print_log = false
heartbeat.url =
heartbeat.interval = 3
heartbeat.external = test external
heartbeat.network_interface =
sender.size = 104857600
sender.count = 5000
sender.delay_channel_size = 65535
keep_alive = 0
scan.key_number = 50
scan.special_cloud =
scan.key_file =
qps = 200000
replace_hash_tag = false
extra = false
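To make the effect of filter.key concrete, here is a small Python sketch of prefix matching as the comment in the configuration describes it: a key is synced only if it starts with one of the semicolon-separated prefixes. This illustrates the described behavior and is not RedisShake's actual code; the function names and the example keys other than BigData-IpParse:4 are hypothetical, and treating an empty filter as "sync everything" is an assumption.

```python
def parse_prefixes(value):
    """Split a 'mms;vcc' style setting into a list of non-empty prefixes."""
    return [part.strip() for part in value.split(";") if part.strip()]


def passes_filter(key, prefixes):
    """Return True if the key should be synced under the given prefix list."""
    if not prefixes:
        # Assumption: an empty filter.key means no filtering at all.
        return True
    return any(key.startswith(prefix) for prefix in prefixes)


if __name__ == "__main__":
    prefixes = parse_prefixes("mms;vcc")
    for key in ("mms:user:1", "vcc_order_42", "BigData-IpParse:4"):
        print(key, "->", "sync" if passes_filter(key, prefixes) else "skip")
```

With the setting above, a key such as BigData-IpParse:4 would be skipped because it starts with neither mms nor vcc.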
Starting the sync process
/chj/app/redis-shake/start.sh /chj/app/redis-shake/redis-shake.conf sync
Checking sync status
Sync is complete when PullCmdCountTotal - BypassCmdCountTotal == PushCmdCountTotal holds for each source in the metrics output.
curl http://192.168.47.253:9320/metric | python -m json.tool
[
{
"AvgDelay": "0.43 ms",
"BypassCmdCount": 0,
"BypassCmdCountTotal": 0,
"Delay": "null ms",
"Details": null,
"FailCmdCount": 0,
"FailCmdCountTotal": 0,
"FullSyncProgress": 100,
"NetworkFlowTotal": 42006,
"NetworkSpeed": 0,
"ProcessingCmdCount": 0,
"PullCmdCount": 0,
"PullCmdCountTotal": 897,
"PushCmdCount": 0,
"PushCmdCountTotal": 839,
"SenderBufCount": 0,
"SourceAddress": "192.168.46.150:10379",
"SourceDBOffset": 0,
"StartTime": "2019-06-25T17:45:23Z",
"Status": "incr",
"SuccessCmdCount": 0,
"SuccessCmdCountTotal": 839,
"TargetAddress": [
"192.168.9.87:6384"
],
"TargetDBOffset": 0
},
{
"AvgDelay": "0.60 ms",
"BypassCmdCount": 1,
"BypassCmdCountTotal": 4067,
"Delay": "null ms",
"Details": null,
"FailCmdCount": 0,
"FailCmdCountTotal": 0,
"FullSyncProgress": 100,
"NetworkFlowTotal": 37629,
"NetworkSpeed": 0,
"ProcessingCmdCount": 0,
"PullCmdCount": 1,
"PullCmdCountTotal": 5106,
"PushCmdCount": 0,
"PushCmdCountTotal": 333,
"SenderBufCount": 0,
"SourceAddress": "192.168.47.150:10379",
"SourceDBOffset": 0,
"StartTime": "2019-06-25T17:45:23Z",
"Status": "incr",
"SuccessCmdCount": 0,
"SuccessCmdCountTotal": 333,
"TargetAddress": [
"192.168.9.87:6384"
],
"TargetDBOffset": 0
}
]
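The completion rule stated above can be applied to the metrics JSON with a few lines of Python. This is a minimal sketch that only evaluates the counters named in the rule; sync_lag and caught_up are hypothetical helper names, and SAMPLE_METRICS is a trimmed copy of the output above rather than a live response.

```python
import json

# Only the fields needed for the rule, copied from the sample output.
SAMPLE_METRICS = """[
  {"SourceAddress": "192.168.46.150:10379",
   "PullCmdCountTotal": 897, "BypassCmdCountTotal": 0, "PushCmdCountTotal": 839},
  {"SourceAddress": "192.168.47.150:10379",
   "PullCmdCountTotal": 5106, "BypassCmdCountTotal": 4067, "PushCmdCountTotal": 333}
]"""


def sync_lag(entry):
    """Commands pulled and not bypassed that have not been pushed yet."""
    return (
        entry["PullCmdCountTotal"]
        - entry["BypassCmdCountTotal"]
        - entry["PushCmdCountTotal"]
    )


def caught_up(metrics):
    """True when the rule holds for every source in the metrics list."""
    return all(sync_lag(entry) == 0 for entry in metrics)


if __name__ == "__main__":
    metrics = json.loads(SAMPLE_METRICS)
    for entry in metrics:
        print(entry["SourceAddress"], "lag:", sync_lag(entry))
    print("caught up:", caught_up(metrics))
```

Applied to the sample output above, the difference is 58 for 192.168.46.150:10379 and 706 for 192.168.47.150:10379, so by this rule that snapshot would not yet count as fully synced.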
Source: 51CTO
Author: navyaijm2012
Link: https://blog.51cto.com/navyaijm/2417811?source=dra