Centos7安装zookeeper+kafka集群
1 Zookeeper和kafka简介
1) ZooKeeper
是一个分布式的、分层级的文件系统,能促进客户端间的松耦合,并提供最终一致的,用于管理、协调Kafka代理,zookeeper集群中一台服务器作为Leader,其它作为Follower
2) Apache Kafka
是分布式发布-订阅消息系统,kafka对消息保存时根据Topic进行归类,每个topic将被分成多个partition(区),每条消息在文件中的位置称为offset(偏移量),offset为一个long型数字,它是唯一标记一条消息,它唯一的标记一条消息。
一个partition中的消息只会被group中的一个consumer消费,每个group中consumer消息消费互相独立,不过一个consumer可以消费多个partitions中的消息。
kafka只能保证一个partition中的消息被某个consumer消费时,消息是顺序的。从Topic角度来说,消息仍不是有序的。
每个partition都有一个server为"leader";leader负责所有的读写操作,如果leader失效,那么将会有其他follower来接管成为新的leader(有zookeeper选举);follower只是单调的和leader跟进,同步消息即可..由此可见作为leader的server承载了全部的请求压力,因此从集群的整体考虑,有多少个partitions就意味着有多少个"leader",kafka会将"leader"均衡的分散在每个实例上,来确保整体的性能稳定.
2 服务器地址
使用3台服务器搭建集群
Server1:192.168.89.11
Server2:192.168.89.12
Server3:192.168.89.13
3 安装jdk
(jdk1.8.0_102)
# yum install -y java-1.8.0-openjdk java-1.8.0-openjdk-devel
4 搭建zookeeper集群
Zookeeper集群的工作是超过半数才能对外提供服务,所以选择主机数可以是1台、3台、5台…。
3台中允许1台挂掉 ,是否可以用偶数,其实没必要。
如果有4台,那么挂掉一台还剩下三台服务器,如果再挂掉一个就不行了,记住是超过半数。
4.1 下载地址
http://mirrors.shu.edu.cn/apache/zookeeper/zookeeper-3.4.13/zookeeper-3.4.13.tar.gz
4.2 所有节点安装zookeeper
1)创建zookeeper安装目录
# mkdir -p /data/zookeeper
2)将zookeeper解压到安装目录
# tar -zxvf zookeeper-3.4.13.tar.gz -C /data/zookeeper/
3)新建保存数据的目录
# mkdir -p /data/zookeeper/zookeeper-3.4.13/data
4)新建日志目录
# mkdir -p /data/zookeeper/zookeeper-3.4.13/dataLog
5)配置环境变量并刷新
# vim /etc/profile
===================================================
export ZK_HOME=/data/zookeeper/zookeeper-3.4.13
export PATH=$PATH:$ZK_HOME/bin
===================================================
# source /etc/profile
4.3 所有节点配置zookeeper配置文件
4.3.1 各节点中配置
# cd /data/zookeeper/zookeeper-3.4.13/conf/
# cp -f zoo_sample.cfg zoo.cfg
# vim zoo.cfg
====================================================
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/data/zookeeper/zookeeper-3.4.13/data/
dataLogDir=/data/zookeeper/zookeeper-3.4.13/dataLog/
clientPort=2181
server.1=192.168.89.11:2888:3888
server.2=192.168.89.12:2888:3888
server.3=192.168.89.13:2888:3888
#第一个端口是master和slave之间的通信端口,默认是2888,第二个端口是leader选举的端口,集群刚启动的时候选举或者leader挂掉之后进行新的选举的端口默认是3888
====================================================
# echo "1" > /data/zookeeper/zookeeper-3.4.13/data/myid #server1配置,各节点不同,跟上面配置server.1的号码一样
# echo "2" > /data/zookeeper/zookeeper-3.4.13/data/myid #server2配置,各节点不同,跟上面配置server.2的号码一样
# echo "3" > /data/zookeeper/zookeeper-3.4.13/data/myid #server3配置,各节点不同,跟上面配置server.3的号码一样
4.3.2 启动停止zookeeper命令并设置开机启动
1)启动停止zookeeper命令
# zkServer.sh start #启动
# zkServer.sh stop #停止
# zkCli.sh #连接集群
2)设置开机启动
# cd /usr/lib/systemd/system
# vim zookeeper.service
=========================================
[Unit]
Description=zookeeper server daemon
After=zookeeper.target
[Service]
Type=forking
ExecStart=/data/zookeeper/zookeeper-3.4.13/bin/zkServer.sh start
ExecReload=/data/zookeeper/zookeeper-3.4.13/bin/zkServer.sh stop && sleep 2 && /data/zookeeper/zookeeper-3.4.13/bin/zkServer.sh start
ExecStop=/data/zookeeper/zookeeper-3.4.13/bin/zkServer.sh stop
Restart=always
[Install]
WantedBy=multi-user.target
=======================================================
# systemctl start zookeeper
# systemctl enable zookeeper
5 搭建kafka集群
5.1 下载地址
http://mirror.bit.edu.cn/apache/kafka/2.1.0/kafka_2.12-2.1.0.tgz
5.2 所有节点上搭建kafka
1) 新建kafka工作目录
# mkdir -p /data/kafka
2) 解压kafka
# tar -zxvf kafka_2.12-2.1.0.tgz -C /data/kafka/
3) 新建kafka日志目录
# mkdir -p /data/kafka/kafkalogs
4) 配置kafka配置文件
# vim /data/kafka/kafka_2.12-2.1.0/config/server.properties
=================================================
broker.id=1 #每一个broker在集群中的唯一标示,要求是正数
listeners=PLAINTEXT://192.168.89.11:9092 # 套接字服务器连接的地址
log.dirs=/data/kafka/kafkalogs/ #kafka数据的存放地址
message.max.byte=5242880 #消息体的最大大小,单位是字节
log.cleaner.enable=true #开启日志清理
log.retention.hours=72 #segment文件保留的最长时间(小时),超时将被删除,也就是说3天之前的数据将被清理掉
log.segment.bytes=1073741824 #日志文件中每个segmeng的大小(字节),默认为1G
log.retention.check.interval.ms=300000 #定期检查segment文件有没有达到1G(单位毫秒)
num.partitions=3 #每个topic的分区个数, 更多的分区允许更大的并行操作default.replication.factor=3 # 一个topic ,默认分区的replication个数 ,不得大于集群中broker的个数
delete.topic.enable=true # 选择启用删除主题功能,默认false
replica.fetch.max.bytes=5242880 # replicas每次获取数据的最大大小
#以下三个参数设置影响消费者消费分区可以连接的kafka主机,详细请看第6点附录
offsets.topic.replication.factor=3 #Offsets topic的复制因子(备份数)
transaction.state.log.replication.factor=3 #事务主题的复制因子(设置更高以确保可用性)
transaction.state.log.min.isr=3 #覆盖事务主题的min.insync.replicas配置
zookeeper.connect=192.168.89.11:2181,192.168.89.12:2181,192.168.89.13:2181
#zookeeper集群的地址,可以是多个
=================================================
5) kafka节点默认需要的内存为1G,如果需要修改内存,可以修改kafka-server-start.sh的配置项
# vim /data/kafka/kafka_2.12-2.1.0/bin/kafka-server-start.sh
#找到KAFKA_HEAP_OPTS配置项,例如修改如下:
export KAFKA_HEAP_OPTS="-Xmx2G -Xms2G"
5.3 启动kafka并设置开机启动
1)启动kafka
# cd /data/kafka/kafka_2.12-2.1.0/
./bin/kafka-server-start.sh -daemon ./config/server.properties
启动后可以执行jps命令查看kafka是否启动,如果启动失败,可以进入logs目录,查看kafkaServer.out日志记录。
2)设置开机启动
# cd /usr/lib/systemd/system
# vim kafka.service
=========================================
[Unit]
Description=kafka server daemon
After=kafka.target
[Service]
Type=forking
ExecStart=/data/kafka/kafka_2.12-2.1.0/bin/kafka-server-start.sh -daemon /data/kafka/kafka_2.12-2.1.0/config/server.properties
ExecReload=/data/kafka/kafka_2.12-2.1.0/bin/kafka-server-stop.sh && sleep 2 && /data/kafka/kafka_2.12-2.1.0/bin/kafka-server-start.sh -daemon /
data/kafka/kafka_2.12-2.1.0/config/server.properties
ExecStop=/data/kafka/kafka_2.12-2.1.0/bin/kafka-server-stop.sh
Restart=always
[Install]
WantedBy=multi-user.target
=======================================================
# systemctl start kafka
# systemctl enable kafka
5.4 创建topic
创建3分区、3备份
# cd /data/kafka/kafka_2.12-2.1.0/
#./bin/kafka-topics.sh --create --zookeeper 192.168.89.11:2181,192.168.89.12:2181,192.168.89.13:2181 --replication-factor 3 --partitions 3 --topic SyslogTopic
5.5 常用命令
# cd /data/kafka/kafka_2.12-2.1.0/
1) 停止kafka
./bin/kafka-server-stop.sh
2) 创建topic
./bin/kafka-topics.sh --create --zookeeper 192.168.89.11:2181,192.168.89.12:2181,192.168.89.13:2181 --replication-factor 1 --partitions 1 --topic topic_name
3) 展示topic
./bin/kafka-topics.sh --list --zookeeper 192.168.89.11:2181,192.168.89.12:2181,192.168.89.13:2181
4) 查看描述topic
./bin/kafka-topics.sh --describe --zookeeper 192.168.89.11:2181,192.168.89.12:2181,192.168.89.13:2181 --topic topic_name
5) 生产者发送消息
./bin/kafka-console-producer.sh --broker-list 192.168.89.11:9092 --topic topic_name
6) 消费者消费消息
./bin/kafka-console-consumer.sh --bootstrap-server 192.168.89.11:9092,192.168.89.12:9092,192.168.89.13:9092 --topic topic_name
7) 删除topic
./bin/kafka-topics.sh --delete --topictopic_name --zookeeper 192.168.89.11:2181,192.168.89.12:2181,192.168.89.13:2181
8) 查看每分区consumer_offsets(可以连接到的消费主机)
./bin/kafka-topics.sh --describe --zookeeper 192.168.89.11:2181,192.168.89.12:2181,192.168.89.13:2181 --topic __consumer_offsets
6 附录
1、 消费者消费分区数据,kafka的负载均衡
在/data/kafka/kafka_2.12-2.1.0/config/server.properties文件中修改offsets.topic.replication.factor,这个值为kafka集群的主机数量(__consumer_offest不受server.properties中num.partitions和default.replication.factor参数的制约。相反地,它的分区数和备份因子分别由offsets.topic.num.partitions和offsets.topic.replication.factor参数决定。这两个参数的默认值分别是50和1,表示该topic有50个分区,副本因子是1。),设置正确如下图:
(执行命令:bin/kafka-topics.sh --describe --zookeeper 192.168.89.11:2181,192.168.89.12:2181,192.168.89.13:2181 --topic __consumer_offsets)
注:如果配置文件offsets.topic.replication.factor设置成1后启动了一次,再设置成3重新启动副本因子不会更改,需要以下方法:
另外一种方法设置:
1) 创建规则json
cat > increase-replication-factor.json <<EOF
{"version":1, "partitions":[
{"topic":"__consumer_offsets","partition":0,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":1,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":2,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":3,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":4,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":5,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":6,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":7,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":8,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":9,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":10,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":11,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":12,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":13,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":14,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":15,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":16,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":17,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":18,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":19,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":20,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":21,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":22,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":23,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":24,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":25,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":26,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":27,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":28,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":29,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":30,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":31,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":32,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":33,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":34,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":35,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":36,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":37,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":38,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":39,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":40,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":41,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":42,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":43,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":44,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":45,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":46,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":47,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":48,"replicas":[1,2,3]},
{"topic":"__consumer_offsets","partition":49,"replicas":[1,2,3]}]
}
EOF
2) 执行
bin/kafka-reassign-partitions.sh --zookeeper 192.168.89.11:2181,192.168.89.12:2181,192.168.89.13:2181 --reassignment-json-file increase-replication-factor.json --execute
3) 验证
bin/kafka-reassign-partitions.sh --zookeeper 192.168.89.11:2181,192.168.89.12:2181,192.168.89.13:2181 --reassignment-json-file increase-replication-factor.json --verify
来源:oschina
链接:https://my.oschina.net/u/4334671/blog/3659371