Kafka stress testing: the producer

Posted by ≯℡__Kan透↙ on 2020-04-05 15:50:12

Background

Not long ago we built our own big data platform. Because of time pressure and a tight schedule, we never stress tested it. Now that the platform is complete, we plan to stress test its components one by one.

Corrections are welcome!

Test objectives

Measure the Kafka cluster's ability to write and consume messages, and use the results to assess the load capacity of the current cluster setup.

The tests stress both message writes and message reads against Kafka, and evaluate the results of processing messages at different orders of magnitude.

Test method

On the servers, use Kafka's bundled test scripts to simulate message write and read requests at different volumes, and observe how Kafka handles each load level: messages produced per second, throughput, and message latency.

Environment overview

System environment

System  Version     Notes
CentOS  7.6         8 cores / 32 GB RAM
Kafka   2.11-2.4.0  5 brokers

Test setup

Test data volume: 100 million records.

topic          batch-size  ack  message-size(bytes)  compression-codec  partition  replication  throughput
test_producer  10000       1    512                  none               4          3            30000
test_producer  20000       1    512                  none               4          3            30000
test_producer  40000       1    512                  none               4          3            30000
test_producer  60000       1    512                  none               4          3            30000
test_producer  80000       1    512                  none               4          3            30000

Environment preparation

Create the topic according to the current Kafka configuration:

kafka-topics.sh --create --zookeeper tvm11:2181,tvm12:2181,tvm13:2181 --topic test_kafka --partitions 4 --replication-factor 3
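
To confirm the layout before testing, the topic can be inspected with the same tool (a quick sanity check):

kafka-topics.sh --describe --zookeeper tvm11:2181,tvm12:2181,tvm13:2181 --topic test_kafka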

Stress-test tool

We use the benchmark script that ships with Kafka: kafka-producer-perf-test.sh.

Parameter reference

Parameter                  Meaning
--topic                    Topic name
--num-records              Number of messages to produce
--payload-delimiter        Delimiter between payloads; defaults to \n
--throughput               Throttle the run to at most this many messages per second; set to -1 to remove the limit
--producer-props           Producer settings, e.g. bootstrap.servers, client.id
--producer.config          Producer config file (alternative to the above)
--print-metrics            Print producer metrics at the end of the run; defaults to false
--transaction-duration-ms  Maximum age of each transaction; commitTransaction is called once it expires. Transactions are enabled only when this value is positive. (default: 0)
--record-size              Size of each message in bytes (mutually exclusive with --payload-file; one of the two is required)
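
For --producer.config, the same producer settings can be supplied from a file instead of the command line. A minimal sketch (the file name is illustrative):

$ cat perf-producer.properties
bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092
batch.size=20000
acks=1

$ kafka-producer-perf-test.sh --topic test_kafka --num-records 1000000 --record-size 512 --throughput 30000 --producer.config perf-producer.properties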

--record-size: you can fetch a sample message to check its size. Our messages average roughly 473 bytes, so we set this to 512 here.

>>> import sys
>>> s = 'please answer my question'
>>> sys.getsizeof(s)   # size of the Python str object, in bytes
58
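
Note that sys.getsizeof reports the size of the Python object, interpreter overhead included, so it only approximates the actual payload. A more direct check (a sketch) is to pull one record off the topic and count its bytes; this counts only the message value plus a trailing newline, not keys, headers, or protocol overhead:

$ kafka-console-consumer.sh --bootstrap-server tvm11:9092 --topic test_kafka --from-beginning --max-messages 1 | wc -c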

Producer parameter notes

The factors likely to affect producer performance include the following (a combined command-line sketch follows the list):

  • thread: the number of producer threads on the test machine;
  • batch-size: the size of each batch of records we send;
  • ack: the acknowledgement policy. Whether the producer waits only for the leader or also for the followers matters a great deal for throughput;
  • message-size: the size of a single message. A limit must be configured on both producer and broker, and the message size range also affects throughput;
  • compression-codec: the compression codec; the options here are none, gzip, snappy, and lz4;
  • partition: the number of partitions, mainly tested in combination with thread count;
  • replication: the number of replicas;
  • throughput: the target throughput, i.e. messages processed per unit of time, which can affect message latency;
  • linger.ms: how long the producer waits between sends for a batch to fill; once reached, the accumulated data is flushed.
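
As a sketch of how these knobs map onto the test command line (the values here are illustrative, not tuned recommendations), compression and linger are passed as producer properties alongside batch.size:

$ kafka-producer-perf-test.sh --topic test_kafka --num-records 1000000 --record-size 512 --throughput -1 --producer-props bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092 batch.size=20000 acks=1 linger.ms=10 compression.type=snappy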

Broker parameter notes

The broker-side settings most relevant here are the following (a server.properties sketch follows the list):

  • num.replica.fetchers: the number of fetcher threads used for replication. If ISR membership churns frequently or followers cannot keep up with the leader, increase this, but generally no higher than CPU cores + 1;
  • num.io.threads: the number of threads the broker uses for disk I/O. These may wait on I/O at peak times, so configure generously: about 2x the CPU core count, and at most 3x;
  • num.network.threads: the maximum number of threads the broker uses for network I/O, reading and writing buffered data. There is essentially no I/O wait here, so CPU cores + 1 is recommended;
  • log.flush.interval.messages: flush data to disk after the producer has written this many messages;
  • log.flush.interval.ms: flush data to disk after this much time has elapsed.
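
These broker settings live in server.properties. A sketch for an 8-core machine like ours, with values derived from the guidelines above (assumed starting points, to be tuned against the actual workload):

# config/server.properties (excerpt) -- illustrative values for an 8C broker
num.network.threads=9              # CPU cores + 1
num.io.threads=16                  # about 2x CPU cores
num.replica.fetchers=2             # raise if followers lag; keep <= cores + 1
log.flush.interval.messages=10000  # flush after this many messages
log.flush.interval.ms=1000         # or after this much time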

Producer stress tests

1. batch-size test
  • Test script

    $ kafka-producer-perf-test.sh  --topic test_kafka --num-records 100000000 --record-size 512  --producer-props   bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092,tvm14:9092,tvm15:9092  batch.size=10000   --throughput 30000
    
    100000000 records sent, 29999.895000 records/sec (14.65 MB/sec), 5.69 ms avg latency, 522.00 ms max latency, 1 ms 50th, 2 ms 95th, 208 ms 99th, 349 ms 99.9th.
    
    
    $ kafka-producer-perf-test.sh  --topic test_kafka --num-records 100000000 --record-size 512  --producer-props   bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092,tvm14:9092,tvm15:9092  batch.size=20000   --throughput 30000
    
    100000000 records sent, 29999.895000 records/sec (14.65 MB/sec), 6.44 ms avg latency, 637.00 ms max latency, 1 ms 50th, 2 ms 95th, 228 ms 99th, 353 ms 99.9th.
    
    
    $ kafka-producer-perf-test.sh  --topic test_kafka --num-records 100000000 --record-size 512  --producer-props   bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092,tvm14:9092,tvm15:9092  batch.size=40000   --throughput 30000
    
    100000000 records sent, 29999.868001 records/sec (14.65 MB/sec), 8.12 ms avg latency, 489.00 ms max latency, 1 ms 50th, 4 ms 95th, 252 ms 99th, 354 ms 99.9th.
    
    
    $ kafka-producer-perf-test.sh  --topic test_kafka --num-records 100000000 --record-size 512  --producer-props   bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092,tvm14:9092,tvm15:9092  batch.size=60000   --throughput 30000
    
    100000000 records sent, 29999.877001 records/sec (14.65 MB/sec), 9.03 ms avg latency, 630.00 ms max latency, 1 ms 50th, 13 ms 95th, 261 ms 99th, 357 ms 99.9th.
    
    
    $ kafka-producer-perf-test.sh  --topic test_kafka --num-records 100000000 --record-size 512  --producer-props   bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092,tvm14:9092,tvm15:9092  batch.size=80000   --throughput 30000
    
    100000000 records sent, 29999.904000 records/sec (14.65 MB/sec), 9.84 ms avg latency, 531.00 ms max latency, 1 ms 50th, 34 ms 95th, 267 ms 99th, 355 ms 99.9th.
    
  • Test results

    batch-size  ack  message-size(bytes)  compression-codec  partition  replication  throughput  MB/s   MsgNum/s  avg latency(ms)
    10000       1    512                  none               4          3            30000       14.65  29999     5.69
    20000       1    512                  none               4          3            30000       14.65  29999     6.44
    40000       1    512                  none               4          3            30000       14.65  29999     8.12
    60000       1    512                  none               4          3            30000       14.65  29999     9.03
    80000       1    512                  none               4          3            30000       14.65  29999     9.84
  • Server load

  • Conclusion: As we increase batch-size, with uncompressed messages the throughput holds steady at 30,000 msgs/s (14.65 MB/s) while average latency grows slightly but stays under 10 ms. Server CPU usage was 5%-20% at batch-size 10000 and settled at 5%-15% from batch-size 20000 onward.

2. ack test
  • Test script

    $ kafka-producer-perf-test.sh  --topic test_kafka --num-records 100000000 --record-size 512 --producer-props   bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092,tvm14:9092,tvm15:9092 batch.size=20000 acks=0 --throughput 30000
    
    100000000 records sent, 29999.877001 records/sec (14.65 MB/sec), 3.47 ms avg latency, 456.00 ms max latency, 0 ms 50th, 1 ms 95th, 150 ms 99th, 278 ms 99.9th.
    
    
    $ kafka-producer-perf-test.sh  --topic test_kafka --num-records 100000000 --record-size 512 --producer-props   bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092,tvm14:9092,tvm15:9092 batch.size=20000 acks=1 --throughput 30000
    
    100000000 records sent, 29999.886000 records/sec (14.65 MB/sec), 6.48 ms avg latency, 488.00 ms max latency, 1 ms 50th, 2 ms 95th, 226 ms 99th, 349 ms 99.9th.
    
    
    $ kafka-producer-perf-test.sh  --topic test_kafka --num-records 100000000 --record-size 512 --producer-props   bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092,tvm14:9092,tvm15:9092 batch.size=20000 acks=-1 --throughput 30000
    
    100000000 records sent, 29999.886000 records/sec (14.65 MB/sec), 35.50 ms avg latency, 939.00 ms max latency, 2 ms 50th, 308 ms 95th, 631 ms 99th, 763 ms 99.9th.
    
  • Test results

    batch-size  ack  message-size(bytes)  compression-codec  partition  replication  throughput  MB/s   MsgNum/s  avg latency(ms)
    20000       0    512                  none               4          3            30000       14.65  29999     3.47
    20000       1    512                  none               4          3            30000       14.65  29999     6.48
    20000       -1   512                  none               4          3            30000       14.65  29999     35.50
  • Server load

  • Conclusion

    Under the current configuration, with uncompressed messages, the ack policies compare as follows: acks=0 is the fastest but least safe; acks=-1 is the slowest but safest; acks=1 (the client default in this Kafka version) offers a good balance of safety and performance.
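
    Note that acks=-1 only guarantees durability together with the topic's min.insync.replicas setting; with replication 3, a value of 2 tolerates one replica failure while still acknowledging writes. A sketch of setting it on the test topic:

    $ kafka-configs.sh --zookeeper tvm11:2181 --entity-type topics --entity-name test_kafka --alter --add-config min.insync.replicas=2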

3. message-size test
  • Test script

    $ kafka-producer-perf-test.sh  --topic test_kafka --num-records 100000000 --record-size 512 --producer-props   bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092,tvm14:9092,tvm15:9092 batch.size=20000 acks=-1 --throughput 30000
    
    100000000 records sent, 29999.886000 records/sec (14.65 MB/sec), 34.43 ms avg latency, 913.00 ms max latency, 2 ms 50th, 298 ms 95th, 623 ms 99th, 755 ms 99.9th.
    
    
    $ kafka-producer-perf-test.sh  --topic test_kafka --num-records 100000000 --record-size 386 --producer-props   bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092,tvm14:9092,tvm15:9092 batch.size=20000 acks=-1 --throughput 30000
    
    100000000 records sent, 29999.913000 records/sec (11.04 MB/sec), 28.42 ms avg latency, 802.00 ms max latency, 2 ms 50th, 252 ms 95th, 527 ms 99th, 631 ms 99.9th.
    
  • Test results

    batch-size  ack  message-size(bytes)  compression-codec  partition  replication  throughput  MB/s   MsgNum/s  avg latency(ms)
    20000       -1   512                  none               4          3            30000       14.65  29999     34.43
    20000       -1   386                  none               4          3            30000       11.04  29999     28.42
  • Server load

  • Conclusion: With message sizes differing by 126 bytes (512 vs 386), average latency differs by about 6 ms and server load shows little difference.

4. partition test
  • Test script:

    # 1. Create the topics
    $ kafka-topics.sh --create --zookeeper tvm11:2181,tvm12:2181,tvm13:2181 --topic test_kafka_perf1 --partitions 1 --replication-factor 1
    
    $ kafka-topics.sh --create --zookeeper tvm11:2181,tvm12:2181,tvm13:2181 --topic test_kafka_perf3 --partitions 3 --replication-factor 1
    
    $ kafka-topics.sh --create --zookeeper tvm11:2181,tvm12:2181,tvm13:2181 --topic test_kafka_perf5 --partitions 5 --replication-factor 1
    
    $ kafka-topics.sh --create --zookeeper tvm11:2181,tvm12:2181,tvm13:2181 --topic test_kafka_perf7 --partitions 7 --replication-factor 1
    
    $ kafka-topics.sh --create --zookeeper tvm11:2181,tvm12:2181,tvm13:2181 --topic test_kafka_perf9 --partitions 9 --replication-factor 1
    
    # 2. Produce data
    $ kafka-producer-perf-test.sh  --topic test_kafka_perf1 --num-records 100000000 --record-size 512 --producer-props bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092,tvm14:9092,tvm15:9092  batch.size=60000 acks=1 --throughput 60000
    
    100000000 records sent, 59989.261922 records/sec (29.29 MB/sec), 47.66 ms avg latency, 616.00 ms max latency, 1 ms 50th, 290 ms 95th, 349 ms 99th, 444 ms 99.9th.
    
    
    $ kafka-producer-perf-test.sh  --topic test_kafka_perf3 --num-records 100000000 --record-size 512 --producer-props bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092,tvm14:9092,tvm15:9092  batch.size=60000 acks=1 --throughput 60000
    
    100000000 records sent, 59992.105039 records/sec (29.29 MB/sec), 36.14 ms avg latency, 632.00 ms max latency, 1 ms 50th, 285 ms 95th, 454 ms 99th, 529 ms 99.9th.
    
    
    $ kafka-producer-perf-test.sh  --topic test_kafka_perf5 --num-records 100000000 --record-size 512 --producer-props bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092,tvm14:9092,tvm15:9092  batch.size=60000 acks=1 --throughput 60000
    
    100000000 records sent, 59994.408521 records/sec (29.29 MB/sec), 15.82 ms avg latency, 573.00 ms max latency, 1 ms 50th, 140 ms 95th, 324 ms 99th, 397 ms 99.9th.
    
    
    $ kafka-producer-perf-test.sh  --topic test_kafka_perf7 --num-records 100000000 --record-size 512 --producer-props bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092,tvm14:9092,tvm15:9092  batch.size=60000 acks=1 --throughput 60000
    
    100000000 records sent, 59994.264548 records/sec (29.29 MB/sec), 16.00 ms avg latency, 731.00 ms max latency, 1 ms 50th, 139 ms 95th, 323 ms 99th, 417 ms 99.9th.
    
    
    $ kafka-producer-perf-test.sh  --topic test_kafka_perf9 --num-records 100000000 --record-size 512 --producer-props bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092,tvm14:9092,tvm15:9092  batch.size=60000 acks=1 --throughput 60000
    
    100000000 records sent, 59999.448005 records/sec (29.30 MB/sec), 16.90 ms avg latency, 870.00 ms max latency, 1 ms 50th, 143 ms 95th, 336 ms 99th, 552 ms 99.9th.
    
  • Test results

    batch-size  ack  message-size(bytes)  compression-codec  partition  replication  throughput  MB/s   MsgNum/s  avg latency(ms)
    60000       1    512                  none               1          1            60000       29.29  59989     47.66
    60000       1    512                  none               3          1            60000       29.29  59992     36.14
    60000       1    512                  none               5          1            60000       29.29  59994     15.82
    60000       1    512                  none               7          1            60000       29.29  59994     16.00
    60000       1    512                  none               9          1            60000       29.30  59999     16.90
  • Server load

  • Conclusion

    With 5 brokers, performance is best once the partition count equals the broker count: throughput stays at the 60,000 msgs/s target throughout, while average latency drops from 47.66 ms at 1 partition to about 16 ms at 5, and stays roughly flat as partitions increase further.
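
    If a topic was created with too few partitions, the count can be raised in place (a sketch; note that adding partitions changes the key-to-partition mapping for keyed messages):

    $ kafka-topics.sh --alter --zookeeper tvm11:2181,tvm12:2181,tvm13:2181 --topic test_kafka --partitions 5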

5. replication test
  • Test script

    # 1. Create the topics
    $ kafka-topics.sh --create --zookeeper tvm11:2181,tvm12:2181,tvm13:2181 --topic test_kafka_perf11 --partitions 4 --replication-factor 1
    
    $ kafka-topics.sh --create --zookeeper tvm11:2181,tvm12:2181,tvm13:2181 --topic test_kafka_perf22 --partitions 4 --replication-factor 2
    
    $ kafka-topics.sh --create --zookeeper tvm11:2181,tvm12:2181,tvm13:2181 --topic test_kafka_perf33 --partitions 4 --replication-factor 3
    
    # 2. Produce data
    $ kafka-producer-perf-test.sh  --topic test_kafka_perf11 --num-records 100000000 --record-size 512 --producer-props bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092,tvm14:9092,tvm15:9092  batch.size=60000 acks=1 --throughput 60000
    
    100000000 records sent, 59999.556003 records/sec (29.30 MB/sec), 22.97 ms avg latency, 649.00 ms max latency, 1 ms 50th, 213 ms 95th, 359 ms 99th, 416 ms 99.9th.
    
    
    $ kafka-producer-perf-test.sh  --topic test_kafka_perf22 --num-records 100000000 --record-size 512 --producer-props bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092,tvm14:9092,tvm15:9092  batch.size=60000 acks=1 --throughput 60000
    
    100000000 records sent, 59999.664002 records/sec (29.30 MB/sec), 20.35 ms avg latency, 680.00 ms max latency, 1 ms 50th, 187 ms 95th, 338 ms 99th, 416 ms 99.9th.
    
    
    $ kafka-producer-perf-test.sh  --topic test_kafka_perf33 --num-records 100000000 --record-size 512 --producer-props bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092,tvm14:9092,tvm15:9092  batch.size=60000 acks=1 --throughput 60000
    
    100000000 records sent, 59999.628002 records/sec (29.30 MB/sec), 24.17 ms avg latency, 651.00 ms max latency, 1 ms 50th, 214 ms 95th, 392 ms 99th, 525 ms 99.9th.
    
  • Test results

    batch-size  ack  message-size(bytes)  compression-codec  partition  replication  throughput  MB/s   MsgNum/s  avg latency(ms)
    60000       1    512                  none               4          1            60000       29.30  59999     22.97
    60000       1    512                  none               4          2            60000       29.30  59999     20.35
    60000       1    512                  none               4          3            60000       29.30  59999     24.17
  • Server load

  • Conclusion: Replication determines how many copies of each partition are kept. A replication factor of 2-4 is generally advisable; we use 3, which keeps the data highly available without wasting too much storage.
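
    To confirm replication keeps up during a run, the topics tool can list partitions whose followers are lagging (a sketch; an empty result means all replicas are in sync):

    $ kafka-topics.sh --describe --zookeeper tvm11:2181,tvm12:2181,tvm13:2181 --under-replicated-partitions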

6. throughput test
  • Test script

    # Create the topic
    $ kafka-topics.sh --create --zookeeper tvm11:2181,tvm12:2181,tvm13:2181 --topic test_throughout --partitions 4 --replication-factor 3
    
    # Produce data
    $ kafka-producer-perf-test.sh  --topic test_throughout --num-records 100000000 --record-size 512 --producer-props bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092,tvm14:9092,tvm15:9092  batch.size=1000000 acks=1 --throughput 100000
    
    100000000 records sent, 99998.700017 records/sec (48.83 MB/sec), 78.71 ms avg latency, 614.00 ms max latency, 1 ms 50th, 337 ms 95th, 390 ms 99th, 483 ms 99.9th.
    
    
    $ kafka-producer-perf-test.sh  --topic test_throughout --num-records 100000000 --record-size 512 --producer-props bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092,tvm14:9092,tvm15:9092  batch.size=1000000 acks=1 --throughput 200000
    
    100000000 records sent, 199994.800135 records/sec (97.65 MB/sec), 171.74 ms avg latency, 771.00 ms max latency, 152 ms 50th, 417 ms 95th, 506 ms 99th, 602 ms 99.9th.
    
    
    $ kafka-producer-perf-test.sh  --topic test_throughout --num-records 100000000 --record-size 512 --producer-props bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092,tvm14:9092,tvm15:9092  batch.size=1000000 acks=1 --throughput 400000
    
    100000000 records sent, 371560.740892 records/sec (181.43 MB/sec), 160.21 ms avg latency, 684.00 ms max latency, 115 ms 50th, 443 ms 95th, 544 ms 99th, 597 ms 99.9th.
    
    
    $ kafka-producer-perf-test.sh  --topic test_throughout --num-records 100000000 --record-size 512 --producer-props bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092,tvm14:9092,tvm15:9092  batch.size=1000000 acks=1 --throughput 600000
    
    100000000 records sent, 370573.499548 records/sec (180.94 MB/sec), 160.82 ms avg latency, 743.00 ms max latency, 117 ms 50th, 429 ms 95th, 530 ms 99th, 581 ms 99.9th.
    
    
    $ kafka-producer-perf-test.sh  --topic test_throughout --num-records 100000000 --record-size 512 --producer-props bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092,tvm14:9092,tvm15:9092  batch.size=1000000 acks=1 --throughput 800000
    
    100000000 records sent, 365582.592419 records/sec (178.51 MB/sec), 163.78 ms avg latency, 665.00 ms max latency, 123 ms 50th, 435 ms 95th, 540 ms 99th, 616 ms 99.9th.
    
  • Test results

    batch-size  ack  message-size(bytes)  compression-codec  partition  replication  throughput  MB/s    MsgNum/s  avg latency(ms)
    1000000     1    512                  none               4          3            100000      48.83   99998     78.71
    1000000     1    512                  none               4          3            200000      97.65   199994    171.74
    1000000     1    512                  none               4          3            400000      181.43  371560    160.21
    1000000     1    512                  none               4          3            600000      180.94  370573    160.82
    1000000     1    512                  none               4          3            800000      178.51  365582    163.78
  • Server load

  • Conclusion: With partition=4 and replication=3, measured throughput tracks the target up to 400,000 msgs/s; beyond that point, raising the target actually lowers throughput slightly (from ~371k to ~365k msgs/s), because disk and network I/O become the bottleneck and requests start to block under high concurrency.
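
    To verify the I/O bottleneck, disk utilization on the brokers can be watched while the test is running (a sketch; requires the sysstat package):

    $ iostat -x 1    # watch %util and await on the Kafka log disks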

Conclusion

For the producer: acks=1 gives a good balance of performance and reliability; a batch size around 1 MB (batch.size=1000000) works well; message size should stay within about 2 KB per record (our test data was 512 bytes per record); peak measured throughput was roughly 370,000 msgs/s. Using 3-5 partitions and a replication factor of 3 preserves both performance and high availability.
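
As a sketch, these recommendations expressed as producer client settings (illustrative values; linger.ms is an assumption not covered by the tests above):

# producer.properties (sketch) -- settings suggested by the tests above
bootstrap.servers=tvm11:9092,tvm12:9092,tvm13:9092,tvm14:9092,tvm15:9092
acks=1                  # balanced durability vs. latency
batch.size=1000000      # ~1 MB batches, per the throughput test
compression.type=none   # all tests above ran uncompressed
linger.ms=10            # assumed value; gives batches time to fill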

