apache-kafka

Aug 2019 - Kafka Consumer Lag programmatically

扶醉桌前 Submitted on 2021-01-29 12:20:03
Question: Is there any way to find Kafka consumer lag programmatically? I don't want to install external Kafka Manager tools and check a dashboard. We can list all the consumer groups and check the lag for each group. Currently we have a command to check the lag, but it requires the relative path where Kafka resides. Using Spring-Kafka, kafka-python, the Kafka AdminClient, or JMX - is there any way we can find out the lag in code? We were careless and didn't monitor the process, the
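
Using the Kafka AdminClient mentioned in the question, here is a minimal sketch (assuming kafka-clients 2.5+, a placeholder group id "my-group", and a local broker) that computes lag per partition as the log-end offset minus the committed offset:

    import java.util.Map;
    import java.util.Properties;
    import java.util.stream.Collectors;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.ListOffsetsResult;
    import org.apache.kafka.clients.admin.OffsetSpec;
    import org.apache.kafka.clients.consumer.OffsetAndMetadata;
    import org.apache.kafka.common.TopicPartition;

    public class LagCheck {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            try (AdminClient admin = AdminClient.create(props)) {
                // Committed offsets for the consumer group
                Map<TopicPartition, OffsetAndMetadata> committed =
                    admin.listConsumerGroupOffsets("my-group")
                         .partitionsToOffsetAndMetadata().get();

                // Log-end offsets for the same partitions
                Map<TopicPartition, OffsetSpec> request = committed.keySet().stream()
                    .collect(Collectors.toMap(tp -> tp, tp -> OffsetSpec.latest()));
                Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> endOffsets =
                    admin.listOffsets(request).all().get();

                // Lag = log-end offset minus committed offset, per partition
                committed.forEach((tp, om) -> {
                    long lag = endOffsets.get(tp).offset() - om.offset();
                    System.out.println(tp + " lag=" + lag);
                });
            }
        }
    }

The same numbers can also be read via JMX (the consumer's records-lag metrics), but the AdminClient route needs no access to the consumer process itself.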

Unable to install npm package (kafka-streams)

☆樱花仙子☆ Submitted on 2021-01-29 11:22:09
Question: I am trying to use the npm package kafka-streams but I get the error below:

    PS D:\Projects\POCs\kstreams-poc> npm install kafka-streams

    > node-rdkafka@2.7.1 install D:\Projects\POCs\kstreams-poc\node_modules\node-rdkafka
    > node-gyp rebuild

    D:\Projects\POCs\kstreams-poc\node_modules\node-rdkafka>if not defined npm_config_node_gyp (node "C:\Users\virtual\AppData\Roaming\npm\node_modules\npm\node_modules\npm-lifecycle\node-gyp-bin\\..\..\node_modules\node-gyp\bin\node-gyp.js" rebuild ) else (node "C:

Problems with Amazon MSK default configuration and publishing with transactions

本秂侑毒 Submitted on 2021-01-29 10:33:50
Question: Recently we started testing our Kafka connectors against MSK, Amazon's managed Kafka service. Publishing records seems to work fine, but not when transactions are enabled. Our cluster consists of 2 brokers (because we have 2 zones) using the default MSK configuration. We create our Java Kafka producer with the following properties:

    bootstrap.servers=x.us-east-1.amazonaws.com:9094,y.us-east-1.amazonaws.com:9094
    client.id=kafkautil
    max.block.ms=5000
    request.timeout.ms
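
For comparison, a minimal transactional producer sketch (Java, with placeholder topic and transactional.id values). Note that the broker-side default transaction.state.log.replication.factor is 3, so on a 2-broker cluster the __transaction_state topic may never be created and initTransactions() can block or time out; that setting is worth checking in the MSK cluster configuration:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class TransactionalSend {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                      "x.us-east-1.amazonaws.com:9094,y.us-east-1.amazonaws.com:9094");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // Transactions require a stable transactional.id (placeholder value here)
            props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "kafkautil-tx-1");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.initTransactions();   // blocks if the transaction coordinator cannot be set up
                producer.beginTransaction();
                producer.send(new ProducerRecord<>("test-topic", "key", "value"));
                producer.commitTransaction();
            }
        }
    }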

spark streaming kafka : Unknown error fetching data for topic-partition

自作多情 Submitted on 2021-01-29 10:31:12
Question: I'm trying to read a Kafka topic from a Spark cluster using the Structured Streaming API with Spark's Kafka integration.

    val sparkSession = SparkSession.builder()
      .master("local[*]")
      .appName("some-app")
      .getOrCreate()

Kafka stream creation:

    import sparkSession.implicits._
    val dataFrame = sparkSession
      .readStream
      .format("kafka")
      .option("subscribepattern", "preprod-*")
      .option("kafka.bootstrap.servers", "<brokerUrl>:9094")
      .option("kafka.ssl.protocol", "TLS")
      .option("kafka.security.protocol",

Kafka consumer returns no records

三世轮回 Submitted on 2021-01-29 10:23:31
Question: I am trying to make a small PoC with Kafka. However, the consumer I wrote in Java receives no messages, even though a kafka-console-consumer.sh started with the same URL/topic does receive them. Does anyone know what I might be doing wrong? This code is called by a GET API.

    public List<KafkaTextMessage> receiveMessages() {
        log.info("Retrieving messages from kafka");
        val props = new Properties();
        // See https://kafka.apache.org/documentation/#consumerconfigs
        props.put(
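
A minimal standalone consumer sketch (Java) for comparison; the broker address, group id, and topic are placeholders. Two things that commonly explain "console consumer works but my code sees nothing" are a missing auto.offset.reset=earliest for a brand-new group and calling poll() only once instead of in a loop:

    import java.time.Duration;
    import java.util.Collections;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class SimpleConsumer {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "poc-group");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            // Without this, a new group starts at the log end and sees nothing until new messages arrive
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("test-topic"));
                // A single poll right after subscribe() often returns nothing because the
                // group rebalance has not finished yet; poll in a loop instead
                for (int i = 0; i < 10; i++) {
                    ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                    records.forEach(r -> System.out.println(r.offset() + ": " + r.value()));
                }
            }
        }
    }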

Suricata to Filebeat to Kafka, routing to topics by event-type

梦想的初衷 Submitted on 2021-01-29 10:11:57
Question: I discovered Filebeat a couple of days ago. I have it sending data directly to Kafka if I hard-code the topic name in filebeat.yml, but I can't figure out how to compute the topic name dynamically from the Suricata event type. I've enabled the Filebeat Suricata module and tried a number of things for the topic value in filebeat.yml, such as:

    topic: 'suricata-%{[fields.suricata.eve.event_type]}'

But I always get this error in the log:

    2020-01-14T23:44:49.550Z INFO kafka/log.go:53 kafka message:

TypeError: 'JavaPackage' object is not callable & Spark Streaming's Kafka libraries not found in class path

自闭症网瘾萝莉.ら Submitted on 2021-01-29 09:48:01
Question: I use PySpark Streaming to read Kafka data, but it goes wrong:

    import os
    from pyspark.streaming.kafka import KafkaUtils
    from pyspark.streaming import StreamingContext
    from pyspark import SparkContext

    os.environ['PYSPARK_SUBMIT_ARGS'] = '--packages org.apache.spark:spark-streaming-kafka-0-8:2.0.2 pyspark-shell'
    sc = SparkContext(appName="test")
    sc.setLogLevel("WARN")
    ssc = StreamingContext(sc, 60)
    kafkaStream = KafkaUtils.createStream(ssc, "localhost:2181", "test-id", {'test': 2})
    kafkaStream

Apache Flink - Partitioning the stream equally as the input Kafka topic

余生颓废 Submitted on 2021-01-29 09:46:30
Question: I would like to implement the following scenario in Apache Flink: given a Kafka topic with 4 partitions, I would like to process the intra-partition data independently in Flink, applying different logic depending on the event's type. In particular, suppose the input Kafka topic contains the events depicted in the previous images. Each event has a different structure: partition 1 has the field "a" as key, partition 2 has the field "b" as key, etc. In Flink I would like to apply different
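
For illustration only, a minimal sketch (Java, assuming the flink-connector-kafka FlinkKafkaConsumer and string-encoded events) that matches the source parallelism to the 4 partitions and splits the stream into per-type pipelines; the type test here is just a toy value prefix standing in for the real per-partition structures:

    import java.util.Properties;
    import org.apache.flink.api.common.serialization.SimpleStringSchema;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

    public class PerTypePipelines {
        public static void main(String[] args) throws Exception {
            StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

            Properties props = new Properties();
            props.setProperty("bootstrap.servers", "localhost:9092");
            props.setProperty("group.id", "flink-per-type");

            // Source parallelism of 4 gives one consumer subtask per Kafka partition,
            // so each partition's data stays within its own subtask
            DataStream<String> raw = env
                .addSource(new FlinkKafkaConsumer<>("input-topic", new SimpleStringSchema(), props))
                .setParallelism(4);

            // Route each event to its own pipeline by type (toy prefix check for illustration)
            DataStream<String> typeA = raw.filter(v -> v.startsWith("a"));
            DataStream<String> typeB = raw.filter(v -> v.startsWith("b"));

            typeA.print("A-logic");
            typeB.print("B-logic");

            env.execute("per-type-pipelines");
        }
    }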

Kafka: Connector to consume data from websockets and push to topic

牧云@^-^@ Submitted on 2021-01-29 09:43:40
Question: We have a data flow pipeline where logs are sent from a WebSocket endpoint and need to be pushed to Splunk after some simple data enhancement (password masking, etc.). I was checking whether Kafka can be used for this, because the volumes are really high. So the possible flow is:

    Websocket Endpoint --------- some-wss-connector --------> Kafka Topic -------- splunk-connector ----------> Splunk

I found the connector for pushing to Splunk at https://github.com/splunk/kafka-connect-splunk and it
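
As an illustration of the websocket-to-Kafka half of the flow, here is a minimal bridge sketch using Java's built-in java.net.http.WebSocket client (Java 11+) and a plain KafkaProducer; the endpoint URL, topic name, and password-masking regex are placeholders, and whole messages are assumed to arrive in single frames:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.WebSocket;
    import java.util.Properties;
    import java.util.concurrent.CompletionStage;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class WssToKafka {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            KafkaProducer<String, String> producer = new KafkaProducer<>(props);

            WebSocket.Listener listener = new WebSocket.Listener() {
                @Override
                public CompletionStage<?> onText(WebSocket ws, CharSequence data, boolean last) {
                    // Toy password masking before producing to the topic
                    String masked = data.toString()
                        .replaceAll("\"password\"\\s*:\\s*\"[^\"]*\"", "\"password\":\"***\"");
                    producer.send(new ProducerRecord<>("raw-logs", masked));
                    ws.request(1);  // ask for the next frame
                    return null;
                }
            };

            HttpClient.newHttpClient()
                      .newWebSocketBuilder()
                      .buildAsync(URI.create("wss://logs.example.com/stream"), listener)
                      .join();

            Thread.currentThread().join();  // keep the bridge running
        }
    }

In practice a Kafka Connect source connector (or Kafka Connect SMTs for the masking step) would give better fault tolerance than a hand-rolled bridge, but the shape of the flow is the same.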

Kafka Consumer- ClassCastException java

让人想犯罪 __ Submitted on 2021-01-29 08:54:44
Question: My Kafka consumer throws an exception when trying to process messages in a batch (i.e. processing a list of messages). The error message is:

    java.lang.ClassCastException: class kafka.psmessage.PMessage cannot be cast to class org.apache.kafka.clients.consumer.ConsumerRecord (kafka.psmessage.PMessage and org.apache.kafka.clients.consumer.ConsumerRecord are in unnamed module of loader 'app'); nested exception is java.lang.ClassCastException: class kafka.psmessage.PMessage cannot be cast to class org.apache
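
One pattern that avoids casting already-converted payloads to ConsumerRecord is to declare the batch listener against the converted type directly. A minimal sketch assuming Spring for Apache Kafka, a hypothetical "pmessages" topic, and a batch-enabled container factory named "batchFactory"; PMessage is the domain type from the error message:

    import java.util.List;
    import org.springframework.kafka.annotation.KafkaListener;
    import org.springframework.stereotype.Component;

    import kafka.psmessage.PMessage;  // domain type from the question

    @Component
    public class BatchMessageListener {

        // When the configured deserializer/converter already turns each record into a
        // PMessage, the batch payload is a List of those payloads, not List<ConsumerRecord>
        @KafkaListener(topics = "pmessages", containerFactory = "batchFactory")
        public void onBatch(List<PMessage> messages) {
            messages.forEach(m -> {
                // process each converted message
                System.out.println(m);
            });
        }
    }

Conversely, a signature of List<ConsumerRecord<String, PMessage>> only works when the container hands over raw records rather than converted payloads, so the listener signature and the converter configuration have to agree.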