confluent

“log4j.properties was unexpected at this time” while trying to start Zookeeper in windows

試著忘記壹切 · submitted on 2019-12-06 02:58:32
I am using the Kafka Streams download from Confluent ( http://www.confluent.io/product/kafka-streams/ ). I am following the instructions to run ZooKeeper and Kafka on Windows. But when I try to start ZooKeeper using the command D:\Softwares\confluent-3.0.1\bin\windows>zookeeper-server-start.bat ./etc/kafka/zookeeper.properties , I get the error D:\Softwares\confluent-3.0.1\bin\windows../../etc/kafka/log4j.properties was unexpected at this time. If I check the "zookeeper-server-start.bat" file, the commands look OK and are as shown below. There also exists a log4j.properties file under the directory confluent-3…

Restarting Kafka Connect S3 Sink Task Loses Position, Completely Rewrites everything

不问归期 · submitted on 2019-12-06 00:26:16
Question: After restarting a Kafka Connect S3 sink task, it started writing all the way from the beginning of the topic and wrote duplicate copies of older records. In other words, Kafka Connect seemed to lose its place. So I imagine that Kafka Connect stores its current offset position in the internal connect-offsets topic. That topic is empty, which I presume is part of the problem. The other two internal topics, connect-statuses and connect-configs, are not empty; connect-statuses has 52…
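One thing worth checking (not a definitive diagnosis): sink connectors generally track consumed offsets in an ordinary consumer group named connect-<connector name>, not in connect-offsets, which only holds source-connector offsets. A minimal sketch for inspecting that group with the Java AdminClient; the broker address and connector name are placeholders, and it assumes a reasonably recent kafka-clients library:

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class SinkOffsetCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        try (AdminClient admin = AdminClient.create(props)) {
            // Sink connectors commit offsets as a regular consumer group named "connect-<connector name>".
            Map<TopicPartition, OffsetAndMetadata> offsets =
                admin.listConsumerGroupOffsets("connect-my-s3-sink") // hypothetical connector name
                     .partitionsToOffsetAndMetadata()
                     .get();
            offsets.forEach((tp, om) ->
                System.out.printf("%s -> committed offset %d%n", tp, om.offset()));
        }
    }
}
```

If that group has no committed offsets (or has been expired), a restarted sink task will fall back to its auto.offset.reset behaviour, which would match the "starts from the beginning" symptom described above.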

How to populate the cache in CachedSchemaRegistryClient without making a call to register a new schema?

最后都变了- · submitted on 2019-12-05 12:56:34
We have a Spark Streaming application that integrates with Kafka, and I'm trying to optimize it because it makes excessive calls to the Schema Registry to download schemas. The Avro schema for our data rarely changes, yet currently our application calls the Schema Registry whenever a record comes in, which is far too often. I ran into CachedSchemaRegistryClient from Confluent, and it looked promising. But after looking into its implementation, I'm not sure how to use its built-in cache to reduce the REST calls to the Schema Registry. The above link will bring you to the source code; here I'm pasting the…
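For what it's worth, the cache in CachedSchemaRegistryClient is keyed by schema id (and by subject for register), so a lookup by id only goes over REST once per client instance. A minimal sketch assuming the Confluent Java client; the URL, capacity, and schema id are placeholders, and the exact method name varies across client versions (getByID / getById / getSchemaById):

```java
import io.confluent.kafka.schemaregistry.client.CachedSchemaRegistryClient;
import io.confluent.kafka.schemaregistry.client.SchemaRegistryClient;
import org.apache.avro.Schema;

public class SchemaCacheExample {
    public static void main(String[] args) throws Exception {
        // Second argument is the identity-map capacity: how many schemas the client will cache.
        SchemaRegistryClient client =
            new CachedSchemaRegistryClient("http://schema-registry:8081", 1000); // placeholder URL

        // The first lookup for a given id goes over REST; the result is cached,
        // so later calls with the same id are served from memory.
        Schema schema = client.getById(42); // hypothetical schema id
        Schema again  = client.getById(42); // expected to hit the in-memory cache

        System.out.println(schema.equals(again));
    }
}
```

The key design point is to create the client (or the Avro deserializer that wraps it) once per executor/JVM and reuse it, rather than instantiating it per record, otherwise the cache is rebuilt on every call.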

Kafka Streams with lookup data on HDFS

╄→гoц情女王★ · submitted on 2019-12-05 08:21:39
I'm writing an application with Kafka Streams (v0.10.0.1) and would like to enrich the records I'm processing with lookup data. This data (a timestamped file) is written into an HDFS directory on a daily basis (or 2-3 times a day). How can I load it in the Kafka Streams application and join it to the actual KStream? What would be the best practice for rereading the data from HDFS when a new file arrives there? Or would it be better to switch to Kafka Connect and write the RDBMS table content to a Kafka topic that can be consumed by all the Kafka Streams application instances? Update: As suggested…
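One common pattern (assuming the lookup data can be published to a compacted Kafka topic, for example via Connect) is to join the stream against a GlobalKTable. A minimal sketch; note that it uses the StreamsBuilder/GlobalKTable API from newer Kafka Streams releases rather than the 0.10.0.1 API mentioned above, and all topic names and addresses are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.GlobalKTable;
import org.apache.kafka.streams.kstream.KStream;

public class LookupJoinExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "enrichment-app");      // placeholder
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // placeholder
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Lookup data published to a (ideally compacted) topic, e.g. by a Connect source connector.
        GlobalKTable<String, String> lookup = builder.globalTable("lookup-topic"); // placeholder topic
        KStream<String, String> records = builder.stream("input-topic");           // placeholder topic

        records
            .join(lookup,
                  (key, value) -> key,                                 // map each record to its lookup key
                  (value, lookupValue) -> value + "|" + lookupValue)   // combine record with lookup row
            .to("enriched-topic");                                     // placeholder output topic

        new KafkaStreams(builder.build(), props).start();
    }
}
```

A GlobalKTable is fully replicated to every application instance and is updated continuously as new lookup records arrive, which sidesteps the "when do I reread the HDFS file" question entirely.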

Push Data from Kafka Topic to PostgreSQL in JSON

牧云@^-^@ · submitted on 2019-12-04 20:03:01
Error after updates:
[2019-07-29 12:52:23,301] INFO Initializing writer using SQL dialect: PostgreSqlDatabaseDialect (io.confluent.connect.jdbc.sink.JdbcSinkTask:57)
[2019-07-29 12:52:23,303] INFO WorkerSinkTask{id=sink-postgres-0} Sink task finished initialization and start (org.apache.kafka.connect.runtime.WorkerSinkTask:301)
[2019-07-29 12:52:23,367] WARN [Consumer clientId=consumer-1, groupId=connect-sink-postgres] Error while fetching metadata with correlation id 2 : {kafkadad=LEADER_NOT_AVAILABLE} (org.apache.kafka.clients.NetworkClient:1023)
[2019-07-29 12:52:23,368] INFO Cluster ID: …
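For reference, a JDBC sink of the kind shown in this log is normally created by posting its configuration to the Connect REST API. A minimal sketch using the Java 11+ HttpClient (text blocks need Java 15+); the topic name is taken from the log above, while the Connect URL and every connection detail are placeholders:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterJdbcSink {
    public static void main(String[] args) throws Exception {
        // Connector config as JSON; connection details are placeholders.
        String body = """
            {
              "name": "sink-postgres",
              "config": {
                "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
                "topics": "kafkadad",
                "connection.url": "jdbc:postgresql://localhost:5432/mydb",
                "connection.user": "postgres",
                "connection.password": "secret",
                "auto.create": "true",
                "insert.mode": "insert"
              }
            }
            """;

        HttpRequest request = HttpRequest.newBuilder()
            .uri(URI.create("http://localhost:8083/connectors")) // placeholder Connect REST endpoint
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();

        HttpResponse<String> response =
            HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

The LEADER_NOT_AVAILABLE warning in the log is often transient: it typically appears while a freshly auto-created topic is electing a leader, and the consumer retries on its own.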

Confluent's Kafka REST Proxy vs Kafka Client

时光总嘲笑我的痴心妄想 · submitted on 2019-12-04 18:11:52
I am curious about the advantages and disadvantages of Confluent's Kafka REST Proxy compared with a producer/consumer implemented with the official Kafka client library. I know that Confluent's Kafka REST Proxy is used for administrative tasks and for languages not supported by the Kafka client. So, what are the advantages of the Kafka client? One advantage of a native client is raw performance: direct TCP to the brokers rather than a round trip of HTTP serialization plus JVM serialization taking place within the REST Proxy. A disadvantage of the above could be maintaining security policies for…
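To make the comparison concrete, a native producer speaks the Kafka binary protocol directly to the brokers, with client-side batching and no intermediate HTTP hop. A minimal sketch with the official Java client; the broker address and topic are placeholders:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class NativeProducerExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // The client batches records and sends them over TCP straight to the partition leaders.
            RecordMetadata md =
                producer.send(new ProducerRecord<>("demo-topic", "key", "value")).get(); // placeholder topic
            System.out.printf("wrote to %s-%d@%d%n", md.topic(), md.partition(), md.offset());
        }
    }
}
```

With the REST Proxy, the same write would instead be an HTTP POST to the proxy, which then produces to Kafka on the caller's behalf; that is simpler for unsupported languages but adds serialization and a network hop.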

Kafka sink connector: No tasks assigned, even after restart

梦想与她 · submitted on 2019-12-04 13:16:28
I am using Confluent 3.2 in a set of Docker containers, one of which is running a kafka-connect worker. For reasons yet unclear to me, two of my four connectors (specifically, instances of hpgraphsl's MongoDB sink connector) stopped working. I was able to identify the main problem: the connectors did not have any tasks assigned, as could be seen by calling GET /connectors/{my_connector}/status . The other two connectors (of the same type) were not affected and were happily producing output. I tried three different methods to get my connectors running again via the REST API: pausing and resuming the…
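For reference, the checks and restarts described here go through the Connect REST API. A minimal sketch using the Java 11+ HttpClient; the worker URL and connector name are placeholders, and the endpoints shown are the standard /status and /restart resources:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConnectorRestart {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        String base = "http://localhost:8083/connectors/my_connector"; // placeholder worker URL and name

        // 1. Inspect the connector: the "tasks" array in the response is empty when no tasks are assigned.
        HttpRequest status = HttpRequest.newBuilder().uri(URI.create(base + "/status")).GET().build();
        System.out.println(http.send(status, HttpResponse.BodyHandlers.ofString()).body());

        // 2. Ask the worker to restart the connector. On older Connect versions this does not
        //    recreate missing tasks by itself; deleting and re-creating the connector (or bumping
        //    tasks.max in its config) may be needed to force a new task assignment.
        HttpRequest restart = HttpRequest.newBuilder()
            .uri(URI.create(base + "/restart"))
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
        System.out.println(http.send(restart, HttpResponse.BodyHandlers.ofString()).statusCode());
    }
}
```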

Kafka connect cluster setup or launching connect workers

心不动则不痛 · submitted on 2019-12-04 07:23:09
I am going through Kafka Connect and trying to get the concepts straight. Let's say I have a Kafka cluster (nodes k1, k2 and k3) set up and running, and now I want to run Kafka Connect workers on different nodes, say c1 and c2, in distributed mode. A few questions. 1) To run or launch Kafka Connect in distributed mode I need to use the command ../bin/connect-distributed.sh, which is available on the Kafka cluster nodes. So do I need to launch Kafka Connect from one of the Kafka cluster nodes? Or does any node from which I launch Kafka Connect need to have the Kafka binaries so that I will be able to use ../bin…
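As a point of reference for the questions above: a distributed worker is just a separate JVM process that needs the Connect runtime (shipped with the Kafka/Confluent distribution, so c1 and c2 do need those binaries installed) plus a worker properties file pointing at the existing brokers; it does not have to run on a broker node. A sketch of the key worker settings, written from Java only to stay consistent with the other examples here; all hostnames and topic names are placeholders:

```java
import java.io.FileOutputStream;
import java.util.Properties;

public class WorkerConfigSketch {
    public static void main(String[] args) throws Exception {
        Properties worker = new Properties();
        // Point the worker at the existing brokers; the worker itself can run on any host.
        worker.put("bootstrap.servers", "k1:9092,k2:9092,k3:9092");   // placeholder hostnames
        // Workers sharing the same group.id form one distributed Connect cluster (c1, c2, ...).
        worker.put("group.id", "connect-cluster");
        worker.put("key.converter", "org.apache.kafka.connect.json.JsonConverter");
        worker.put("value.converter", "org.apache.kafka.connect.json.JsonConverter");
        // Internal topics where the workers share connector configs, offsets, and statuses.
        worker.put("config.storage.topic", "connect-configs");
        worker.put("offset.storage.topic", "connect-offsets");
        worker.put("status.storage.topic", "connect-statuses");

        try (FileOutputStream out = new FileOutputStream("connect-distributed.properties")) {
            worker.store(out, "worker config for bin/connect-distributed.sh");
        }
    }
}
```

Each worker is then started with bin/connect-distributed.sh pointing at that properties file, and connectors are submitted to any one worker over its REST API.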