Kafka Connect cluster setup or launching Connect workers

梦毁少年i 2021-02-04 15:11

I am going through Kafka Connect and trying to understand the concepts.

Let us say I have a Kafka cluster (nodes k1, k2 and k3) set up and running, and now I want to run

2 Answers
  • 2021-02-04 15:52

    1) To have a highly available Kafka Connect service, you need to run at least two instances of connect-distributed.sh on two distinct machines that use the same group.id. More details on configuring each worker are in the Kafka Connect worker configuration documentation. For better performance, run Connect on machines separate from the brokers and ZooKeeper.

    2) Yes, you need to place all your connectors under plugin.path (normally /usr/share/java/) on every machine where you plan to run Kafka Connect.

    3) Kafka Connect loads the connector plugins on startup, so you don't need to handle this yourself. Note that if a worker is already running when a new connector plugin is added to plugin.path, you need to restart that worker so it picks up the plugin.

    4) You need Java installed on all your machines. For Confluent Platform in particular:

    Java 1.7 and 1.8 are supported in this version of Confluent Platform (Java 1.9 is currently not supported). You should run with the Garbage-First (G1) garbage collector. For more information, see the Supported Versions and Interoperability.

    5) It depends. Confluent was founded by the original creators of Apache Kafka, and Confluent Platform is a more complete distribution that adds schema management, connectors and clients. It also includes KSQL, which is quite useful if you need to act on certain events. Confluent Platform builds on top of the Apache Kafka distribution; it is not a modified version.
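    Points 1 and 2 can be illustrated with a worker configuration. The sketch below assumes Apache Kafka's distributed-mode worker properties. The shell function writes a connect-distributed.properties file that every worker machine would use unchanged, so all workers share one group.id and one plugin.path. Several values are illustrative assumptions, not values from the answer: the broker port 9092, the group name connect-cluster, the internal topic names, the replication factor of 3 and the REST listener address.

```shell
#!/bin/sh
# Sketch: write the worker config that every Connect worker would share.
# Workers started with the same group.id join the same Connect cluster.
# Hypothetical values: brokers k1..k3 on port 9092, group "connect-cluster",
# replication factor 3, REST on port 8083, plugins under /usr/share/java.
set -eu

# write_worker_config FILE [GROUP_ID]
write_worker_config() {
  out=$1
  group_id=${2:-connect-cluster}
  cat > "$out" <<EOF
bootstrap.servers=k1:9092,k2:9092,k3:9092
group.id=${group_id}
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-status
config.storage.replication.factor=3
offset.storage.replication.factor=3
status.storage.replication.factor=3
listeners=http://0.0.0.0:8083
plugin.path=/usr/share/java
EOF
}

write_worker_config connect-distributed.properties
```

    Each machine would then start its worker with connect-distributed.sh and this same file. Because the workers run on separate machines, they can all keep the REST port 8083.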

  • 2021-02-04 15:52

    The answer given by Giorgos is correct. I ran a few connectors, and now I understand it better.

    I'll just try to put it differently.

    In Kafka Connect there are two things involved: the worker and the connector. Below are the details of running Kafka Connect in distributed mode.

    A Kafka Connect worker is a Java process on which connectors and their tasks run, so the first step is to launch workers. A worker machine needs three things:
    - Java installed.
    - The Kafka Connect shell/batch scripts that launch the worker, plus the Kafka libraries the worker uses. For this, simply copy or install Kafka on the worker machine.
    - All connector and task jars/dependencies, copied into the directory set as "plugin.path" in the worker properties file.

    Once the machine is ready, start the worker with ./bin/connect-distributed.sh ./config/connect-distributed.properties, where connect-distributed.properties holds the worker configuration. Repeat the same steps on each machine where you want to run Kafka Connect.
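    The per-machine checklist above can be sketched as a pre-flight function. It checks for Java, the launch script, the worker config and the plugin directory before the worker is started. The sketch assumes Kafka is unpacked in a directory such as /opt/kafka (hypothetical) and that plugin.path holds a single directory. The function names are illustrative.

```shell
#!/bin/sh
# Sketch: pre-flight checks for one worker machine, then start the worker.
# Assumption: plugin.path names a single directory (Kafka also accepts a
# comma-separated list, which this sketch does not handle).

# plugin_path_of FILE: print the plugin.path value from a worker config
plugin_path_of() {
  sed -n 's/^plugin\.path[[:space:]]*=[[:space:]]*//p' "$1" | tail -n 1
}

# worker_ready KAFKA_DIR CONFIG: succeed if this machine can start a worker
worker_ready() {
  command -v java >/dev/null 2>&1 ||
    { echo "java not found on PATH" >&2; return 1; }
  [ -x "$1/bin/connect-distributed.sh" ] ||
    { echo "Kafka scripts missing under $1" >&2; return 1; }
  [ -f "$2" ] ||
    { echo "worker config not found: $2" >&2; return 1; }
  pp=$(plugin_path_of "$2")
  { [ -n "$pp" ] && [ -d "$pp" ]; } ||
    { echo "plugin.path directory not found: $pp" >&2; return 1; }
}

# Usage (hypothetical install directory /opt/kafka):
#   cd /opt/kafka &&
#   worker_ready . ./config/connect-distributed.properties &&
#   ./bin/connect-distributed.sh ./config/connect-distributed.properties
```

    The checks only report the first missing prerequisite. Running the same function on every worker machine is one way to keep the machines consistent before they join the group.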

    Once the worker Java process is running on all machines, the workers form a cluster. Each worker config has a group.id property, and workers with the same group.id value form one group (cluster) of workers.
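    Workers only form one cluster when their configs agree, so a quick comparison of two worker properties files can help. The sketch assumes, per the Kafka Connect distributed-mode documentation, that workers in one group must also share bootstrap.servers and the config/offset/status storage topics, not just group.id. The helper names prop and same_cluster are hypothetical.

```shell
#!/bin/sh
# Sketch: check that two worker configs would join the same Connect cluster.

# prop FILE KEY: print the value of KEY (last occurrence wins)
prop() {
  k=$(printf '%s' "$2" | sed 's/\./\\./g')   # escape dots for the sed regex
  sed -n "s/^${k}[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}

# same_cluster FILE_A FILE_B: fail on the first cluster-level key that differs
same_cluster() {
  for key in group.id bootstrap.servers config.storage.topic \
             offset.storage.topic status.storage.topic; do
    if [ "$(prop "$1" "$key")" != "$(prop "$2" "$key")" ]; then
      echo "$key differs" >&2
      return 1
    fi
  done
}

# Usage: same_cluster worker1.properties worker2.properties
```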

    Each worker process exposes a REST endpoint (by default http://localhost:8083/connectors). To start a connector on the running workers, we HTTP POST a connector config in JSON. Based on that config, the cluster starts the connector and up to tasks.max tasks, distributed across the workers in the group.

    Example connector POST:

    curl -X POST -H "Content-Type: application/json" \
      --data '{"name": "local-file-sink", "config": {"connector.class": "FileStreamSinkConnector", "tasks.max": "3", "file": "test.sink.txt", "topics": "connect-test"}}' \
      http://localhost:8083/connectors
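    To check what happened after the POST, the worker's GET /connectors/<name>/status endpoint reports the connector state plus each task's state and worker_id. The rough sketch below counts RUNNING tasks with sed and grep. It assumes the compact, single-line JSON the REST API returns. It is text matching, not a JSON parser, so a real JSON tool such as jq is the safer choice for anything serious. The function name running_tasks is hypothetical.

```shell
#!/bin/sh
# Sketch: count tasks in RUNNING state in the JSON from
# GET /connectors/<name>/status, read from stdin. Text matching only.
running_tasks() {
  # Drop everything up to the "tasks" array so the connector's own
  # "state" field is not counted, then count RUNNING task states.
  sed 's/.*"tasks":\[//' | grep -o '"state":"RUNNING"' | wc -l | tr -d ' '
}

# Usage against the example connector:
#   curl -s http://localhost:8083/connectors/local-file-sink/status | running_tasks
```

    Looking at the worker_id of each task in the same response shows how the tasks were spread across the workers in the group.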
    