Question
As I understand from the Kafka Connect documentation, this configuration should ignore the keys for the metricbeat and filebeat topics, but not for alarms. However, Kafka Connect does not ignore any key.
This is the full JSON config that I'm pushing to Kafka Connect over REST:
{
"auto.create.indices.at.start": false,
"connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
"connection.url": "http://elasticsearch:9200",
"connection.timeout.ms": 5000,
"read.timeout.ms": 5000,
"tasks.max": "5",
"topics": "filebeat,metricbeat,alarms",
"behavior.on.null.values": "delete",
"behavior.on.malformed.documents": "warn",
"flush.timeout.ms":60000,
"max.retries":42,
"retry.backoff.ms": 100,
"max.in.flight.requests": 5,
"max.buffered.records":20000,
"batch.size":4096,
"drop.invalid.message": true,
"schema.ignore": true,
"topic.key.ignore": "metricbeat,filebeat",
"key.ignore": false
"name": "elasticsearch-ecs-connector",
"type.name": "_doc",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",
"transforms":"routeTS",
"transforms.routeTS.type":"org.apache.kafka.connect.transforms.TimestampRouter",
"transforms.routeTS.topic.format":"${topic}-${timestamp}",
"transforms.routeTS.timestamp.format":"YYYY.MM.dd",
"errors.tolerance": "all" ,
"errors.log.enable": false ,
"errors.log.include.messages": false,
"errors.deadletterqueue.topic.name":"logstream-dlq",
"errors.deadletterqueue.context.headers.enable":true ,
"errors.deadletterqueue.topic.replication.factor": 1
}
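For completeness, a config like this would typically be submitted with a call along these lines; a minimal sketch, assuming the Connect worker listens on localhost:8083 and the JSON above is saved as connector.json:
# PUT is idempotent: it creates the connector if absent, or updates its config
curl -X PUT -H "Content-Type: application/json" \
     --data @connector.json \
     http://localhost:8083/connectors/elasticsearch-ecs-connector/config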
This is the log output during startup of the connector:
[2020-05-01 21:07:49,960] INFO ElasticsearchSinkConnectorConfig values:
auto.create.indices.at.start = false
batch.size = 4096
behavior.on.malformed.documents = warn
behavior.on.null.values = delete
compact.map.entries = true
connection.compression = false
connection.password = null
connection.timeout.ms = 5000
connection.url = [http://elasticsearch:9200]
connection.username = null
drop.invalid.message = true
elastic.https.ssl.cipher.suites = null
elastic.https.ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
elastic.https.ssl.endpoint.identification.algorithm = https
elastic.https.ssl.key.password = null
elastic.https.ssl.keymanager.algorithm = SunX509
elastic.https.ssl.keystore.location = null
elastic.https.ssl.keystore.password = null
elastic.https.ssl.keystore.type = JKS
elastic.https.ssl.protocol = TLS
elastic.https.ssl.provider = null
elastic.https.ssl.secure.random.implementation = null
elastic.https.ssl.trustmanager.algorithm = PKIX
elastic.https.ssl.truststore.location = null
elastic.https.ssl.truststore.password = null
elastic.https.ssl.truststore.type = JKS
elastic.security.protocol = PLAINTEXT
flush.timeout.ms = 60000
key.ignore = false
linger.ms = 1
max.buffered.records = 20000
max.in.flight.requests = 5
max.retries = 42
read.timeout.ms = 5000
retry.backoff.ms = 100
schema.ignore = true
topic.index.map = []
topic.key.ignore = [metricbeat, filebeat]
topic.schema.ignore = []
type.name = _doc
write.method = insert
I am using Confluent Platform 5.5.0.
Answer 1:
Let's recap here, because there have been several edits to your question and problem statement :)
- You want to stream multiple topics to Elasticsearch with a single connector
- You want to use the message key as the Elasticsearch document ID for some topics, and for the others you want the Kafka message coordinates (topic+partition+offset) instead
- You are trying to do this with the key.ignore and topic.key.ignore settings
Here's my test data in three topics, test01, test02, and test03:
ksql> PRINT test01 from beginning;
Key format: KAFKA_STRING
Value format: AVRO or KAFKA_STRING
rowtime: 2020/05/12 11:08:32.441 Z, key: X, value: {"COL1": 1, "COL2": "FOO"}
rowtime: 2020/05/12 11:08:32.594 Z, key: Y, value: {"COL1": 2, "COL2": "BAR"}
ksql> PRINT test02 from beginning;
Key format: KAFKA_STRING
Value format: AVRO or KAFKA_STRING
rowtime: 2020/05/12 11:08:50.865 Z, key: X, value: {"COL1": 1, "COL2": "FOO"}
rowtime: 2020/05/12 11:08:50.936 Z, key: Y, value: {"COL1": 2, "COL2": "BAR"}
ksql> PRINT test03 from beginning;
Key format: ¯\_(ツ)_/¯ - no data processed
Value format: AVRO or KAFKA_STRING
rowtime: 2020/05/12 11:16:15.166 Z, key: <null>, value: {"COL1": 1, "COL2": "FOO"}
rowtime: 2020/05/12 11:16:46.404 Z, key: <null>, value: {"COL1": 2, "COL2": "BAR"}
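(If you want to recreate similar test data without ksqlDB, a rough sketch with kafkacat; the broker address is an assumption, and this produces plain JSON strings rather than the Avro values I used above:)
# keyed messages for test01; '|' separates key from value
printf 'X|{"COL1": 1, "COL2": "FOO"}\nY|{"COL1": 2, "COL2": "BAR"}\n' | \
  kafkacat -P -b localhost:9092 -t test01 -K '|'
# keyless messages for test03
printf '{"COL1": 1, "COL2": "FOO"}\n{"COL1": 2, "COL2": "BAR"}\n' | \
  kafkacat -P -b localhost:9092 -t test03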
With this data I create a connector (I'm using ksqlDB but it's the same as if you use the REST API directly):
CREATE SINK CONNECTOR SINK_ELASTIC_TEST WITH (
'connector.class' = 'io.confluent.connect.elasticsearch.ElasticsearchSinkConnector',
'connection.url' = 'http://elasticsearch:9200',
'key.converter' = 'org.apache.kafka.connect.storage.StringConverter',
'type.name' = '_doc',
'topics' = 'test02,test01,test03',
'key.ignore' = 'false',
'topic.key.ignore'= 'test02,test03',
'schema.ignore' = 'false'
);
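(The direct REST equivalent of that ksqlDB statement would look something like this; a sketch, assuming a Connect worker at localhost:8083:)
curl -X PUT -H "Content-Type: application/json" \
     http://localhost:8083/connectors/SINK_ELASTIC_TEST/config \
     -d '{
  "connector.class": "io.confluent.connect.elasticsearch.ElasticsearchSinkConnector",
  "connection.url": "http://elasticsearch:9200",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter",
  "type.name": "_doc",
  "topics": "test02,test01,test03",
  "key.ignore": "false",
  "topic.key.ignore": "test02,test03",
  "schema.ignore": "false"
}'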
The resulting indices are created and populated in Elasticsearch. Here's the index and document ID of the documents:
➜ curl -s http://localhost:9200/test01/_search \
-H 'content-type: application/json' \
-d '{ "size": 5 }' |jq -c '.hits.hits[] | [._index, ._id]'
["test01","Y"]
["test01","X"]
➜ curl -s http://localhost:9200/test02/_search \
-H 'content-type: application/json' \
-d '{ "size": 5 }' |jq -c '.hits.hits[] | [._index, ._id]'
["test02","test02+0+0"]
["test02","test02+0+1"]
➜ curl -s http://localhost:9200/test03/_search \
-H 'content-type: application/json' \
-d '{ "size": 5 }' |jq -c '.hits.hits[] | [._index, ._id]'
["test03","test03+0+0"]
["test03","test03+0+1"]
So key.ignore is the default (false) and in effect for test01, which means that the key of each message is used as the document ID.
Topics test02 and test03 are listed in topic.key.ignore, which means that the key of the message is ignored for them (i.e. in effect key.ignore=true), and thus the document ID is the topic/partition/offset of the message.
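(To spot-check one of those coordinate-based documents directly by ID, note that the + in the ID needs URL-encoding; a sketch:)
# '+' must be URL-encoded as %2B in the document ID
curl -s http://localhost:9200/test02/_doc/test02%2B0%2B0 | jq '._id'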
I would recommend, given that I've proven out above that this does work, that you start your test again from scratch to double-check your working.
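(If you do restart from scratch, a possible cleanup, assuming the connector and topic names from your question; wildcard index deletion must be permitted by your Elasticsearch settings:)
# remove the connector, then the indices it created
curl -s -X DELETE http://localhost:8083/connectors/elasticsearch-ecs-connector
curl -s -X DELETE 'http://localhost:9200/filebeat*,metricbeat*,alarms*'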
Source: https://stackoverflow.com/questions/61550855/kafka-connect-topic-key-ignore-not-works-as-expected