Question
I'd like to read from multiple topics produced by Debezium CDC from a source Postgres database, using the key of each Kafka message, which holds the primary key(s). The sink connector then performs the ETL operations into the target database.
When I set delete.enabled to true, I cannot use the Kafka primary key mode; it says I have to use record_key and specify pk.fields.
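So, as far as I can tell, the required combination for delete support would have to look something like this (the id column name is just a placeholder):

insert.mode=upsert
delete.enabled=true
pk.mode=record_key
pk.fields=id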
My idea is: set a regex to read all the desired topics, derive the table name from the topic name, and use the primary keys carried in the key of whichever Kafka topic is currently being read (see the sketch after the config below).
name=sink-postgres
connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
value.converter=org.apache.kafka.connect.json.JsonConverter
value.converter.schemas.enable=true
topics=pokladna.public.*use_regex
connection.url=jdbc:postgresql://localhost:5434/postgres
connection.user=postgres
connection.password=postgres
dialect.name=PostgreSqlDatabaseDialect
table.name.format=*get_table_table_name_from_topic_name
transforms=unwrap
transforms.unwrap.type=io.debezium.transforms.ExtractNewRecordState
transforms.unwrap.drop.tombstones=false
auto.create=true
auto.evolve=true
insert.mode=upsert
delete.enabled=true
pk.mode=kafka
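What I have in mind for the two placeholder lines above (topics and table.name.format) is roughly the following, untested fragment: topics.regex instead of topics, a RegexRouter transform to strip the pokladna.public. prefix, and ${topic} in table.name.format:

# select all CDC topics for the schema (dots are left unescaped, they still match)
topics.regex=pokladna.public.*
# rename pokladna.public.<table> to just <table> before the sink sees it
transforms=unwrap,route
transforms.route.type=org.apache.kafka.connect.transforms.RegexRouter
transforms.route.regex=pokladna.public.(.*)
transforms.route.replacement=$1
# the routed topic name is now the table name
table.name.format=${topic}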
When I set delete.enabled=true and pk.mode=record_value and leave pk.fields empty, I get the following error, even during INSERT (same error when I set pk.mode=kafka):
ERROR WorkerSinkTask{id=sink-postgres-perform-0} Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask:187)
org.apache.kafka.common.config.ConfigException: Primary key mode must be 'record_key' when delete support is enabled
at io.confluent.connect.jdbc.sink.JdbcSinkConfig.<init>(JdbcSinkConfig.java:540)
at io.confluent.connect.jdbc.sink.JdbcSinkTask.start(JdbcSinkTask.java:45)
at org.apache.kafka.connect.runtime.WorkerSinkTask.initializeAndStart(WorkerSinkTask.java:302)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:193)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:185)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:235)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
When I set delete.enabled=true and pk.mode=record_key and leave pk.fields empty, I get the following error, again even during INSERT:
ERROR WorkerSinkTask{id=sink-postgres-perform-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted. Error: Cannot ALTER TABLE "questions" to add missing field SinkRecordField{schema=Schema{STRING}, name='__dbz__physicalTableIdentifier', isPrimaryKey=true}, as the field is not optional and does not have a default value (org.apache.kafka.connect.runtime.WorkerSinkTask:586)
org.apache.kafka.connect.errors.ConnectException: Cannot ALTER TABLE "questions" to add missing field SinkRecordField{schema=Schema{STRING}, name='__dbz__physicalTableIdentifier', isPrimaryKey=true}, as the field is not optional and does not have a default value
Am I doing something wrong? Is it a bad properties file configuration, a bug in the Kafka JDBC sink, or some limitation? I am able to perform the ETL into the target database when I provide pk.mode=record_key and pk.fields=id_column_names.
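In other words, what does work for me is one sink per table, with the per-table lines looking roughly like this (the connector name and the id column are placeholders, the rest of the config stays as above):

name=sink-postgres-questions
topics=pokladna.public.questions
table.name.format=questions
pk.mode=record_key
pk.fields=id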
I have 40 tables, so do I really have to create 40 properties files up front, each filled with the column names, and then run the Connect sink 40 times? That sounds silly...
Source: https://stackoverflow.com/questions/64648969/kafka-jdbc-sink-with-delete-true-option-do-i-have-to-use-record-key