How to produce a Tombstone Avro Record in Kafka using Python?

问题

my sink properties :

{
  "name": "jdbc-oracle",
  "config": {
    "connector.class": "io.confluent.connect.jdbc.JdbcSinkConnector",
    "tasks.max": "1",
    "topics": "orders",
    "connection.url": "jdbc:oracle:thin:@10.1.2.3:1071/orac",
    "connection.user": "ersin",
    "connection.password": "ersin!",
    "auto.create": "true",
    "delete.enabled": "true",
    "pk.mode": "record_key",
    "pk.fields": "id",
    "insert.mode": "upsert",
    "plugin.path": "/home/ersin/confluent-5.4.1/share/java/",
    "name": "jdbc-oracle"
  },
  "tasks": [
    {
      "connector": "jdbc-oracle",
      "task": 0
    }
  ],
  "type": "sink"
}

my connect-avro-distributed.properties :

bootstrap.servers=10.0.0.0:9092

group.id=connect-cluster

key.converter=io.confluent.connect.avro.AvroConverter
key.converter.schema.registry.url=http://10.0.0.0:8081
value.converter=io.confluent.connect.avro.AvroConverter
value.converter.schema.registry.url=http://10.0.0.0:8081

config.storage.topic=connect-configs
offset.storage.topic=connect-offsets
status.storage.topic=connect-statuses

config.storage.replication.factor=1
offset.storage.replication.factor=1
status.storage.replication.factor=1

internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false

I send data like this:

from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=['10.0.0.0:9092'],
)
message=producer.send('orders', key=b'{"id":1}', value=None)

But it gives error. Serialization error.

回答1:

I assume you want to produce Avro message therefore you need to serialise your messages properly. I'll be using confluent-kafka-python library so if you don't already have it installed, just run

pip install confluent-kafka[avro]

And here's an example AvroConsumer that sends an Avro message with a null value:

from confluent_kafka import avro
from confluent_kafka.avro import AvroProducer


value_schema_str = """
{
   "type":"record",
   "name":"myrecord",
   "fields":[
      {
         "name":"id",
         "type":[
            "null",
            "int"
         ],
         "default":null
      },
      {
         "name":"product",
         "type":[
            "null",
            "string"
         ],
         "default":null
      },
      {
         "name":"quantity",
         "type":[
            "null",
            "int"
         ],
         "default":null
      },
      {
         "name":"price",
         "type":[
            "null",
            "int"
         ],
         "default":null
      }
   ]
}
"""

key_schema_str = """
{
   "type":"record",
   "name":"key_schema",
   "fields":[
      {
         "name":"id",
         "type":"int"
      }
   ]
}
"""


def delivery_report(err, msg):
    """ Called once for each message produced to indicate delivery result.
        Triggered by poll() or flush(). """
    if err is not None:
        print('Message delivery failed: {}'.format(err))
    else:
        print('Message delivered to {} [{}]'.format(msg.topic(), msg.partition()))


if __name__ == '__main__':
    value_schema = avro.loads(value_schema_str)
    key_schema = avro.loads(key_schema_str)
    #value = {"id": 1, "product": "myProduct", "quantity": 10, "price": 100}
    key = {"id": 1}


    avroProducer = AvroProducer({
        'bootstrap.servers': '10.0.0.0:9092',
        'on_delivery': delivery_report,
        'schema.registry.url': 'http://10.0.0.0:8081'
    }, default_key_schema=key_schema, default_value_schema=value_schema)

    avroProducer.produce(topic='orders', key=key)
    avroProducer.flush()

回答2:

You need to set in Avro Schema to be able to set Avro field to null, by adding null as one of the possible types of the field.

Take a look on example from Avro documentation:

{
  "type": "record",
  "name": "yourRecord",
  "fields" : [
    {"name": "empId", "type": "long"},              // mandatory field
    {"name": "empName", "type": ["null", "string"]} // optional field 
  ]
}

here empName is declared as in type as a null or string. which allows to set empName field to null.

来源：https://stackoverflow.com/questions/61195652/how-to-produce-a-tombstone-avro-record-in-kafka-using-python

标签

python

apache-kafka

kafka-producer-api

kafka-python