avro

how to convert xml to avro without ignoring !CDATA content?

淺唱寂寞╮ submitted on 2019-12-25 16:57:38
Question: I have the following source XML file named customers.xml: <?xml version="1.0" encoding="utf-8"?> <p:CustomerElement xmlns:p="http://www.dog.com/customer" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:schemaLocation="http://www.dog.com/customer Customer.xsd"> <Customer> <Sender> <transmitDate>2016-02-21T00:00:00</transmitDate> <transmitter>Dog ETL v2.0</transmitter> <dealerCode><![CDATA[P020]]></dealerCode> <DMSSystem><![CDATA[DBS]]></DMSSystem> <DMSReleaseNumber><![CDATA[5.0]]><
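The question's converter is not shown, but one point worth knowing is that a standard XML parser does not ignore CDATA sections: it exposes their content as ordinary element text, so a generic XML-to-record extraction keeps values like `P020` intact before any Avro encoding. A minimal sketch with Python's stdlib `xml.etree` (the sample document is a trimmed, hypothetical version of the customers.xml above):

```python
import xml.etree.ElementTree as ET

XML = """<Customer>
  <Sender>
    <dealerCode><![CDATA[P020]]></dealerCode>
    <DMSSystem><![CDATA[DBS]]></DMSSystem>
  </Sender>
</Customer>"""

def element_to_dict(elem):
    # Leaf elements: CDATA sections come back as plain text,
    # so their content is preserved rather than dropped.
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    return {child.tag: element_to_dict(child) for child in children}

record = element_to_dict(ET.fromstring(XML))
print(record)  # {'Sender': {'dealerCode': 'P020', 'DMSSystem': 'DBS'}}
```

If CDATA content goes missing in a conversion pipeline, the loss usually happens in a custom extraction step, not in the parser itself.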

How to write avro output in hadoop map reduce?

China☆狼群 submitted on 2019-12-25 08:29:33
Question: I wrote a Hadoop word-count program that takes TextInputFormat input and is supposed to write its word counts in Avro format. The Map-Reduce job runs fine, but its output is readable with Unix commands such as more or vi. I expected this output to be unreadable, since Avro output is binary. I used a mapper only; there is no reducer. I just want to experiment with Avro, so I am not worried about memory or stack overflow. The mapper code follows: public class
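When Avro output opens cleanly in more or vi, the usual explanation is that the job never actually wrote an Avro container (for example, the output format was left as the default text format rather than an Avro output format). A quick way to confirm is to inspect the first bytes of an output file: every Avro object container file starts with the 4-byte magic Obj\x01. A small stdlib sketch (the file names are hypothetical):

```python
import os
import tempfile

AVRO_MAGIC = b"Obj\x01"  # every Avro object container file begins with this

def is_avro_container(path):
    """Return True if the file starts with the Avro container magic bytes."""
    with open(path, "rb") as f:
        return f.read(len(AVRO_MAGIC)) == AVRO_MAGIC

tmp = tempfile.mkdtemp()

text_part = os.path.join(tmp, "part-m-00000")
with open(text_part, "wb") as f:
    f.write(b"hello\t3\n")            # what a text output format would write

avro_part = os.path.join(tmp, "part-m-00000.avro")
with open(avro_part, "wb") as f:
    f.write(AVRO_MAGIC + b"...")      # first bytes of a real Avro container

print(is_avro_container(text_part), is_avro_container(avro_part))
# False True
```

If the check reports plain text, the fix is in job configuration (using Avro's output format and Avro key/value wrapper types), not in the mapper logic.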

Registering AVRO schema with confluent schema registery

元气小坏坏 submitted on 2019-12-25 04:34:20
Question: Can Avro schemas be registered with the Confluent Schema Registry service? Per the README on GitHub https://github.com/confluentinc/schema-registry every example uses a JSON schema with a single field and type, without any name. I am trying to store the following schema in the repository, but with different variants I get different errors. curl -X POST -H "Content-Type: application/vnd.schemaregistry.v1+json" --data '{"schema": "{"type": "record","name": "myrecord","fields": [{"name": "serialization",
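The curl payload above fails because the inner quotes of the schema are not escaped: the registry expects a JSON document whose "schema" field holds the Avro schema as a string, so the schema must be JSON-encoded twice. A sketch of building a correctly escaped payload (the "string" type for the serialization field is an assumption, since the original is truncated):

```python
import json

# The Avro schema itself, as a plain structure.
schema = {
    "type": "record",
    "name": "myrecord",
    "fields": [{"name": "serialization", "type": "string"}],
}

# Encode twice: once to turn the schema into a string, once to build the
# request body. The inner quotes come out escaped automatically.
payload = json.dumps({"schema": json.dumps(schema)})
print(payload)

# The payload round-trips, so the registry can recover the original schema.
recovered = json.loads(json.loads(payload)["schema"])
```

Passing this payload as the --data argument to the curl command (with the outer single quotes) avoids the shell- and JSON-quoting errors the question describes.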

Apache Flink read Avro byte[] from Kafka

女生的网名这么多〃 submitted on 2019-12-25 04:12:22
Question: In reviewing examples I see a lot of this: FlinkKafkaConsumer08<Event> kafkaConsumer = new FlinkKafkaConsumer08<>("myavrotopic", avroSchema, properties); Here they already know the schema. I do not know the schema until I read the byte[] into a GenericRecord and then get the schema (it may change from record to record). Can someone point me to a FlinkKafkaConsumer08 that reads from byte[] into a map filter, so that I can remove some leading bits, then load that byte[] into a
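The "leading bits" the question wants to strip are most likely the Confluent wire-format framing that the Confluent serializers prepend to each Kafka value: one magic byte 0x00 followed by a 4-byte big-endian schema id, then the Avro-encoded bytes. A language-neutral sketch of the split (the framing layout is an assumption based on the Confluent serializer; the payload bytes are placeholders):

```python
import struct

def split_confluent_frame(message: bytes):
    """Split a Confluent-framed Kafka value into (schema_id, avro_payload)."""
    if not message or message[0] != 0:
        raise ValueError("not a Confluent-framed message")
    # Bytes 1-4 are the schema registry id, big-endian unsigned int.
    schema_id = struct.unpack(">I", message[1:5])[0]
    return schema_id, message[5:]

frame = b"\x00" + struct.pack(">I", 42) + b"avro-bytes-here"
schema_id, payload = split_confluent_frame(frame)
print(schema_id, payload)  # 42 b'avro-bytes-here'
```

In Flink terms, the same split would live inside a custom DeserializationSchema that consumes raw byte[], looks up the schema by id, and only then decodes the payload into a GenericRecord, so per-record schema changes are handled naturally.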

Is it possible to convert a generic record to specific record with the same schema?

冷暖自知 submitted on 2019-12-25 04:06:42
Question: I have a GenericRecord object of schema A, where A is also a generated Avro Java class. Is it possible to cast this object to the actual A type somehow? Source: https://stackoverflow.com/questions/54227389/is-it-possible-to-convert-a-generic-record-to-specific-record-with-the-same-sche
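A direct cast does not work, because a GenericRecord is not an instance of the generated class; the usual approach is a conversion that rebuilds the specific object from the generic one under the same schema (in Java, for instance, a deep copy via the specific-record machinery). A Python stand-in for that idea, using a dataclass in the role of the generated class (the A class and field names here are hypothetical):

```python
from dataclasses import dataclass, fields

@dataclass
class A:                      # plays the role of the generated Avro class
    name: str
    count: int

def to_specific(cls, generic_record: dict):
    # Rebuild the typed object field-by-field from a generic, dict-like
    # record that shares the same schema -- a conversion, not a cast.
    return cls(**{f.name: generic_record[f.name] for f in fields(cls)})

generic = {"name": "widget", "count": 3}   # plays the role of GenericRecord
specific = to_specific(A, generic)
print(specific)  # A(name='widget', count=3)
```

The key point the sketch illustrates: because both sides share one schema, a field-by-field rebuild is always well-defined, whereas a runtime cast fails on the actual object type.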

Recursive schema with avro (SchemaBuilder)

孤人 submitted on 2019-12-25 03:53:40
Question: Is it possible to make an Avro schema that is recursive, like Schema schema = SchemaBuilder .record("RecursiveItem") .namespace("com.example") .fields() .name("subItem") .type("RecursiveItem") .withDefault(null) // not sure about that too... .endRecord(); I get a StackOverflowError when using it like that: static class RecursiveItem { RecursiveItem subItem; } RecursiveItem item1 = new RecursiveItem(); RecursiveItem item2 = new RecursiveItem(); item1.subItem = item2; final DatumWriter
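Avro does support recursive schemas, but the schema in the question makes the self-reference non-nullable: the recursion can never terminate, and a null default on a non-union type is invalid. The usual fix is to declare the recursive field as a union with "null" first, so the chain can end. A sketch of the JSON form such a schema would take (equivalent, as far as I can tell, to using a null-first union with a null default in SchemaBuilder):

```python
import json

schema = {
    "type": "record",
    "name": "RecursiveItem",
    "namespace": "com.example",
    "fields": [
        {
            "name": "subItem",
            # Union with "null" FIRST: the default must match the first
            # branch, and a null branch lets the recursion terminate.
            "type": ["null", "com.example.RecursiveItem"],
            "default": None,
        }
    ],
}
print(json.dumps(schema, indent=2))
```

With the nullable union in place, writing item1 (whose item2.subItem is null) ends after two levels instead of recursing until the stack overflows.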

Getting serialization error when publish message to KAFKA topic

一世执手 submitted on 2019-12-25 02:35:22
Question: I'm using program variables to create configuration objects and loading the schema from a local path; the schema has also been registered in Kafka. I create the data object and serialize it using the GenericRecord approach. var logMessageSchema =(Avro.RecordSchema)Avro.Schema.Parse(File.ReadAllText(@"C:\StatusMessageSchema\FileStatusMessageSchema.txt")); var record = new GenericRecord(logMessageSchema); record.Add("SystemID", "100"); record.Add("FileName", "ABS_DHCS"); record.Add("FileStatus", "3009");
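A frequent cause of serialization failures with GenericRecord is a value whose runtime type does not match the declared Avro type; in the snippet above, every value is added as a string, which fails at serialization time if any field is declared as, say, an int. A small pre-flight check can surface the offending field before the producer runs. The schema below is a guess at the shape of FileStatusMessageSchema, for illustration only:

```python
import json

schema = json.loads("""
{"type": "record", "name": "FileStatusMessage", "fields": [
  {"name": "SystemID",   "type": "int"},
  {"name": "FileName",   "type": "string"},
  {"name": "FileStatus", "type": "string"}
]}""")

# Map primitive Avro types to the Python types a writer would accept.
PYTHON_TYPES = {"int": int, "long": int, "string": str,
                "boolean": bool, "float": float, "double": float}

def mismatched_fields(schema, record):
    """Return the names of fields whose value type conflicts with the schema."""
    bad = []
    for field in schema["fields"]:
        expected = PYTHON_TYPES.get(field["type"])
        if expected and not isinstance(record.get(field["name"]), expected):
            bad.append(field["name"])
    return bad

record = {"SystemID": "100", "FileName": "ABS_DHCS", "FileStatus": "3009"}
print(mismatched_fields(schema, record))  # ['SystemID']
```

Under this hypothetical schema, the string "100" for the int field SystemID is exactly the kind of mismatch that surfaces only when the producer tries to serialize the record.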

org.apache.kafka.connect.errors.DataException: Invalid JSON for record default value: null

跟風遠走 submitted on 2019-12-25 01:45:49
Question: I have a Kafka Avro topic generated using KafkaAvroSerializer. My standalone properties are as below. I am using Confluent 4.0.0 to run Kafka Connect. key.converter=io.confluent.connect.avro.AvroConverter value.converter=io.confluent.connect.avro.AvroConverter key.converter.schema.registry.url=<schema_registry_hostname>:8081 value.converter.schema.registry.url=<schema_registry_hostname>:8081 key.converter.schemas.enable=true value.converter.schemas.enable=true internal.key.converter=org
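One frequently reported cause of this DataException is a field whose default is null while the first branch of its union type is not "null"; Avro requires the default to match the first branch, and the converter rejects the schema when it does not. A small audit function can find such fields before the connector runs (the example schema is hypothetical):

```python
def bad_null_defaults(schema):
    """Return names of fields whose null default conflicts with the union order."""
    bad = []
    for field in schema.get("fields", []):
        if "default" in field and field["default"] is None:
            t = field["type"]
            # The default must match the FIRST branch of a union type.
            first = t[0] if isinstance(t, list) else t
            if first != "null":
                bad.append(field["name"])
    return bad

schema = {
    "type": "record", "name": "Value",
    "fields": [
        {"name": "ok",     "type": ["null", "string"], "default": None},
        {"name": "broken", "type": ["string", "null"], "default": None},
    ],
}
print(bad_null_defaults(schema))  # ['broken']
```

If the audit flags a field, re-registering the schema with "null" as the first union branch (and producers updated accordingly) is the usual remedy; the Connect converter itself cannot repair the mismatch.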

spark sql error when reading data from Avro Table

若如初见. submitted on 2019-12-25 00:13:32
Question: When I try to read data from an Avro table using spark-sql, I get this error: Caused by: java.lang.NullPointerException at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.supportedCategories(AvroObjectInspectorGenerator.java:142) at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker(AvroObjectInspectorGenerator.java:91) at org.apache.hadoop.hive.serde2.avro.AvroObjectInspectorGenerator.createObjectInspectorWorker
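An NPE inside AvroObjectInspectorGenerator generally means the Hive serde could not derive column type information from the table's Avro schema, for example because the schema literal or URL in the table properties is missing, unreachable, or not a valid record schema, leaving a null type to inspect. A pre-check that at least validates the schema literal can catch this before Spark SQL runs (the table-property value below is hypothetical):

```python
import json

def check_schema_literal(literal: str):
    """Validate an avro.schema.literal value the way a pre-flight check might."""
    try:
        schema = json.loads(literal)
    except json.JSONDecodeError:
        return "schema literal is not valid JSON"
    if schema.get("type") != "record":
        return "top-level Avro schema must be a record for a Hive table"
    return "ok"

good = '{"type": "record", "name": "t", "fields": []}'
bad = '{"type": "record", "name": "t", "fields": '   # truncated literal
print(check_schema_literal(good), "|", check_schema_literal(bad))
```

If the literal checks out, the next things to verify are that an avro.schema.url (if used instead) is reachable from the Spark executors and that the schema's field types are all categories the Hive Avro serde supports.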