avro

How to deserialize Avro messages from Kafka in Flink (Scala)?

Submitted by 强颜欢笑 on 2019-12-23 04:54:28
Question: I'm reading messages from Kafka into the Flink shell (Scala) as follows:

scala> val stream = senv.addSource(new FlinkKafkaConsumer011[String]("topic", new SimpleStringSchema(), properties)).print()
warning: there was one deprecation warning; re-run with -deprecation for details
stream: org.apache.flink.streaming.api.datastream.DataStreamSink[String] = org.apache.flink.streaming.api.datastream.DataStreamSink@71de1091

Here I'm using SimpleStringSchema() as the deserializer, but actually the …
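
A minimal sketch of swapping in an Avro-aware deserialization schema in place of SimpleStringSchema, assuming Flink's flink-avro module is on the classpath and that MyRecord is a hypothetical Avro-generated SpecificRecord class (neither is given in the question):

import org.apache.flink.formats.avro.AvroDeserializationSchema
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011

// MyRecord is a placeholder for an Avro-generated SpecificRecord class
val deserializer = AvroDeserializationSchema.forSpecific(classOf[MyRecord])
val stream = senv
  .addSource(new FlinkKafkaConsumer011[MyRecord]("topic", deserializer, properties))
  .print()

If no generated class is available, AvroDeserializationSchema.forGeneric(schema) produces GenericRecord values from a parsed org.apache.avro.Schema instead.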

Task Not Serializable exception when trying to write a rdd of type Generic Record

Submitted by 别说谁变了你拦得住时间么 on 2019-12-23 00:32:20
Question:

val file = File.createTempFile("temp", ".avro")
val schema = new Schema.Parser().parse(st)
val datumWriter = new GenericDatumWriter[GenericData.Record](schema)
val dataFileWriter = new DataFileWriter[GenericData.Record](datumWriter)
dataFileWriter.create(schema, file)
rdd.foreach(r => {
  dataFileWriter.append(r)
})
dataFileWriter.close()

I have a DStream of type GenericData.Record which I am trying to write to HDFS in the Avro format, but I'm getting this Task Not Serializable error: org …
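
The DataFileWriter above is created on the driver and then captured by the foreach closure that runs on the executors, which is the usual cause of Task Not Serializable here. A hedged alternative sketch, assuming the avro-mapred artifact is available and reusing the question's rdd and schema (the output path is hypothetical), lets Hadoop's Avro output format do the writing instead:

import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.{AvroJob, AvroKeyOutputFormat}
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.Job

val job = Job.getInstance()
AvroJob.setOutputKeySchema(job, schema)

rdd
  .map(r => (new AvroKey[GenericRecord](r), NullWritable.get()))   // wrap each record for the output format
  .saveAsNewAPIHadoopFile(
    "hdfs:///tmp/avro-out",                                        // hypothetical HDFS path
    classOf[AvroKey[GenericRecord]],
    classOf[NullWritable],
    classOf[AvroKeyOutputFormat[GenericRecord]],
    job.getConfiguration)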

How to decode Kafka messages using Avro and Flink

Submitted by 做~自己de王妃 on 2019-12-22 10:34:49
Question: I am trying to read Avro data from a Kafka topic using Flink 1.0.3. I only know that this particular Kafka topic carries Avro-encoded messages, and I have the Avro schema file. My Flink code:

public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    Properties properties = new Properties();
    properties.setProperty("bootstrap.servers", "dojo3xxxxx:9092,dojoxxxxx:9092,dojoxxxxx:9092");
    properties …
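
A hedged sketch (in Scala rather than the question's Java) of a custom DeserializationSchema that decodes each Kafka record with Avro's GenericDatumReader, assuming the messages are plain binary-encoded Avro with no schema-registry header and that the schema JSON is passed in as a string:

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.java.typeutils.TypeExtractor
import org.apache.flink.streaming.util.serialization.DeserializationSchema

// Only the schema string is serialized with the job; the Schema object is rebuilt lazily on each task.
class AvroGenericDeserializer(schemaJson: String) extends DeserializationSchema[GenericRecord] {
  @transient private lazy val schema = new Schema.Parser().parse(schemaJson)
  @transient private lazy val reader = new GenericDatumReader[GenericRecord](schema)

  override def deserialize(message: Array[Byte]): GenericRecord = {
    val decoder = DecoderFactory.get().binaryDecoder(message, null)
    reader.read(null, decoder)
  }

  override def isEndOfStream(nextElement: GenericRecord): Boolean = false

  override def getProducedType: TypeInformation[GenericRecord] =
    TypeExtractor.getForClass(classOf[GenericRecord])
}

An instance of this class would then take the place of the deserializer argument in the FlinkKafkaConsumer constructor.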

Reading Event Hub Archive File in C#

Submitted by 喜你入骨 on 2019-12-22 09:19:54
Question: Is there any sample code in C# for reading the Azure Event Hub Archive files (Avro format)? I am trying to use the Microsoft.Hadoop.Avro library. I dumped the schema out using a Java Avro tool, which produces this:

{
  "type": "record",
  "name": "EventData",
  "namespace": "Microsoft.ServiceBus.Messaging",
  "fields": [
    {"name": "SequenceNumber", "type": "long"},
    {"name": "Offset", "type": "string"},
    {"name": "EnqueuedTimeUtc", "type": "string"},
    {"name": "SystemProperties" …
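
Not the C# the question asks for, but since these archive files are standard Avro object container files, one quick way to confirm their structure is to open a copy with Avro's own DataFileReader. A minimal Scala sketch, assuming a hypothetical locally downloaded archive file:

import java.io.File
import org.apache.avro.file.DataFileReader
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}

val reader = new DataFileReader[GenericRecord](
  new File("eventhub-archive.avro"),          // hypothetical downloaded capture file
  new GenericDatumReader[GenericRecord]())

println(reader.getSchema)                     // should match the EventData schema shown above
while (reader.hasNext) {
  val record = reader.next()
  val seq  = record.get("SequenceNumber")
  val time = record.get("EnqueuedTimeUtc")
  println(s"$seq $time")
}
reader.close()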

Reading/writing with Avro schemas AND Parquet format in SparkSQL

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-22 06:44:30
Question: I'm trying to write and read Parquet files from SparkSQL. For reasons of schema evolution, I would like to use Avro schemas with my writes and reads. My understanding is that this is possible outside of Spark (or manually within Spark) using e.g. AvroParquetWriter and Avro's Generic API. However, I would like to use SparkSQL's write() and read() methods (which work with DataFrameWriter and DataFrameReader) and which integrate well with SparkSQL (I will be writing and reading Datasets). I …
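
For reference, the "outside of Spark" route the question mentions looks roughly like the sketch below, using parquet-avro's AvroParquetWriter with a GenericRecord. The schema and output path are made up for illustration, and this is not the DataFrameWriter/DataFrameReader integration being asked about:

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.hadoop.fs.Path
import org.apache.parquet.avro.AvroParquetWriter

val schema = new Schema.Parser().parse(
  """{"type": "record", "name": "Person",
    |  "fields": [{"name": "name", "type": "string"}]}""".stripMargin)

val writer = AvroParquetWriter
  .builder[GenericRecord](new Path("/tmp/person.parquet"))   // hypothetical output path
  .withSchema(schema)
  .build()

val rec: GenericRecord = new GenericData.Record(schema)
rec.put("name", "alice")
writer.write(rec)
writer.close()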

Converting a byte array to JSON given an Avro schema as input gives an error

Submitted by ↘锁芯ラ on 2019-12-21 21:36:15
Question: I have a simple JSON string:

String jsonPayload = "{\"empid\": \"6\",\"empname\": \"Saurabh\",\"address\": \"home\"}";
jsonPayload.getBytes();

I created an Avro schema:

{
  "namespace": "sample.namespace",
  "type": "record",
  "name": "Employee",
  "fields": [
    {"name": "empid", "type": "string"},
    {"name": "empname", "type": "string"},
    {"name": "address", "type": "string"}
  ]
}

When I try to compare them I get an error:

Exception : org.apache.avro.AvroRuntimeException: Malformed data. Length is negative: -62 at org …
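
"Malformed data. Length is negative" usually means bytes that are not Avro binary encoding (here, plain JSON text) were handed to a binary decoder. A minimal Scala sketch of decoding the same JSON payload against the schema with Avro's JSON decoder instead (the question's code is Java, so this only illustrates the approach):

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory

val schema = new Schema.Parser().parse(
  """{"namespace": "sample.namespace", "type": "record", "name": "Employee",
    |  "fields": [
    |    {"name": "empid", "type": "string"},
    |    {"name": "empname", "type": "string"},
    |    {"name": "address", "type": "string"}
    |  ]}""".stripMargin)

val jsonPayload = """{"empid": "6", "empname": "Saurabh", "address": "home"}"""

// jsonDecoder parses the text form; binaryDecoder would expect Avro's compact binary form.
val decoder = DecoderFactory.get().jsonDecoder(schema, jsonPayload)
val reader  = new GenericDatumReader[GenericRecord](schema)
val record: GenericRecord = reader.read(null, decoder)
println(record)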

How to read Avro files in Python 3.5.2

Submitted by 非 Y 不嫁゛ on 2019-12-21 17:01:34
Question: I am trying to read Avro files using Python. I installed Apache Avro successfully (I think I did, because I am able to "import avro" in the Python shell), following the instructions here: https://avro.apache.org/docs/1.8.1/gettingstartedpython.html However, when I try to read Avro files following the code in those instructions, I keep receiving errors when importing Avro-related modules:

>>> import avro.schema
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    import avro …

Nesting Avro schemas

Submitted by 走远了吗. on 2019-12-21 09:28:47
Question: According to this question on nesting Avro schemas, the right way to nest a record schema is as follows:

{
  "name": "person",
  "type": "record",
  "fields": [
    {"name": "firstname", "type": "string"},
    {"name": "lastname", "type": "string"},
    {
      "name": "address",
      "type": {
        "type": "record",
        "name": "AddressUSRecord",
        "fields": [
          {"name": "streetaddress", "type": "string"},
          {"name": "city", "type": "string"}
        ]
      }
    }
  ]
}

I don't like giving the field the name address and having to give a different …
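
One way to avoid inlining the record (a hedged Scala sketch, not from the question): define AddressUSRecord once as its own named schema and let a single Schema.Parser instance resolve it by name when parsing the outer schema, since a parser remembers the named types it has already seen:

import org.apache.avro.Schema

val parser = new Schema.Parser()

// Parse the nested record first so its name becomes known to the parser.
val addressSchema = parser.parse(
  """{"type": "record", "name": "AddressUSRecord",
    |  "fields": [
    |    {"name": "streetaddress", "type": "string"},
    |    {"name": "city", "type": "string"}
    |  ]}""".stripMargin)

// The outer schema can now refer to AddressUSRecord purely by name.
val personSchema = parser.parse(
  """{"type": "record", "name": "person",
    |  "fields": [
    |    {"name": "firstname", "type": "string"},
    |    {"name": "lastname", "type": "string"},
    |    {"name": "address", "type": "AddressUSRecord"}
    |  ]}""".stripMargin)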

Does binary encoding of AVRO compress data?

Submitted by 北战南征 on 2019-12-21 07:25:26
Question: In one of our projects we are using Kafka with Avro to transfer data across applications. Data is added to an Avro object, and the object is binary-encoded before being written to Kafka. We use binary encoding because it is generally described as a more compact representation than other formats. The data is usually a JSON string, and when it is saved to a file it uses up to 10 MB of disk. However, when the file is compressed (.zip), it uses only a few KB. We are concerned about storing such data in Kafka, so we are trying to …
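
Binary Avro encoding strips field names and JSON punctuation, but it does not compress repeated content; compression is a separate, optional step. A minimal sketch (hypothetical schema and file name) of applying a codec when writing an Avro container file:

import java.io.File
import org.apache.avro.Schema
import org.apache.avro.file.{CodecFactory, DataFileWriter}
import org.apache.avro.generic.{GenericData, GenericDatumWriter, GenericRecord}

val schema = new Schema.Parser().parse(
  """{"type": "record", "name": "Payload",
    |  "fields": [{"name": "body", "type": "string"}]}""".stripMargin)

val writer = new DataFileWriter[GenericRecord](new GenericDatumWriter[GenericRecord](schema))
writer.setCodec(CodecFactory.deflateCodec(6))    // compression is opt-in and independent of binary encoding
writer.create(schema, new File("payload.avro"))  // hypothetical output file
val rec = new GenericData.Record(schema)
rec.put("body", "...a large JSON string...")
writer.append(rec)
writer.close()

For data sent to Kafka itself, the producer-side compression.type setting (gzip, snappy, lz4) compresses message batches independently of how the payload bytes were encoded.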