avro

How to deserialize Avro messages from Kafka in Flink (Scala)?

Submitted by 强颜欢笑 on 2019-12-23 04:54:28
Question: I'm reading messages from Kafka into the Flink shell (Scala) as follows:

scala> val stream = senv.addSource(new FlinkKafkaConsumer011[String]("topic", new SimpleStringSchema(), properties)).print()
warning: there was one deprecation warning; re-run with -deprecation for details
stream: org.apache.flink.streaming.api.datastream.DataStreamSink[String] = org.apache.flink.streaming.api.datastream.DataStreamSink@71de1091

Here I'm using SimpleStringSchema() as the deserializer, but actually the …
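
A minimal sketch of swapping in an Avro-aware deserialization schema in place of SimpleStringSchema, assuming Flink's flink-avro module is on the classpath and that MyRecord is a hypothetical Avro-generated SpecificRecord class (neither is given in the question):

import org.apache.flink.formats.avro.AvroDeserializationSchema
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer011

// MyRecord is a placeholder for an Avro-generated SpecificRecord class
val deserializer = AvroDeserializationSchema.forSpecific(classOf[MyRecord])
val stream = senv
  .addSource(new FlinkKafkaConsumer011[MyRecord]("topic", deserializer, properties))
  .print()

If no generated class is available, AvroDeserializationSchema.forGeneric(schema) produces GenericRecord values from a parsed org.apache.avro.Schema instead.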

Task Not Serializable exception when trying to write a rdd of type Generic Record

Submitted by 别说谁变了你拦得住时间么 on 2019-12-23 00:32:20
Question:

val file = File.createTempFile("temp", ".avro")
val schema = new Schema.Parser().parse(st)
val datumWriter = new GenericDatumWriter[GenericData.Record](schema)
val dataFileWriter = new DataFileWriter[GenericData.Record](datumWriter)
dataFileWriter.create(schema, file)
rdd.foreach(r => {
  dataFileWriter.append(r)
})
dataFileWriter.close()

I have a DStream of type GenericData.Record which I am trying to write to HDFS in the Avro format, but I'm getting this Task Not Serializable error: org …
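
The DataFileWriter above is created on the driver and then captured by the foreach closure that runs on the executors, which is the usual cause of Task Not Serializable here. A hedged alternative sketch, assuming the avro-mapred artifact is available and reusing the question's rdd and schema (the output path is hypothetical), lets Hadoop's Avro output format do the writing instead:

import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.{AvroJob, AvroKeyOutputFormat}
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.Job

val job = Job.getInstance()
AvroJob.setOutputKeySchema(job, schema)

rdd
  .map(r => (new AvroKey[GenericRecord](r), NullWritable.get()))   // wrap each record for the output format
  .saveAsNewAPIHadoopFile(
    "hdfs:///tmp/avro-out",                                        // hypothetical HDFS path
    classOf[AvroKey[GenericRecord]],
    classOf[NullWritable],
    classOf[AvroKeyOutputFormat[GenericRecord]],
    job.getConfiguration)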

How to decode Kafka messages using Avro and Flink

Submitted by 做~自己de王妃 on 2019-12-22 10:34:49
Question: I am trying to read Avro data from a Kafka topic using Flink 1.0.3. I only know that this particular Kafka topic carries Avro-encoded messages, and I have the Avro schema file. My Flink code:

public static void main(String[] args) throws Exception {
    StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
    Properties properties = new Properties();
    properties.setProperty("bootstrap.servers", "dojo3xxxxx:9092,dojoxxxxx:9092,dojoxxxxx:9092");
    properties …
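
A hedged sketch (in Scala rather than the question's Java) of a custom DeserializationSchema that decodes each Kafka record with Avro's GenericDatumReader, assuming the messages are plain binary-encoded Avro with no schema-registry header and that the schema JSON is passed in as a string:

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.java.typeutils.TypeExtractor
import org.apache.flink.streaming.util.serialization.DeserializationSchema

// Only the schema string is serialized with the job; the Schema object is rebuilt lazily on each task.
class AvroGenericDeserializer(schemaJson: String) extends DeserializationSchema[GenericRecord] {
  @transient private lazy val schema = new Schema.Parser().parse(schemaJson)
  @transient private lazy val reader = new GenericDatumReader[GenericRecord](schema)

  override def deserialize(message: Array[Byte]): GenericRecord = {
    val decoder = DecoderFactory.get().binaryDecoder(message, null)
    reader.read(null, decoder)
  }

  override def isEndOfStream(nextElement: GenericRecord): Boolean = false

  override def getProducedType: TypeInformation[GenericRecord] =
    TypeExtractor.getForClass(classOf[GenericRecord])
}

An instance of this class would then take the place of the deserializer argument in the FlinkKafkaConsumer constructor.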

Reading Event Hub Archive File in C#

Submitted by 喜你入骨 on 2019-12-22 09:19:54
Question: Is there any sample code in C# for reading the Azure Event Hub Archive files (Avro format)? I am trying to use the Microsoft.Hadoop.Avro library. I dumped the schema out using a Java Avro tool, which produces this:

{
  "type": "record",
  "name": "EventData",
  "namespace": "Microsoft.ServiceBus.Messaging",
  "fields": [
    {"name": "SequenceNumber", "type": "long"},
    {"name": "Offset", "type": "string"},
    {"name": "EnqueuedTimeUtc", "type": "string"},
    {"name": "SystemProperties" …
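
Not the C# the question asks for, but since these archive files are standard Avro object container files, one quick way to confirm their structure is to open a copy with Avro's own DataFileReader. A minimal Scala sketch, assuming a hypothetical locally downloaded archive file:

import java.io.File
import org.apache.avro.file.DataFileReader
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}

val reader = new DataFileReader[GenericRecord](
  new File("eventhub-archive.avro"),          // hypothetical downloaded capture file
  new GenericDatumReader[GenericRecord]())

println(reader.getSchema)                     // should match the EventData schema shown above
while (reader.hasNext) {
  val record = reader.next()
  val seq  = record.get("SequenceNumber")
  val time = record.get("EnqueuedTimeUtc")
  println(s"$seq $time")
}
reader.close()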

Reading/writing with Avro schemas AND Parquet format in SparkSQL

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-22 06:44:30
Question: I'm trying to write and read Parquet files from SparkSQL. For reasons of schema evolution, I would like to use Avro schemas with my writes and reads. My understanding is that this is possible outside of Spark (or manually within Spark) using e.g. AvroParquetWriter and Avro's Generic API. However, I would like to use SparkSQL's write() and read() methods (which work with DataFrameWriter and DataFrameReader) and which integrate well with SparkSQL (I will be writing and reading Datasets). I …
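
For reference, the "outside of Spark" route the question mentions looks roughly like the sketch below, using parquet-avro's AvroParquetWriter with a GenericRecord. The schema and output path are made up for illustration, and this is not the DataFrameWriter/DataFrameReader integration being asked about:

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.hadoop.fs.Path
import org.apache.parquet.avro.AvroParquetWriter

val schema = new Schema.Parser().parse(
  """{"type": "record", "name": "Person",
    |  "fields": [{"name": "name", "type": "string"}]}""".stripMargin)

val writer = AvroParquetWriter
  .builder[GenericRecord](new Path("/tmp/person.parquet"))   // hypothetical output path
  .withSchema(schema)
  .build()

val rec: GenericRecord = new GenericData.Record(schema)
rec.put("name", "alice")
writer.write(rec)
writer.close()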

Converting a byte array to JSON given an Avro schema as input gives an error

Submitted by ↘锁芯ラ on 2019-12-21 21:36:15
Question: I have a simple JSON string:

String jsonPayload = "{\"empid\": \"6\",\"empname\": \"Saurabh\",\"address\": \"home\"}";
jsonPayload.getBytes();

I created an Avro schema:

{
  "namespace": "sample.namespace",
  "type": "record",
  "name": "Employee",
  "fields": [
    {"name": "empid", "type": "string"},
    {"name": "empname", "type": "string"},
    {"name": "address", "type": "string"}
  ]
}

When I try to compare them I get an error:

Exception : org.apache.avro.AvroRuntimeException: Malformed data. Length is negative: -62 at org …
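
"Malformed data. Length is negative" usually means bytes that are not Avro binary encoding (here, plain JSON text) were handed to a binary decoder. A minimal Scala sketch of decoding the same JSON payload against the schema with Avro's JSON decoder instead (the question's code is Java, so this only illustrates the approach):

import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory

val schema = new Schema.Parser().parse(
  """{"namespace": "sample.namespace", "type": "record", "name": "Employee",
    |  "fields": [
    |    {"name": "empid", "type": "string"},
    |    {"name": "empname", "type": "string"},
    |    {"name": "address", "type": "string"}
    |  ]}""".stripMargin)

val jsonPayload = """{"empid": "6", "empname": "Saurabh", "address": "home"}"""

// jsonDecoder parses the text form; binaryDecoder would expect Avro's compact binary form.
val decoder = DecoderFactory.get().jsonDecoder(schema, jsonPayload)
val reader  = new GenericDatumReader[GenericRecord](schema)
val record: GenericRecord = reader.read(null, decoder)
println(record)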

How to read Avro files in Python 3.5.2

Submitted by 非 Y 不嫁゛ on 2019-12-21 17:01:34
Question: I am trying to read Avro files using Python. I installed Apache Avro successfully (I think I did, because I am able to "import avro" in the Python shell), following the instructions here: https://avro.apache.org/docs/1.8.1/gettingstartedpython.html However, when I try to read Avro files following the code in those instructions, I keep receiving errors when importing Avro-related modules:

>>> import avro.schema
Traceback (most recent call last):
  File "<pyshell#6>", line 1, in <module>
    import avro …

Nesting Avro schemas

Submitted by 走远了吗. on 2019-12-21 09:28:47
Question: According to this question on nesting Avro schemas, the right way to nest a record schema is as follows:

{
  "name": "person",
  "type": "record",
  "fields": [
    {"name": "firstname", "type": "string"},
    {"name": "lastname", "type": "string"},
    {
      "name": "address",
      "type": {
        "type": "record",
        "name": "AddressUSRecord",
        "fields": [
          {"name": "streetaddress", "type": "string"},
          {"name": "city", "type": "string"}
        ]
      }
    }
  ]
}

I don't like giving the field the name address and having to give a different …
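
One way to avoid inlining the record (a hedged Scala sketch, not from the question): define AddressUSRecord once as its own named schema and let a single Schema.Parser instance resolve it by name when parsing the outer schema, since a parser remembers the named types it has already seen:

import org.apache.avro.Schema

val parser = new Schema.Parser()

// Parse the nested record first so its name becomes known to the parser.
val addressSchema = parser.parse(
  """{"type": "record", "name": "AddressUSRecord",
    |  "fields": [
    |    {"name": "streetaddress", "type": "string"},
    |    {"name": "city", "type": "string"}
    |  ]}""".stripMargin)

// The outer schema can now refer to AddressUSRecord purely by name.
val personSchema = parser.parse(
  """{"type": "record", "name": "person",
    |  "fields": [
    |    {"name": "firstname", "type": "string"},
    |    {"name": "lastname", "type": "string"},
    |    {"name": "address", "type": "AddressUSRecord"}
    |  ]}""".stripMargin)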

Does binary encoding of AVRO compress data?

Submitted by 北战南征 on 2019-12-21 07:25:26
Question: In one of our projects we are using Kafka with Avro to transfer data across applications. Data is added to an Avro object, and the object is binary-encoded before being written to Kafka. We use binary encoding because it is generally described as a more compact representation than other formats. The data is usually a JSON string, and when it is saved to a file it uses up to 10 MB of disk. However, when the file is compressed (.zip), it uses only a few KB. We are concerned about storing such data in Kafka, so we are trying to …
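
Binary Avro encoding strips field names and JSON punctuation, but it does not compress repeated content; compression is a separate, optional step. A minimal sketch (hypothetical schema and file name) of applying a codec when writing an Avro container file:

import java.io.File
import org.apache.avro.Schema
import org.apache.avro.file.{CodecFactory, DataFileWriter}
import org.apache.avro.generic.{GenericData, GenericDatumWriter, GenericRecord}

val schema = new Schema.Parser().parse(
  """{"type": "record", "name": "Payload",
    |  "fields": [{"name": "body", "type": "string"}]}""".stripMargin)

val writer = new DataFileWriter[GenericRecord](new GenericDatumWriter[GenericRecord](schema))
writer.setCodec(CodecFactory.deflateCodec(6))    // compression is opt-in and independent of binary encoding
writer.create(schema, new File("payload.avro"))  // hypothetical output file
val rec = new GenericData.Record(schema)
rec.put("body", "...a large JSON string...")
writer.append(rec)
writer.close()

For data sent to Kafka itself, the producer-side compression.type setting (gzip, snappy, lz4) compresses message batches independently of how the payload bytes were encoded.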