avro

Optional array in Avro schema

Submitted by 南楼画角 on 2019-12-21 03:52:52
Question: I'm wondering whether or not it is possible to have an optional array. Let's assume a schema like this:

    {
      "type": "record",
      "name": "test_avro",
      "fields": [
        {"name": "test_field_1", "type": "long"},
        {"name": "subrecord", "type": [
          {"type": "record", "name": "subrecord_type", "fields": [{"name": "field_1", "type": "long"}]},
          "null"
        ]},
        {"name": "simple_array", "type": {"type": "array", "items": "string"}}
      ]
    }

Trying to write an Avro record without "simple_array" would result in an NPE in the …
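For illustration (not part of the original post), one common way to make the array optional is to union it with "null" and give it a null default; a minimal Java sketch:

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class OptionalArraySketch {
    public static void main(String[] args) {
        // Union the array with "null" (and default to null) so the field may be omitted.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"test_avro\",\"fields\":[" +
            "{\"name\":\"test_field_1\",\"type\":\"long\"}," +
            "{\"name\":\"simple_array\",\"type\":[\"null\",{\"type\":\"array\",\"items\":\"string\"}],\"default\":null}" +
            "]}");

        GenericRecord rec = new GenericData.Record(schema);
        rec.put("test_field_1", 42L);
        // "simple_array" is left unset; with the null branch in the union this is legal.
        System.out.println(rec);
    }
}
```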

Apache Kafka and Avro: org.apache.avro.generic.GenericData$Record cannot be cast to com.harmeetsingh13.java.Customer

Submitted by 主宰稳场 on 2019-12-21 02:51:17
Question: Whenever I try to read a message from the Kafka queue, I get the following exception:

    [error] (run-main-0) java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to com.harmeetsingh13.java.Customer
    java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to com.harmeetsingh13.java.Customer
        at com.harmeetsingh13.java.consumers.avrodesrializer.AvroSpecificDeserializer.infiniteConsumer(AvroSpecificDeserializer.java:79)
        at …
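Assuming the consumer uses Confluent's KafkaAvroDeserializer (the truncated post does not show the consumer configuration), a sketch of the usual fix is to enable the specific-reader flag so the deserializer returns generated classes instead of GenericData.Record:

```java
import java.util.Properties;
import io.confluent.kafka.serializers.KafkaAvroDeserializer;
import io.confluent.kafka.serializers.KafkaAvroDeserializerConfig;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class SpecificAvroConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "customer-consumer");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, KafkaAvroDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, KafkaAvroDeserializer.class.getName());
        props.put("schema.registry.url", "http://localhost:8081");
        // Without this flag the deserializer returns GenericData.Record,
        // which cannot be cast to the generated Customer class.
        props.put(KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG, true);

        KafkaConsumer<String, Object> consumer = new KafkaConsumer<>(props);
        // subscribe and poll as before; values now deserialize as Customer instances
    }
}
```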

Concat Avro files using avro-tools

Submitted by 眉间皱痕 on 2019-12-20 20:39:03
Question: I'm trying to merge Avro files into one big file. The problem is that the concat command does not accept a wildcard:

    hadoop jar avro-tools.jar concat /input/part* /output/bigfile.avro

I get:

    Exception in thread "main" java.io.FileNotFoundException: File does not exist: /input/part*

I tried using "" and '' but no luck.

Answer 1: I quickly checked Avro's source code (1.7.7) and it seems that concat does not support glob patterns (basically, it calls FileSystem.open() on each argument except the last one) …
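One possible workaround (a sketch, not from the original answer) is to expand the glob yourself and hand the resulting file list to avro-tools; this assumes the avro-tools jar, which exposes org.apache.avro.tool.Main, and hadoop-common are on the classpath:

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ExpandGlobForConcat {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Expand the glob ourselves, since concat opens each argument literally.
        List<String> cmd = new ArrayList<>();
        cmd.add("concat");
        for (FileStatus status : fs.globStatus(new Path("/input/part*"))) {
            cmd.add(status.getPath().toString());
        }
        cmd.add("/output/bigfile.avro");

        // Hand the expanded argument list to avro-tools' entry point.
        org.apache.avro.tool.Main.main(cmd.toArray(new String[0]));
    }
}
```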

Get a typed value from an Avro GenericRecord

Submitted by 佐手、 on 2019-12-20 09:56:07
Question: Given a GenericRecord, what is the recommended way to retrieve a typed value, as opposed to an Object? Are we expected to cast the values, and if so, what is the mapping from Avro types to Java types? For example, Avro array == Java Collection, and Avro string == Java Utf8. Since every GenericRecord contains its schema, I was hoping for a type-safe way to retrieve values.

Answer 1: Avro has eight primitive types and five complex types (excluding unions, which are a combination of other types). The …
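For illustration (the field names here are made up, not from the original question), typical casts when pulling values out of a GenericRecord look like this:

```java
import java.util.List;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.util.Utf8;

public class TypedAccessSketch {
    // Strings come back as Utf8, arrays as java.util.List, nested records as GenericRecord.
    static void printCustomer(GenericRecord record) {
        long id = (Long) record.get("id");                              // Avro long   -> java.lang.Long
        String name = record.get("name").toString();                    // Avro string -> Utf8; toString() is safest
        @SuppressWarnings("unchecked")
        List<Utf8> tags = (List<Utf8>) record.get("tags");              // Avro array  -> java.util.List
        GenericRecord address = (GenericRecord) record.get("address");  // Avro record -> GenericRecord

        System.out.println(id + " " + name + " " + tags + " " + address);
    }
}
```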

How can I load Avros in Spark using the schema on-board the Avro file(s)?

Submitted by 落花浮王杯 on 2019-12-20 09:42:04
Question: I am running CDH 4.4 with Spark 0.9.0 from a Cloudera parcel. I have a bunch of Avro files that were created via Pig's AvroStorage UDF. I want to load these files in Spark, using a generic record or the schema on board the Avro files. So far I've tried this:

    import org.apache.avro.mapred.AvroKey
    import org.apache.avro.mapreduce.AvroKeyInputFormat
    import org.apache.hadoop.io.NullWritable
    import org.apache.commons.lang.StringEscapeUtils.escapeCsv
    import org.apache.hadoop.fs.Path
    import org …
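A different route (not the RDD approach in the post, and assuming a much newer Spark than the 0.9.0 mentioned: the built-in avro data source shipped with Spark 2.4+, or the external spark-avro package before that) reads the embedded writer schema automatically:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class ReadAvroSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("read-avro").getOrCreate();

        // The avro data source reads the writer schema embedded in the files,
        // so no schema has to be supplied by hand.
        Dataset<Row> df = spark.read().format("avro").load("hdfs:///path/to/avro/dir");
        df.printSchema();
        df.show(10, false);
    }
}
```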

Avro schema for JSON array

Submitted by 。_饼干妹妹 on 2019-12-20 05:37:28
Question: Suppose I have the following JSON:

    [
      {"id":1,"text":"some text","user_id":1},
      {"id":1,"text":"some text","user_id":2},
      ...
    ]

What would be an appropriate Avro schema for this array of objects?

Answer 1: [short answer] The appropriate Avro schema for this array of objects would look like:

    const type = avro.Type.forSchema({
      type: 'array',
      items: {
        type: 'record',
        fields: [
          { name: 'id', type: 'int' },
          { name: 'text', type: 'string' },
          { name: 'user_id', type: 'int' }
        ]
      }
    });

[long answer] We can use …
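The answer above uses the avsc JavaScript library; a rough Java equivalent (a sketch, with a record name added because Avro's Java parser requires named records) might look like:

```java
import java.util.Arrays;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;

public class JsonArraySchemaSketch {
    public static void main(String[] args) {
        // Array-of-records schema matching the JSON objects above.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"array\",\"items\":{" +
            "\"type\":\"record\",\"name\":\"Message\",\"fields\":[" +
            "{\"name\":\"id\",\"type\":\"int\"}," +
            "{\"name\":\"text\",\"type\":\"string\"}," +
            "{\"name\":\"user_id\",\"type\":\"int\"}" +
            "]}}");

        GenericRecord msg = new GenericData.Record(schema.getElementType());
        msg.put("id", 1);
        msg.put("text", "some text");
        msg.put("user_id", 1);

        GenericData.Array<GenericRecord> array =
            new GenericData.Array<>(schema, Arrays.asList(msg));
        System.out.println(array);
    }
}
```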

Parquet Data timestamp columns INT96 not yet implemented in Druid Overlord Hadoop task

Submitted by 半城伤御伤魂 on 2019-12-20 03:43:25
Question: Context: I am able to submit a MapReduce job from the Druid overlord to EMR. My data source is in S3 in Parquet format. I have a timestamp column (INT96) in the Parquet data, which is not supported in the Avro schema. The error occurs while parsing the timestamp. The stack trace is:

    Error: java.lang.IllegalArgumentException: INT96 not yet implemented.
        at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96(AvroSchemaConverter.java:279)
        at org.apache.parquet.avro.AvroSchemaConverter$1.convertINT96 …
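One possible workaround (a sketch with made-up column and bucket names, not from the original post) is to rewrite the Parquet data so the timestamp is no longer stored as INT96, for example by re-encoding it as a string with Spark before handing it to Druid:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.date_format;

public class RewriteInt96TimestampSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("drop-int96").getOrCreate();

        Dataset<Row> df = spark.read().parquet("s3://bucket/input/");
        // Re-encode the timestamp as an ISO-8601 string so the rewritten Parquet
        // no longer contains an INT96 column for AvroSchemaConverter to reject.
        df.withColumn("event_time", date_format(col("event_time"), "yyyy-MM-dd'T'HH:mm:ss'Z'"))
          .write().parquet("s3://bucket/output/");
    }
}
```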

Data serialization framework

Submitted by 左心房为你撑大大i on 2019-12-19 19:29:36
Question: I'm new to Apache Avro (a serialization framework). I know what serialization is, but why are there separate frameworks like Avro, Thrift, and Protocol Buffers? Why can't we use the Java serialization APIs instead of these separate frameworks? Are there any flaws in the Java serialization APIs? Also, what is the meaning of the phrase "does not require running a code-generation program when a schema changes" in Avro or in any other serialization framework? Please help me understand all of this!

Answer 1: Why …
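To illustrate the "no code generation" phrase (a sketch assuming an existing Avro data file named users.avro), Avro can read data generically using only the schema embedded in the file, with no generated classes at all:

```java
import java.io.File;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class GenericReadSketch {
    public static void main(String[] args) throws Exception {
        // No generated classes: the reader uses the writer schema stored in the file,
        // so a schema change only means new data files, not regenerated code.
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(new File("users.avro"), new GenericDatumReader<>())) {
            for (GenericRecord record : reader) {
                System.out.println(record);
            }
        }
    }
}
```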

Spark: Writing to Avro file

Submitted by 眉间皱痕 on 2019-12-19 05:07:35
Question: In Spark, I have an RDD from an Avro file. I now want to do some transformations on that RDD and save it back as an Avro file:

    val job = new Job(new Configuration())
    AvroJob.setOutputKeySchema(job, getOutputSchema(inputSchema))
    rdd.map(elem => (new SparkAvroKey(doTransformation(elem._1)), elem._2))
       .saveAsNewAPIHadoopFile(outputPath,
         classOf[AvroKey[GenericRecord]],
         classOf[org.apache.hadoop.io.NullWritable],
         classOf[AvroKeyOutputFormat[GenericRecord]],
         job.getConfiguration)

When running …
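For reference, the same write path rendered in Java (a sketch only; it does not address whatever error the truncated post goes on to describe):

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.mapred.AvroKey;
import org.apache.avro.mapreduce.AvroJob;
import org.apache.avro.mapreduce.AvroKeyOutputFormat;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.api.java.JavaPairRDD;

public class WriteAvroSketch {
    // Writes a pair RDD of (AvroKey<GenericRecord>, NullWritable) back out as Avro,
    // registering the output schema on a Hadoop Job first.
    static void save(JavaPairRDD<AvroKey<GenericRecord>, NullWritable> rdd,
                     Schema outputSchema, String outputPath) throws Exception {
        Job job = Job.getInstance();
        AvroJob.setOutputKeySchema(job, outputSchema);
        rdd.saveAsNewAPIHadoopFile(outputPath,
                                   AvroKey.class,
                                   NullWritable.class,
                                   AvroKeyOutputFormat.class,
                                   job.getConfiguration());
    }
}
```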

How to read and write Map<String, Object> from/to parquet file in Java or Scala?

Submitted by 你。 on 2019-12-18 19:06:11
Question: I'm looking for a concise example of how to read and write a Map<String, Object> from/to a Parquet file in Java or Scala. Here is the expected structure, using com.fasterxml.jackson.databind.ObjectMapper as the serializer in Java (i.e. I'm looking for the equivalent using Parquet):

    public static Map<String, Object> read(InputStream inputStream) throws IOException {
        ObjectMapper objectMapper = new ObjectMapper();
        return objectMapper.readValue(inputStream, new TypeReference<Map<String, Object>>() { });
    }

    public static …
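One possible approach (a sketch, not from the original question) is to go through Avro: wrap the map in a record whose single field is an Avro map, then use AvroParquetWriter and AvroParquetReader. Note that Avro maps need a single value type, so Object is narrowed to string here:

```java
import java.util.Map;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.parquet.hadoop.ParquetWriter;

public class MapParquetSketch {
    // A record with one field holding a map of string values.
    private static final Schema SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"MapHolder\",\"fields\":[" +
        "{\"name\":\"data\",\"type\":{\"type\":\"map\",\"values\":\"string\"}}]}");

    public static void write(Map<String, String> map, String file) throws Exception {
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("data", map);
        try (ParquetWriter<GenericRecord> writer =
                 AvroParquetWriter.<GenericRecord>builder(new Path(file)).withSchema(SCHEMA).build()) {
            writer.write(record);
        }
    }

    public static GenericRecord read(String file) throws Exception {
        try (ParquetReader<GenericRecord> reader =
                 AvroParquetReader.<GenericRecord>builder(new Path(file)).build()) {
            return reader.read(); // the "data" field holds the map
        }
    }
}
```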