avro

How to prevent committing an empty Avro file to HDFS?

风流意气都作罢 submitted on 2020-01-06 09:02:59

Question: I have a job that creates an Avro file in HDFS and appends data to it. Occasionally, however, there is no data to append. In that case I don't want the application to flush and close the file; instead it should check whether the file is empty (though I assume the Avro schema is written into the header, so technically it is not an empty file) and delete it if it is. Is this feasible with the Avro + HDFS libraries? Answer 1: Try using LazyOutputFormat when specifying the output
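One way to handle this on the application side, sketched below, is to count appends yourself and delete the file when nothing was written, rather than inspecting the file afterwards. The `CountingWriter` wrapper and the plain-file stand-in for HDFS deletion are illustrative assumptions, not part of the Avro API.

```python
import os

class CountingWriter:
    """Wraps any writer object that has an append() method and counts records written."""

    def __init__(self, writer):
        self._writer = writer
        self.count = 0

    def append(self, record):
        self._writer.append(record)
        self.count += 1

def close_or_delete(writer, path, close_fn):
    """Close the output; if no records were appended, remove the header-only file."""
    close_fn()
    if writer.count == 0 and os.path.exists(path):
        os.remove(path)  # drop the file that holds only the Avro header
```

With a real HDFS client you would replace `os.remove` with the client's delete call (for example `fs.delete(path)` on the Hadoop FileSystem API).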

Avro file error while loading decimal field into Redshift table using Databricks

自作多情 submitted on 2020-01-06 07:02:10

Question: I have a dataframe in Databricks with a bunch of columns, including a decimal(15,2) field. If I exclude the decimal field I am able to insert the data into the Redshift table, but when the decimal field is included I get the following error: "Cannot init avro reader from s3 file Cannot parse file header: Cannot save fixed schema" Any thoughts? Answer 1: Try using just decimal without a range, or cast the existing column to decimal. Also try a different tempformat; from my experience, CSV
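For reference, the error message mentions a "fixed schema": Avro can encode a decimal logical type on top of either a `fixed` or a `bytes` base type, and a bytes-backed encoding looks like the fragment below. This is a sketch; the record and field names are made up, only the logical-type shape matters.

```json
{
  "type": "record",
  "name": "Payment",
  "fields": [
    {
      "name": "amount",
      "type": {
        "type": "bytes",
        "logicalType": "decimal",
        "precision": 15,
        "scale": 2
      }
    }
  ]
}
```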

Error when querying avro-backed hive table: java.lang.IllegalArgumentException

杀马特。学长 韩版系。学妹 submitted on 2020-01-06 02:52:11

Question: I am trying to create a Hive table on Azure HDInsight from an Avro file exported from raw Google Analytics data in BigQuery. It seems to work: I can create the table, and there are no errors when I run DESCRIBE. But when I try to select results, even if I select only two non-nested columns, I get an error: "java.lang.IllegalArgumentException". Here's how I created the table: DROP TABLE IF EXISTS ga_sessions_20150106; CREATE EXTERNAL TABLE IF NOT EXISTS ga_sessions_20150106 ROW FORMAT SERDE
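For comparison, a typical Avro-backed external table in Hive lets the SerDe take the schema from a file rather than declaring columns inline. A sketch along those lines (the location and schema URL are placeholders, not values from the question):

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS ga_sessions_20150106
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/path/to/avro/files'
TBLPROPERTIES ('avro.schema.url' = '/path/to/schema.avsc');
```

If the declared or referenced schema does not match what is actually in the files (BigQuery exports use nested records heavily), reads can fail even though DESCRIBE succeeds, since the schema is only reconciled at query time.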

Pig casting / datatypes

ぃ、小莉子 submitted on 2020-01-04 08:15:22

Question: I'm trying to dump a relation into an Avro file, but I'm getting a strange error: org.apache.pig.data.DataByteArray cannot be cast to java.lang.CharSequence I don't use DataByteArray (bytearray); see the description of the relation below. sensitiveSet: {rank_ID: long,name: chararray,customerId: long,VIN: chararray,birth_date: chararray,fuel_mileage: chararray,fuel_consumption: chararray} Even when I do an explicit cast I get the same error: sensitiveSet = foreach sensitiveSet generate (long) $0,
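One common cause of this error is that a field is still physically a bytearray at store time even though the described schema says otherwise, for example after a UDF or a schema-less load. A sketch of declaring the types up front in the LOAD statement instead, so nothing defaults to bytearray (the loader, delimiter, and file names are assumptions):

```pig
-- declare field types in the LOAD schema so no field defaults to bytearray
sensitiveSet = LOAD 'input.tsv' USING PigStorage('\t') AS (
    rank_ID: long, name: chararray, customerId: long, VIN: chararray,
    birth_date: chararray, fuel_mileage: chararray, fuel_consumption: chararray);
STORE sensitiveSet INTO 'out' USING AvroStorage();
```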

NoSuchMethodError writing Avro object to HDFS using Builder

怎甘沉沦 submitted on 2020-01-03 10:19:32

Question: I'm getting this exception when writing an object to HDFS: Exception in thread "main" java.lang.NoSuchMethodError: org.apache.avro.Schema$Parser.parse(Ljava/lang/String;[Ljava/lang/String;)Lorg/apache/avro/Schema; at com.blah.SomeType.<clinit>(SomeType.java:10) The line it references in the generated code is this: public class SomeType extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord { public static final org.apache.avro.Schema SCHEMA$
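A `NoSuchMethodError` on `Schema$Parser.parse(String, String...)` usually means an older Avro jar is on the runtime classpath than the one the classes were generated against, a frequent situation on Hadoop clusters that ship their own Avro. A sketch of pinning the dependency in Maven so the build matches the code generator (the version number is an assumption; use the version of the avro-tools/compiler that produced `SomeType`):

```xml
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <!-- assumption: match the Avro version used for code generation -->
  <version>1.8.2</version>
</dependency>
```

Even with the right dependency declared, the cluster's bundled Avro jar can still shadow yours at runtime; shading the Avro classes into your job jar, or configuring the job to prefer user classes over the framework's, are common workarounds.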

Start Confluent Schema Registry in Windows

让人想犯罪 __ submitted on 2020-01-01 02:44:18

Question: I have a Windows environment with my own Kafka and ZooKeeper setup running. To use custom objects I started using Avro, but I needed to get the Schema Registry started. I downloaded the Confluent platform and ran this: $ ./bin/schema-registry-start ./etc/schema-registry/schema-registry.properties /c/Confluent/confluent-3.0.0-2.11/confluent-3.0.0/bin/schema-registry-run-class: line 103: C:\Program: No such file or directory Then I saw this on the installation page: "Confluent does not currently support
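Since the Confluent startup scripts assume a Unix shell (the `C:\Program: No such file or directory` error comes from the bash script mishandling a Windows path with spaces), a common workaround is to run the registry in Docker or under WSL. A sketch of a docker-compose service, assuming a Kafka broker is reachable inside the compose network at `kafka:9092` (image and variable names follow Confluent's published Docker images; the tag is an assumption):

```yaml
schema-registry:
  image: confluentinc/cp-schema-registry:7.5.0
  ports:
    - "8081:8081"
  environment:
    SCHEMA_REGISTRY_HOST_NAME: schema-registry
    SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: kafka:9092
    SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8081
  depends_on:
    - kafka
```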

Avro with Java 8 dates as logical type

徘徊边缘 submitted on 2020-01-01 01:55:08

Question: The latest Avro compiler (1.8.2) generates Java sources for date logical types with Joda-Time based implementations. How can I configure the Avro compiler to produce sources that use the Java 8 date-time API? Answer 1: Currently (Avro 1.8.2) this is not possible; it is hardcoded to generate Joda date/time classes. The current master branch has switched to Java 8, and there is an open issue (with a pull request) to add the ability to generate classes with java.time.* types. I have no idea on any kind of release
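For readers on later releases: the Avro 1.9.x avro-maven-plugin exposes a switch for this. A sketch of the plugin configuration, hedged in that the option name is from the 1.9.x plugin and has no effect on 1.8.2:

```xml
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.9.2</version>
  <configuration>
    <!-- "jsr310" selects java.time.* implementations instead of Joda-Time -->
    <dateTimeLogicalTypeImplementation>jsr310</dateTimeLogicalTypeImplementation>
  </configuration>
</plugin>
```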

Error when reading avro files in python

笑着哭i submitted on 2019-12-31 01:55:15

Question: I installed Apache Avro successfully in Python, then tried to read Avro files into Python following the instructions here: https://avro.apache.org/docs/1.8.1/gettingstartedpython.html I have a bunch of Avro files in a directory which has already been set as the right path in Python. Here is my code: import avro.schema from avro.datafile import DataFileReader, DataFileWriter from avro.io import DatumReader, DatumWriter reader = DataFileReader(open("part-00000-of-01733.avro", "r"), DatumReader())
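One detail worth checking in the snippet above: on Python 3 the container file must be opened in binary mode (`"rb"`, not `"r"`), or the reader fails while parsing the header. A small stdlib-only sketch of why binary mode matters, checking the four-byte Avro container magic (the helper name is made up for illustration):

```python
AVRO_MAGIC = b"Obj\x01"  # first four bytes of every Avro object container file

def looks_like_avro_container(path):
    """Return True if the file starts with the Avro container magic bytes.

    Binary mode is essential here: in text mode Python would try to decode
    the header bytes, and both this check and DataFileReader would fail.
    """
    with open(path, "rb") as f:
        return f.read(4) == AVRO_MAGIC
```

With the avro package installed, the corresponding reader call is `DataFileReader(open(path, "rb"), DatumReader())`.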

Avro schema doesn't honor backward compatibility

橙三吉。 submitted on 2019-12-30 17:36:09

Question: I have this Avro schema { "namespace": "xx.xxxx.xxxxx.xxxxx", "type": "record", "name": "MyPayLoad", "fields": [ {"name": "filed1", "type": "string"}, {"name": "filed2", "type": "long"}, {"name": "filed3", "type": "boolean"}, { "name" : "metrics", "type": { "type" : "array", "items": { "name": "MyRecord", "type": "record", "fields" : [ {"name": "min", "type": "long"}, {"name": "max", "type": "long"}, {"name": "sum", "type": "long"}, {"name": "count", "type": "long"} ] } } } ] } Here is the
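Although the question is cut off before the failing case, the usual cause of a backward-compatibility failure is a field added to the evolved schema without a default value, so the new reader cannot fill it in when decoding old data. A sketch of a compatible addition to the `fields` array of `MyPayLoad` above (the field name is an assumption, chosen to match the schema's existing naming):

```json
{"name": "filed4", "type": ["null", "string"], "default": null}
```

Making the new field a nullable union with `"default": null` lets records written with the old schema still be read under the new one.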