avro

How to prevent committing an empty Avro file to HDFS?

风流意气都作罢 submitted on 2020-01-06 09:02:59

Question: I have a job that creates an Avro file in HDFS and appends data to it. Occasionally, however, there is no data to append. In that case I don't want the application to flush and close the file; instead it should check whether the file is empty (though I assume the Avro schema is written into the header, so technically it is not an empty file) and delete it if it is. Is this feasible with the Avro + HDFS libraries? Answer 1: Try using LazyOutputFormat when specifying the output
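One way to handle this on the application side, sketched below, is to count appends yourself and delete the file when nothing was written, rather than inspecting the file afterwards. The `CountingWriter` wrapper and the plain-file stand-in for HDFS deletion are illustrative assumptions, not part of the Avro API.

```python
import os

class CountingWriter:
    """Wraps any writer object that has an append() method and counts records written."""

    def __init__(self, writer):
        self._writer = writer
        self.count = 0

    def append(self, record):
        self._writer.append(record)
        self.count += 1

def close_or_delete(writer, path, close_fn):
    """Close the output; if no records were appended, remove the header-only file."""
    close_fn()
    if writer.count == 0 and os.path.exists(path):
        os.remove(path)  # drop the file that holds only the Avro header
```

With a real HDFS client you would replace `os.remove` with the client's delete call (for example `fs.delete(path)` on the Hadoop FileSystem API).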

Avro file error while loading decimal field into Redshift table using Databricks

自作多情 submitted on 2020-01-06 07:02:10

Question: I have a dataframe in Databricks with a bunch of columns, including a decimal(15,2) field. If I exclude the decimal field I am able to insert the data into the Redshift table, but when the decimal field is included I get the following error: "Cannot init avro reader from s3 file Cannot parse file header: Cannot save fixed schema" Any thoughts? Answer 1: Try using just decimal without a range, or cast the existing column to decimal. Also try a different tempformat; from my experience, CSV
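For reference, the error message mentions a "fixed schema": Avro can encode a decimal logical type on top of either a `fixed` or a `bytes` base type, and a bytes-backed encoding looks like the fragment below. This is a sketch; the record and field names are made up, only the logical-type shape matters.

```json
{
  "type": "record",
  "name": "Payment",
  "fields": [
    {
      "name": "amount",
      "type": {
        "type": "bytes",
        "logicalType": "decimal",
        "precision": 15,
        "scale": 2
      }
    }
  ]
}
```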

Error when querying avro-backed hive table: java.lang.IllegalArgumentException

杀马特。学长 韩版系。学妹 submitted on 2020-01-06 02:52:11

Question: I am trying to create a Hive table on Azure HDInsight from an Avro file exported from raw Google Analytics data in BigQuery. It seems to work: I can create the table, and there are no errors when I run DESCRIBE. But when I try to select results, even if I select only two non-nested columns, I get an error: "java.lang.IllegalArgumentException". Here's how I created the table: DROP TABLE IF EXISTS ga_sessions_20150106; CREATE EXTERNAL TABLE IF NOT EXISTS ga_sessions_20150106 ROW FORMAT SERDE
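For comparison, a typical Avro-backed external table in Hive lets the SerDe take the schema from a file rather than declaring columns inline. A sketch along those lines (the location and schema URL are placeholders, not values from the question):

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS ga_sessions_20150106
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
LOCATION '/path/to/avro/files'
TBLPROPERTIES ('avro.schema.url' = '/path/to/schema.avsc');
```

If the declared or referenced schema does not match what is actually in the files (BigQuery exports use nested records heavily), reads can fail even though DESCRIBE succeeds, since the schema is only reconciled at query time.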

Pig casting / datatypes

ぃ、小莉子 submitted on 2020-01-04 08:15:22

Question: I'm trying to dump a relation into an Avro file, but I'm getting a strange error: org.apache.pig.data.DataByteArray cannot be cast to java.lang.CharSequence I don't use DataByteArray (bytearray); see the description of the relation below. sensitiveSet: {rank_ID: long,name: chararray,customerId: long,VIN: chararray,birth_date: chararray,fuel_mileage: chararray,fuel_consumption: chararray} Even when I do an explicit cast I get the same error: sensitiveSet = foreach sensitiveSet generate (long) $0,
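One common cause of this error is that a field is still physically a bytearray at store time even though the described schema says otherwise, for example after a UDF or a schema-less load. A sketch of declaring the types up front in the LOAD statement instead, so nothing defaults to bytearray (the loader, delimiter, and file names are assumptions):

```pig
-- declare field types in the LOAD schema so no field defaults to bytearray
sensitiveSet = LOAD 'input.tsv' USING PigStorage('\t') AS (
    rank_ID: long, name: chararray, customerId: long, VIN: chararray,
    birth_date: chararray, fuel_mileage: chararray, fuel_consumption: chararray);
STORE sensitiveSet INTO 'out' USING AvroStorage();
```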

NoSuchMethodError writing Avro object to HDFS using Builder

怎甘沉沦 submitted on 2020-01-03 10:19:32

Question: I'm getting this exception when writing an object to HDFS: Exception in thread "main" java.lang.NoSuchMethodError: org.apache.avro.Schema$Parser.parse(Ljava/lang/String;[Ljava/lang/String;)Lorg/apache/avro/Schema; at com.blah.SomeType.<clinit>(SomeType.java:10) The line it references in the generated code is this: public class SomeType extends org.apache.avro.specific.SpecificRecordBase implements org.apache.avro.specific.SpecificRecord { public static final org.apache.avro.Schema SCHEMA$
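A `NoSuchMethodError` on `Schema$Parser.parse(String, String...)` usually means an older Avro jar is on the runtime classpath than the one the classes were generated against, a frequent situation on Hadoop clusters that ship their own Avro. A sketch of pinning the dependency in Maven so the build matches the code generator (the version number is an assumption; use the version of the avro-tools/compiler that produced `SomeType`):

```xml
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <!-- assumption: match the Avro version used for code generation -->
  <version>1.8.2</version>
</dependency>
```

Even with the right dependency declared, the cluster's bundled Avro jar can still shadow yours at runtime; shading the Avro classes into your job jar, or configuring the job to prefer user classes over the framework's, are common workarounds.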

Start Confluent Schema Registry in Windows

让人想犯罪 __ submitted on 2020-01-01 02:44:18

Question: I have a Windows environment with my own Kafka and ZooKeeper setup running. To use custom objects I started using Avro, but I needed to get the Schema Registry started. I downloaded the Confluent platform and ran this: $ ./bin/schema-registry-start ./etc/schema-registry/schema-registry.properties /c/Confluent/confluent-3.0.0-2.11/confluent-3.0.0/bin/schema-registry-run-class: line 103: C:\Program: No such file or directory Then I saw this on the installation page: "Confluent does not currently support
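Since the Confluent startup scripts assume a Unix shell (the `C:\Program: No such file or directory` error comes from the bash script mishandling a Windows path with spaces), a common workaround is to run the registry in Docker or under WSL. A sketch of a docker-compose service, assuming a Kafka broker is reachable inside the compose network at `kafka:9092` (image and variable names follow Confluent's published Docker images; the tag is an assumption):

```yaml
schema-registry:
  image: confluentinc/cp-schema-registry:7.5.0
  ports:
    - "8081:8081"
  environment:
    SCHEMA_REGISTRY_HOST_NAME: schema-registry
    SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: kafka:9092
    SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8081
  depends_on:
    - kafka
```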

Avro with Java 8 dates as logical type

徘徊边缘 submitted on 2020-01-01 01:55:08

Question: The latest Avro compiler (1.8.2) generates Java sources for date logical types with Joda-Time based implementations. How can I configure the Avro compiler to produce sources that use the Java 8 date-time API? Answer 1: Currently (Avro 1.8.2) this is not possible; it is hardcoded to generate Joda date/time classes. The current master branch has switched to Java 8, and there is an open issue (with a pull request) to add the ability to generate classes with java.time.* types. I have no idea on any kind of release
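For readers on later releases: the Avro 1.9.x avro-maven-plugin exposes a switch for this. A sketch of the plugin configuration, hedged in that the option name is from the 1.9.x plugin and has no effect on 1.8.2:

```xml
<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.9.2</version>
  <configuration>
    <!-- "jsr310" selects java.time.* implementations instead of Joda-Time -->
    <dateTimeLogicalTypeImplementation>jsr310</dateTimeLogicalTypeImplementation>
  </configuration>
</plugin>
```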

Error when reading avro files in python

笑着哭i submitted on 2019-12-31 01:55:15

Question: I installed Apache Avro successfully in Python, then tried to read Avro files into Python following the instructions here: https://avro.apache.org/docs/1.8.1/gettingstartedpython.html I have a bunch of Avro files in a directory which has already been set as the right path in Python. Here is my code: import avro.schema from avro.datafile import DataFileReader, DataFileWriter from avro.io import DatumReader, DatumWriter reader = DataFileReader(open("part-00000-of-01733.avro", "r"), DatumReader())
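One detail worth checking in the snippet above: on Python 3 the container file must be opened in binary mode (`"rb"`, not `"r"`), or the reader fails while parsing the header. A small stdlib-only sketch of why binary mode matters, checking the four-byte Avro container magic (the helper name is made up for illustration):

```python
AVRO_MAGIC = b"Obj\x01"  # first four bytes of every Avro object container file

def looks_like_avro_container(path):
    """Return True if the file starts with the Avro container magic bytes.

    Binary mode is essential here: in text mode Python would try to decode
    the header bytes, and both this check and DataFileReader would fail.
    """
    with open(path, "rb") as f:
        return f.read(4) == AVRO_MAGIC
```

With the avro package installed, the corresponding reader call is `DataFileReader(open(path, "rb"), DatumReader())`.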

Avro schema doesn't honor backward compatibility

橙三吉。 submitted on 2019-12-30 17:36:09

Question: I have this Avro schema { "namespace": "xx.xxxx.xxxxx.xxxxx", "type": "record", "name": "MyPayLoad", "fields": [ {"name": "filed1", "type": "string"}, {"name": "filed2", "type": "long"}, {"name": "filed3", "type": "boolean"}, { "name" : "metrics", "type": { "type" : "array", "items": { "name": "MyRecord", "type": "record", "fields" : [ {"name": "min", "type": "long"}, {"name": "max", "type": "long"}, {"name": "sum", "type": "long"}, {"name": "count", "type": "long"} ] } } } ] } Here is the
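Although the question is cut off before the failing case, the usual cause of a backward-compatibility failure is a field added to the evolved schema without a default value, so the new reader cannot fill it in when decoding old data. A sketch of a compatible addition to the `fields` array of `MyPayLoad` above (the field name is an assumption, chosen to match the schema's existing naming):

```json
{"name": "filed4", "type": ["null", "string"], "default": null}
```

Making the new field a nullable union with `"default": null` lets records written with the old schema still be read under the new one.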