How can I load Avros in Spark using the schema on-board the Avro file(s)?

不思量自难忘° 2021-02-02 02:10

I am running CDH 4.4 with Spark 0.9.0 from a Cloudera parcel.

I have a bunch of Avro files that were created via Pig's AvroStorage UDF. I want to load these files in Sp

2 Answers
  •  时光取名叫无心
    2021-02-02 02:47

    This works for me:

    import org.apache.avro.generic.GenericRecord
    import org.apache.avro.mapred.{AvroInputFormat, AvroWrapper}
    import org.apache.hadoop.io.NullWritable
    
    ...
    val path = "hdfs:///path/to/your/avro/folder"
    val avroRDD = sc.hadoopFile[AvroWrapper[GenericRecord], NullWritable, AvroInputFormat[GenericRecord]](path)
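
As a follow-up sketch (not from the original answer): once the RDD is loaded, each element is a pair of `AvroWrapper[GenericRecord]` and `NullWritable`, and the schema can be read back from a record with `getSchema`. As far as I understand, when no input schema is configured on the job, the reader falls back to the writer schema stored in each file's header, which is what the question asks for. The object name `AvroPeek` and its helper functions are hypothetical names chosen for illustration. Values are turned into strings inside `map` on the assumption that the Hadoop reader may reuse record objects and that `GenericRecord` and `Schema` may not be Java-serializable in older Avro versions.

```scala
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.{AvroInputFormat, AvroWrapper}
import org.apache.hadoop.io.NullWritable
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

// Hypothetical helper object; the names here are illustrative only.
object AvroPeek {
  type AvroRDD = RDD[(AvroWrapper[GenericRecord], NullWritable)]

  // Same hadoopFile call as in the answer, wrapped in a function.
  def loadAvro(sc: SparkContext, path: String): AvroRDD =
    sc.hadoopFile[AvroWrapper[GenericRecord], NullWritable, AvroInputFormat[GenericRecord]](path)

  // JSON text of the schema attached to the first record.
  // Returned as a String so that no Schema object has to be shipped to the driver.
  def embeddedSchemaJson(rdd: AvroRDD): String =
    rdd.map { case (wrapper, _) => wrapper.datum().getSchema.toString }.first()

  // First n records rendered as strings; converting inside map avoids
  // collecting record objects that the reader may reuse between rows.
  def sampleAsJson(rdd: AvroRDD, n: Int): Seq[String] =
    rdd.map { case (wrapper, _) => wrapper.datum().toString }.take(n).toSeq
}
```

With an existing `SparkContext` named `sc`, something like `println(AvroPeek.embeddedSchemaJson(AvroPeek.loadAvro(sc, path)))` should print the schema that Pig's AvroStorage wrote into the files.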
    
