I am running CDH 4.4 with Spark 0.9.0 from a Cloudera parcel.
I have a bunch of Avro files that were created via Pig's AvroStorage UDF. I want to load these files in Sp
This works for me:
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.{AvroInputFormat, AvroWrapper}
import org.apache.hadoop.io.NullWritable
...
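// Folder of Avro files on HDFS
val path = "hdfs:///path/to/your/avro/folder"
// Each element is an (AvroWrapper[GenericRecord], NullWritable) pair; wrapper.datum() gives the record
val avroRDD = sc.hadoopFile[AvroWrapper[GenericRecord], NullWritable, AvroInputFormat[GenericRecord]](path)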
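
To get at the actual values, you would typically map each pair to plain Scala values before doing anything else with the RDD. Here is a rough sketch, assuming `sc` is the SparkContext from spark-shell and that your schema has a string field called "name" (that field name is made up for illustration, so swap in one from your own schema). Copying values out early is a precaution: Hadoop input formats may reuse the same record object between rows, and GenericRecord is not Java-serializable.

```scala
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.{AvroInputFormat, AvroWrapper}
import org.apache.hadoop.io.NullWritable

// Assumes `sc` is the SparkContext provided by spark-shell.
val path = "hdfs:///path/to/your/avro/folder"
val avroRDD = sc.hadoopFile[AvroWrapper[GenericRecord], NullWritable, AvroInputFormat[GenericRecord]](path)

// "name" is a hypothetical field; replace it with a field from your schema.
val names = avroRDD.map { case (wrapper, _) =>
  // datum() returns the GenericRecord held by the wrapper.
  val record = wrapper.datum()
  // Avro strings come back as Utf8 objects, so convert to an immutable String.
  // Option(...) guards against a null or missing value.
  Option(record.get("name")).map(_.toString)
}

// Print a few values to check that the load worked.
names.take(5).foreach(println)
```

After this step `names` is an RDD[Option[String]], which can be cached or collected without dragging the Avro objects along.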