I have a text file on HDFS and I want to convert it to a Data Frame in Spark.
I am using the Spark Context to load the file and then try to generate individual columns f
A txt File with PIPE (|) delimited file can be read as :
df = spark.read.option("sep", "|").option("header", "true").csv("s3://bucket_name/folder_path/file_name.txt")
If you want to use the toDF
method, you have to convert your RDD
of Array[String]
into a RDD
of a case class. For example, you have to do:
case class Test(id:String,filed2:String)
val myFile = sc.textFile("file.txt")
val df= myFile.map( x => x.split(";") ).map( x=> Test(x(0),x(1)) ).toDF()