How to create a DataFrame from a text file in Spark

Asked by 滥情空心, 2021-01-31 19:03

I have a text file on HDFS and I want to convert it to a Data Frame in Spark.

I am using the Spark Context to load the file and then try to generate individual columns from it.

8 Answers
  •  小鲜肉 (OP), answered 2021-01-31 19:32

    You can read the file into an RDD and then assign a schema to it. Two common ways to create a schema are a case class or a schema (StructType) object [my preferred one]. Below are quick snippets of code that you may use.

    Case Class approach

    // toDF() requires the implicits in scope:
    // import spark.implicits._ (or sqlContext.implicits._ in older Spark versions)
    case class Test(id: String, name: String)
    val myFile = sc.textFile("file.txt")
    val df = myFile.map(_.split(";")).map(x => Test(x(0), x(1))).toDF()
    

    Schema Approach

    import org.apache.spark.sql.types._

    val schemaString = "id name"
    val fields = schemaString.split(" ").map(fieldName => StructField(fieldName, StringType, nullable = true))
    val schema = StructType(fields)

    // sparkSess is an existing SparkSession; set the delimiter to match the file
    // (";" here, consistent with the split in the case-class example)
    val dfWithSchema = sparkSess.read
      .option("header", "false")
      .option("sep", ";")
      .schema(schema)
      .csv("file.txt")
    dfWithSchema.show()
    

    The second one is my preferred approach, since case classes are limited to a maximum of 22 fields (in Scala 2.10 and earlier), and this will be a problem if your file has more than 22 columns!
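    Since the answer describes reading the file into an RDD and then assigning a schema, a third variant worth mentioning combines the two: build Rows from the RDD and pass them to `createDataFrame` together with the `StructType`. This is a minimal sketch (untested here), assuming `spark` is an existing SparkSession and that `file.txt` is `;`-delimited as in the snippets above:

    ```scala
    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{StructField, StructType, StringType}

    // Same two-column schema as the schemaString example above
    val schema = StructType(Seq(
      StructField("id", StringType, nullable = true),
      StructField("name", StringType, nullable = true)
    ))

    // Split each line and wrap the fields in a Row
    val rowRDD = spark.sparkContext.textFile("file.txt")
      .map(_.split(";"))
      .map(x => Row(x(0), x(1)))

    // Attach the schema to the RDD of Rows
    val df = spark.createDataFrame(rowRDD, schema)
    df.show()
    ```

    This avoids both the case-class field limit and the CSV reader, which can be handy when the line parsing is more complex than a single split.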
