How to create a DataFrame from a text file in Spark

滥情空心 2021-01-31 19:03

I have a text file on HDFS and I want to convert it to a DataFrame in Spark.

I am using the Spark Context to load the file and then try to generate individual columns f

8 answers
  •  难免孤独
    2021-01-31 19:45

    Update: as of Spark 2.0, you can simply use the built-in csv data source:

    import org.apache.spark.sql.SparkSession
    // Create (or reuse) the Spark session
    val spark: SparkSession = SparkSession.builder().getOrCreate()
    val df = spark.read.csv("file.txt")
    

    You can also use various options to control the CSV parsing, e.g.:

    val df = spark.read.option("header", "false").csv("file.txt")
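
    As a sketch of how these options combine with an explicit schema: the example below assumes a semicolon-delimited file with two fields per line (an integer id and a name), and the names CsvWithSchema and readRecords are illustrative, not part of Spark. Supplying the schema up front means Spark does not need to infer column types.

    ```scala
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    object CsvWithSchema {
      // Assumed file layout: "id;name" on each line, no header row
      val recordSchema: StructType = StructType(Seq(
        StructField("id", IntegerType, nullable = true),
        StructField("name", StringType, nullable = true)
      ))

      // Reads the file at `path` using ';' as the field separator
      def readRecords(spark: SparkSession, path: String): DataFrame =
        spark.read
          .option("sep", ";")
          .option("header", "false")
          .schema(recordSchema)
          .csv(path)
    }
    ```

    Calling CsvWithSchema.readRecords(spark, "file.txt") should then give a DataFrame whose columns are named "id" and "name".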
    

    For Spark versions before 2.0: the easiest way is to use the spark-csv package. Include it in your dependencies and follow its README. It allows setting a custom delimiter (;), can read CSV headers (if you have them), and can infer the schema types (at the cost of an extra scan of the data).
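
    A minimal sketch of the spark-csv route on Spark 1.x, assuming Spark 1.4 or later (where sqlContext.read is available) and the spark-csv package on the classpath. The object and method names are illustrative, and the option values assume a headerless, semicolon-delimited file.

    ```scala
    import org.apache.spark.sql.{DataFrame, SQLContext}

    object SparkCsvExample {
      // Loads a delimited text file through the spark-csv data source
      def readWithSparkCsv(sqlContext: SQLContext, path: String): DataFrame =
        sqlContext.read
          .format("com.databricks.spark.csv")
          .option("delimiter", ";")      // custom field delimiter
          .option("header", "false")     // first line is data, not column names
          .option("inferSchema", "true") // costs an extra pass over the data
          .load(path)
    }
    ```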

    Alternatively, if you know the schema, you can create a case class that represents it and map your RDD elements into instances of this class before converting to a DataFrame, e.g.:

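    // Needed for toDF(); assumes a SQLContext named sqlContext, as in the Spark 1.x shell
    import sqlContext.implicits._

    case class Record(id: Int, name: String)

    // Split each line on ';' and map the two fields onto a Record
    val myFile1 = myFile.map(x => x.split(";")).map {
      case Array(id, name) => Record(id.toInt, name)
    }

    myFile1.toDF() // DataFrame will have columns "id" and "name"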
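
    Note that the pattern match above throws a MatchError on any line that does not split into exactly two fields, and id.toInt throws on a non-numeric id. If the file may contain such lines, one option is to parse each line into an Option and drop the failures. This is only a sketch: RecordParser is a hypothetical helper name, and silently skipping bad lines is an assumption about what you want.

    ```scala
    import scala.util.Try

    case class Record(id: Int, name: String)

    object RecordParser {
      // Returns None for lines that do not have exactly two fields
      // or whose first field is not an integer.
      def parse(line: String): Option[Record] =
        line.split(";", -1) match { // limit -1 keeps a trailing empty field
          case Array(id, name) => Try(id.trim.toInt).toOption.map(Record(_, name))
          case _               => None
        }
    }
    ```

    With this in place, the mapping step could become myFile.flatMap(line => RecordParser.parse(line)) before calling toDF().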
    
