How to create a DataFrame from a text file in Spark


滥情空心 2021-01-31 19:03

I have a text file on HDFS and I want to convert it to a DataFrame in Spark.

I am using the SparkContext to load the file and then trying to generate individual columns f

8 answers

难免孤独 (OP)

2021-01-31 19:45
Update: as of Spark 2.0, you can simply use the built-in CSV data source:
```
import org.apache.spark.sql.SparkSession
val spark: SparkSession = SparkSession.builder().appName("text-to-df").getOrCreate() // create the Spark Session
val df = spark.read.csv("file.txt")
```
You can also use various options to control the CSV parsing, e.g.:
```
val df = spark.read.option("header", "false").csv("file.txt")
```
For Spark versions before 2.0, the easiest way is to use spark-csv: include it in your dependencies and follow the README. It lets you set a custom delimiter (;), can read CSV headers (if you have them), and can infer the schema types (at the cost of an extra scan of the data).

Alternatively, if you know the schema, you can create a case class that represents it and map your RDD elements into instances of that class before converting the RDD into a DataFrame, e.g.:
```
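import spark.implicits._ // brings toDF() into scope (sqlContext.implicits._ before Spark 2.0)

case class Record(id: Int, name: String)

// myFile is assumed to be the RDD[String] loaded earlier, e.g. with sc.textFile(...)
val myFile1 = myFile.map(x => x.split(";")).map {
  case Array(id, name) => Record(id.toInt, name)
}

myFile1.toDF() // DataFrame will have columns "id" and "name"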
```
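One caveat with the case-class approach: the `case Array(id, name)` match throws a `MatchError` on any line that does not have exactly two fields, and `id.toInt` throws on a non-numeric id, so a single bad line can fail the whole job. Below is a minimal sketch of a more forgiving parser, assuming the same `;`-separated `id;name` layout. The helper name `parseRecord` is hypothetical and not part of Spark.

```scala
import scala.util.Try

// Same shape as the Record case class used in the answer.
case class Record(id: Int, name: String)

// Hypothetical helper: parses one "id;name" line.
// Returns None for a wrong field count or a non-numeric id instead of throwing.
def parseRecord(line: String): Option[Record] =
  line.split(";", -1) match { // limit -1 keeps trailing empty fields
    case Array(id, name) => Try(Record(id.trim.toInt, name)).toOption
    case _               => None
  }
```

With the RDD from the answer, something like `myFile.flatMap(line => parseRecord(line)).toDF()` should then keep only the well-formed lines. Whether silently dropping bad lines is acceptable depends on your data, so you may prefer to count or log them instead.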