I am trying to create an empty DataFrame in Spark (PySpark).
I am using a similar approach to the one discussed here, but it is not working.
You can create an empty DataFrame with the following syntax in PySpark. Because there are no rows to infer types from, the schema has to be given explicitly, e.g. as a DDL string:

df = spark.createDataFrame([], "col1 STRING, col2 STRING, ...")

where [] is the empty list of rows and the string is the schema (column names and types). Then you can register it as a temp view for your SQL queries:

df.createOrReplaceTempView("artist")
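For context, a minimal end-to-end sketch (assuming an existing SparkSession named spark; the column names and the artist view name are just placeholders):

df = spark.createDataFrame([], "col1 STRING, col2 INT")  # explicit schema, zero rows
df.createOrReplaceTempView("artist")
spark.sql("SELECT * FROM artist").show()  # prints the column header and no rows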
import spark.implicits._
val df = Seq.empty[String].toDF()

This will create an empty DataFrame. Helpful for testing purposes and all. (Scala Spark)
This is a roundabout but simple way to create an empty Spark DataFrame with an inferred schema:

from pyspark.sql.functions import col

# Initialize a Spark DataFrame using one row of data with the desired schema
init_sdf = spark.createDataFrame([('a_string', 0, 0)], ['name', 'index', 'seq_#'])
# Remove the row with a filter that never matches; this leaves the schema intact
empty_sdf = init_sdf.where(col('name') == 'not_match')
empty_sdf.printSchema()
# Output
root
|-- name: string (nullable = true)
|-- index: long (nullable = true)
|-- seq_#: long (nullable = true)
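A quick check (still assuming the same spark session) confirms that the rows are gone but the schema survives:

empty_sdf.count()    # 0 -- the filter matches nothing
empty_sdf.columns    # ['name', 'index', 'seq_#'] -- columns and types are preserved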
You can do it by loading an empty file (parquet, json, etc.) like this:
df = sqlContext.read.json("my_empty_file.json")
Then when you try to check the schema you'll see:
>>> df.printSchema()
root
In Scala/Java, not passing a path should work too; in Python it throws an exception. Also, if you ever switch to Scala/Java, you can use this method to create one.
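The output above is just root because an empty JSON file carries no schema of its own. If you want named columns, you can pass the schema explicitly when reading; a small sketch using the newer SparkSession API (the file name and column names are placeholders):

from pyspark.sql.types import StructType, StructField, StringType, LongType

schema = StructType([
    StructField("name", StringType(), True),
    StructField("count", LongType(), True),
])
df = spark.read.schema(schema).json("my_empty_file.json")
df.printSchema()  # shows name and count, with zero rows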
Extending Joe Widen's answer, you can actually create a schema with no fields like so:

from pyspark.sql.types import StructType

schema = StructType([])

When you create the DataFrame using that as your schema, you'll end up with a DataFrame[].
>>> empty = sqlContext.createDataFrame(sc.emptyRDD(), schema)
>>> empty
DataFrame[]
>>> empty.schema
StructType(List())
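The same emptyRDD trick works with a populated schema if you want named, typed columns; a short sketch using the SparkSession API (the column names are just examples):

from pyspark.sql.types import StructType, StructField, StringType, LongType

schema = StructType([
    StructField("name", StringType(), True),
    StructField("index", LongType(), True),
])
# spark.sparkContext.emptyRDD() plays the role of sc.emptyRDD() above
empty = spark.createDataFrame(spark.sparkContext.emptyRDD(), schema)
empty.printSchema()  # name and index are present, but there are no rows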
In Scala, if you choose to use sqlContext.emptyDataFrame and check out the schema, it will return StructType().
scala> val empty = sqlContext.emptyDataFrame
empty: org.apache.spark.sql.DataFrame = []
scala> empty.schema
res2: org.apache.spark.sql.types.StructType = StructType()