How to create an empty DataFrame? Why “ValueError: RDD is empty”?

孤城傲影 2021-02-01 03:48

I am trying to create an empty dataframe in Spark (Pyspark).

I am using an approach similar to one discussed in another thread, but it is not working.

11 Answers

春和景丽 (OP)

2021-02-01 04:27

This is a roundabout but simple way to create an empty Spark DataFrame with an inferred schema:

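# Assumes an active SparkSession named `spark`
from pyspark.sql.functions import col

# Initialize a Spark DataFrame from one row of data with the desired schema
init_sdf = spark.createDataFrame([('a_string', 0, 0)], ['name', 'index', 'seq_#'])
# Remove the row by filtering on a value that matches nothing; the schema remains
empty_sdf = init_sdf.where(col('name') == 'not_match')
empty_sdf.printSchema()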
# Output
root
 |-- name: string (nullable = true)
 |-- index: long (nullable = true)
 |-- seq_#: long (nullable = true)
