How to create an empty DataFrame? Why “ValueError: RDD is empty”?


孤城傲影

I am trying to create an empty DataFrame in Spark (PySpark).

I am using an approach similar to one discussed elsewhere, but it is not working: it fails with `ValueError: RDD is empty`.


11 Answers

栀梦

2021-02-01 04:08

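This works with Spark 2.0.0 or later:

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Assumes an existing SparkSession named `spark`.
sc = spark.sparkContext
schema = StructType([StructField('col1', StringType(), False), StructField('col2', IntegerType(), True)])
# The explicit schema means Spark does not have to infer one from the empty RDD.
spark.createDataFrame(sc.emptyRDD(), schema)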


故里飘歌

2021-02-01 04:11
If you want an empty DataFrame based on an existing one, simply limit it to 0 rows. In PySpark:
```
emptyDf = existingDf.limit(0)
```

渐次进展

2021-02-01 04:11

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

spark = SparkSession.builder.appName('SparkPractice').getOrCreate()

# Explicit schema: three nullable string columns.
schema = StructType([
    StructField('firstname', StringType(), True),
    StructField('middlename', StringType(), True),
    StructField('lastname', StringType(), True),
])

# An empty RDD plus the schema yields a DataFrame with columns but no rows.
df = spark.createDataFrame(spark.sparkContext.emptyRDD(), schema)
df.printSchema()


后悔当初

2021-02-01 04:13
```
spark.range(0).drop("id")
```
This creates a DataFrame with an "id" column and no rows, then drops the "id" column, leaving a DataFrame with no rows and no columns.

北恋

2021-02-01 04:19

At the time this answer was written, it looked like you needed some sort of schema:

from pyspark.sql.types import StructType, StructField, StringType

field = [StructField("field1", StringType(), True)]
schema = StructType(field)

# Assumes an existing SparkSession named `spark`.
sc = spark.sparkContext
spark.createDataFrame(sc.emptyRDD(), schema)


抹茶落季

2021-02-01 04:22
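You can also create a DataFrame from literal rows and column names, letting Spark infer the column types. Note that this example contains one placeholder row, so the result is not empty:
```
pivot_table = sparkSession.createDataFrame([("99","99")], ["col1","col2"])
```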