How to create an empty DataFrame? Why “ValueError: RDD is empty”?

孤城傲影 2021-02-01 03:48

I am trying to create an empty DataFrame in Spark (PySpark).

I am using a similar approach to the one discussed here (enter link description here), but it is not working.
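
For reference, the error in the title usually comes from calling createDataFrame on an empty RDD without passing a schema: Spark tries to infer the column types by sampling the RDD, and there are no rows to sample. A minimal sketch of the failing call (assuming an active SparkSession named spark):

    sc = spark.sparkContext
    spark.createDataFrame(sc.emptyRDD())  # ValueError: RDD is empty

The answers below avoid this either by supplying an explicit schema or by deriving the empty DataFrame from an existing one.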

11 Answers
  • 2021-02-01 04:08

    This will work with Spark version 2.0.0 or later:

    from pyspark.sql import SQLContext
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    sc = spark.sparkContext
    sqlContext = SQLContext(sc)
    schema = StructType([StructField('col1', StringType(), False), StructField('col2', IntegerType(), True)])
    sqlContext.createDataFrame(sc.emptyRDD(), schema)
    
  • 2021-02-01 04:11

    If you want an empty DataFrame based on an existing one, simply limit the rows to 0. In PySpark:

    emptyDf = existingDf.limit(0)
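
    The result keeps the schema of existingDf while holding no rows; a quick check (a minimal sketch, using nothing beyond what the answer already assumes):

    emptyDf.printSchema()   # same columns and types as existingDf
    print(emptyDf.count())  # 0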
    
  • 2021-02-01 04:11
    import pyspark
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType
    
    spark = SparkSession.builder.appName('SparkPractice').getOrCreate()
    
    schema = StructType([
      StructField('firstname', StringType(), True),
      StructField('middlename', StringType(), True),
      StructField('lastname', StringType(), True)
      ])
    
    df = spark.createDataFrame(spark.sparkContext.emptyRDD(), schema)
    df.printSchema()
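
    As a side note (not part of the original answer), in Spark 2.x you should also be able to skip the empty RDD and pass an empty Python list together with the same schema, which produces the same empty DataFrame:

    df = spark.createDataFrame([], schema)
    df.printSchema()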
    
  • 2021-02-01 04:13
    spark.range(0).drop("id")
    

    This creates a DataFrame with an "id" column and no rows, then drops the "id" column, leaving you with a truly empty DataFrame.
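
    A quick way to verify this (a small sketch; df is just an illustrative name):

    df = spark.range(0).drop("id")
    print(df.columns)  # []
    print(df.count())  # 0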

  • 2021-02-01 04:19

    At the time this answer was written, it looks like you need some sort of schema:

    from pyspark.sql import SQLContext
    from pyspark.sql.types import *

    field = [StructField("field1", StringType(), True)]
    schema = StructType(field)

    sc = spark.sparkContext
    sqlContext = SQLContext(sc)
    sqlContext.createDataFrame(sc.emptyRDD(), schema)
    
  • 2021-02-01 04:22

    You can just use something like this:

       pivot_table = sparkSession.createDataFrame([("99","99")], ["col1","col2"])
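
    Note that this creates a DataFrame with one placeholder row rather than a truly empty one. If you need zero rows, you could combine it with the limit(0) trick from an earlier answer, for example:

       pivot_table = sparkSession.createDataFrame([("99","99")], ["col1","col2"]).limit(0)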
    