How to create an empty DataFrame? Why “ValueError: RDD is empty”?

孤城傲影 2021-02-01 03:48

I am trying to create an empty dataframe in Spark (Pyspark).

I am using an approach similar to the one discussed in a linked answer (the link is broken), but it is not working.

11 Answers
  • 2021-02-01 04:24

    You can create an empty data frame in pyspark by passing an empty list of rows together with an explicit schema (column names alone are not enough, because Spark cannot infer column types from no data):

    df = spark.createDataFrame([], "col1 string, col2 string")
    

    where [] is the empty list of rows and the string is a DDL-style schema. Then you can register it as a temp view for your sql queries:

    df.createOrReplaceTempView("artist")
    
  • 2021-02-01 04:26
    import spark.implicits._  // required for toDF()
    Seq.empty[String].toDF()
    

    This will create an empty df. Helpful for testing purposes and the like. (Scala-Spark)

  • 2021-02-01 04:27

    This is a roundabout but simple way to create an empty Spark df with an inferred schema:

    from pyspark.sql.functions import col

    # Initialize a Spark df using one row of data with the desired schema
    init_sdf = spark.createDataFrame([('a_string', 0, 0)], ['name', 'index', 'seq_#'])
    # Remove the row with a filter that never matches; the schema remains
    empty_sdf = init_sdf.where(col('name') == 'not_match')
    empty_sdf.printSchema()
    # Output
    root
     |-- name: string (nullable = true)
     |-- index: long (nullable = true)
     |-- seq_#: long (nullable = true)
    
  • 2021-02-01 04:28

    You can do it by loading an empty file (parquet, json etc.) like this:

    df = sqlContext.read.json("my_empty_file.json")
    

    Then when you try to check the schema you'll see:

    >>> df.printSchema()
    root
    

    In Scala/Java, not passing a path should work too; in Python it throws an exception. And if you ever move between Scala and Python, this file-loading method works in both.

  • 2021-02-01 04:30

    Extending Joe Widen's answer, you can actually create the schema with no fields like so:

    schema = StructType([])
    

    so when you create the DataFrame using that as your schema, you'll end up with a DataFrame[].

    >>> empty = sqlContext.createDataFrame(sc.emptyRDD(), schema)
    >>> empty
    DataFrame[]
    >>> empty.schema
    StructType(List())
    

    In Scala, if you choose to use sqlContext.emptyDataFrame and check out the schema, it will return StructType().

    scala> val empty = sqlContext.emptyDataFrame
    empty: org.apache.spark.sql.DataFrame = []
    
    scala> empty.schema
    res2: org.apache.spark.sql.types.StructType = StructType()    
    