I am trying to create an empty dataframe in Spark (Pyspark).
I am using a similar approach to the one discussed here, but it is not working.
You can do it by loading an empty file (parquet, JSON, etc.) like this:
df = sqlContext.read.json("my_empty_file.json")
Then when you try to check the schema you'll see:
>>> df.printSchema()
root
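For completeness, here is a minimal, self-contained sketch of that approach. It assumes Spark 1.x with an SQLContext (matching the snippet above) and a Spark version whose JSON reader tolerates an empty input file; the file name is just an example:

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext("local", "empty-df-example")
sqlContext = SQLContext(sc)

# Create an empty file first (the name is only illustrative)
open("my_empty_file.json", "w").close()

# Reading it yields a DataFrame with no rows and no columns
df = sqlContext.read.json("my_empty_file.json")
df.printSchema()   # prints just "root"
print(df.count())  # 0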
In Scala/Java, not passing a path should work too; in Python it throws an exception. So if you ever switch to Scala/Java, you can also create an empty DataFrame by calling read with no path.
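If you want an empty DataFrame in Python without touching the filesystem at all, a common alternative is to pass an empty RDD together with an explicit schema to createDataFrame. A sketch in the same SQLContext style as above (the column names are just examples):

from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# An explicit schema is required, since nothing can be inferred from zero rows
schema = StructType([
    StructField("name", StringType(), True),
    StructField("age", IntegerType(), True)
])

empty_df = sqlContext.createDataFrame(sc.emptyRDD(), schema)
empty_df.printSchema()   # root with the name and age fields
print(empty_df.count())  # 0

This also avoids the empty-schema problem: the resulting DataFrame has the columns you declared, not just a bare root.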