I have a dataframe \"df\" with the columns [\'name\', \'age\']
I saved the dataframe using df.rdd.saveAsTextFile(\"..\")
to save it as an rdd. I loaded the saved file back with sc.textFile(".."), but what I get is an RDD of strings rather than my original rows. How do I get the dataframe (or its rows) back?
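For reference, the round trip looks roughly like this (real paths replaced with "..", variable names just illustrative):

# save: each Row gets written out as its text representation
df.rdd.saveAsTextFile("..")

# load: textFile just reads those files back line by line
loaded = sc.textFile("..")
loaded.first()
## a plain string like "Row(name=..., age=...)", not a Row object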
It is a normal RDD[Row]. The problem is that when you save with saveAsTextFile and load with textFile, what you get back is a bunch of strings, not Row objects. If you want to save and restore objects you should use some form of serialization, for example pickleFile:
from pyspark.sql import Row

df = sqlContext.createDataFrame(
    [('Alice', 1), ('Alice', 2), ('Joe', 3)],
    ("name", "age")
)

# Convert each Row to a plain tuple and serialize with pickle
df.rdd.map(tuple).saveAsPickleFile("foo")

# pickleFile deserializes back into Python objects, not strings
sc.pickleFile("foo").collect()
## [('Joe', 3), ('Alice', 1), ('Alice', 2)]
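And if you want a DataFrame again rather than an RDD of tuples, a minimal sketch of the reverse step, assuming the same sqlContext and column names as above (the name restored is just illustrative):

# Rebuild a DataFrame from the deserialized tuples
restored = sqlContext.createDataFrame(sc.pickleFile("foo"), ("name", "age"))
restored.show()  # prints the same name/age rows, order not guaranteed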