How to save a spark dataframe as a text file without Rows in pyspark?

Asked by 广开言路 on 2021-01-25 04:51 (1 answer)

I have a dataframe "df" with the columns ['name', 'age']. I saved the dataframe using df.rdd.saveAsTextFile("..") to save it as an RDD. I loaded the saved f

1 Answer
  • 2021-01-25 05:09

    It is a normal RDD[Row]. The problem is that when you saveAsTextFile and then load with textFile, what you get back is a bunch of strings. If you want to save objects you should use some form of serialization, for example pickleFile:

    from pyspark.sql import Row
    
    df = sqlContext.createDataFrame(
       [('Alice', 1), ('Alice', 2), ('Joe', 3)],
       ("name", "age")
    )
    
    df.rdd.map(tuple).saveAsPickleFile("foo")
    sc.pickleFile("foo").collect()
    
    ## [('Joe', 3), ('Alice', 1), ('Alice', 2)]
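    If the goal really is a plain text file without the Row(...) wrappers (as the question title asks), a common alternative is to format each row as a delimited string before saving. A minimal sketch of the formatting step, with a hypothetical helper `to_line` (the Spark calls it would plug into are shown in comments):

    ```python
    # Hypothetical helper: turn a (name, age) tuple into one line of text,
    # so saveAsTextFile writes e.g. "Alice,1" instead of "Row(name='Alice', age=1)".
    def to_line(row, sep=","):
        return sep.join(str(field) for field in row)

    # With Spark this would be used as:
    #   df.rdd.map(tuple).map(to_line).saveAsTextFile("out_dir")
    # and the saved lines can later be parsed back with a matching split on sep.
    print(to_line(("Alice", 1)))
    ```

    Note that unlike pickleFile, this loses type information (everything comes back as a string), so you would need to re-parse the fields on load.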
    