DataFrame transpose with PySpark in Apache Spark

一整个雨季 2020-12-29 12:40

I have a DataFrame df that has the following structure:

+-----+-----+-----+-------+
|  s  |col_1|col_2|col_...|
+-----+-----+-----+-------+
| f1  |         


        
1 Answer
  • 2020-12-29 13:14

    If the data is small enough to be transposed (not pivoted with aggregation), you can just convert it to a pandas DataFrame:

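    from pyspark.sql import SparkSession

    # Assumes a local or already configured Spark installation.
    spark = SparkSession.builder.getOrCreate()
    sc = spark.sparkContext

    df = sc.parallelize([
        ("f1", 0.0, 0.6, 0.5),
        ("f2", 0.6, 0.7, 0.9)]).toDF(["s", "col_1", "col_2", "col_3"])

    # Collect everything to the driver as a pandas DataFrame, then transpose locally.
    df.toPandas().set_index("s").transpose()
    # s       f1   f2
    # col_1  0.0  0.6
    # col_2  0.6  0.7
    # col_3  0.5  0.9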
    

    If it is too large for this, Spark won't help. A Spark DataFrame distributes data by row (although it uses columnar storage locally), so the size of an individual row is limited by local memory.
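
    To see why row size is the limiting factor, here is a minimal plain-Python sketch (no Spark involved) of what a transpose does to the example data. The `transpose` helper and the tuple-based layout are assumptions made for illustration only, not part of any Spark or pandas API. Each transposed row contains one value from every input row, so its width grows with the number of input rows.

```python
# Hypothetical illustration in plain Python (no Spark): rows are tuples,
# and the first field of each row plays the role of the "s" column.
rows = [
    ("f1", 0.0, 0.6, 0.5),
    ("f2", 0.6, 0.7, 0.9),
]
columns = ["s", "col_1", "col_2", "col_3"]


def transpose(rows, columns):
    """Return (header, transposed_rows) for a small in-memory table."""
    # The values of the key column become the new column names.
    header = [columns[0]] + [row[0] for row in rows]
    # zip(*...) regroups the values column by column.
    values = zip(*(row[1:] for row in rows))
    # Each former column name becomes the key of a new row.
    transposed = [(name, *vals) for name, vals in zip(columns[1:], values)]
    return header, transposed


header, transposed = transpose(rows, columns)
```

    With two input rows, every transposed row has three fields (the key plus two values). With a billion input rows, every transposed row would have a billion values, and each such row would have to fit in the memory of a single machine.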
