How to retrieve all columns using pyspark collect_list functions

梦如初夏 2021-01-14 05:42

I'm using PySpark 2.0.1. I'm trying to group by my data frame and retrieve the values of all the fields in my data frame. I found that

z=data1.groupby('


        
3 answers
  •  小蘑菇 (OP)
    2021-01-14 06:13

    Tested with Spark 2.4.4 and Python 3.7 (I expect it also works with earlier Spark and Python versions).
    My suggestion builds on pauli's answer: instead of creating the struct first and then passing it to the agg function, create the struct inside collect_list:

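    from pyspark.sql.functions import collect_list, struct
    # assumes an existing SparkSession named `spark` (e.g. the pyspark shell)
    df = spark.createDataFrame([(0,1,2),(0,4,5),(1,7,8),(1,8,7)]).toDF("a","b","c")
    df.groupBy("a").agg(collect_list(struct(["b","c"])).alias("res")).show()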
    

    Result:

    +---+-----------------+
    |  a|res              |
    +---+-----------------+
    |  0|[[1, 2], [4, 5]] |
    |  1|[[7, 8], [8, 7]] |
    +---+-----------------+
    
