How could I order by sum, within a DataFrame in PySpark?


Analogously to:

order_items.groupBy(\"order_item_order_id\").count().orderBy(desc(\"count\")).show()

I have tried:

order_it

1 Answer
  • 2021-01-14 05:32

    You should give the aggregated column an alias, then sort by that alias:

    import pyspark.sql.functions as func

    # Alias the aggregated column so orderBy can refer to it by name.
    # func.desc(...) puts the highest sums first, matching the count example;
    # use plain orderBy("sum_column_name") if you want ascending order.
    order_items.groupBy("order_item_order_id")\
               .agg(func.sum("order_item_subtotal")\
                    .alias("sum_column_name"))\
               .orderBy(func.desc("sum_column_name"))\
               .show()
    