Best way to get the max value in a Spark dataframe column

Asked 2020-12-07 10:27

I'm trying to figure out the best way to get the largest value in a Spark dataframe column.

Consider the following example:

df = spark.createDataFrame([(1., 4.), (2., 5.), (3., 6.)], ["A", "B"])

13 Answers
  • 2020-12-07 10:54

    Another way of doing it:

    from pyspark.sql import functions as f

    df.select(f.max(f.col("A")).alias("MAX")).limit(1).collect()[0].MAX
    

    On my data, I got these benchmarks:

    df.select(f.max(f.col("A")).alias("MAX")).limit(1).collect()[0].MAX
    CPU times: user 2.31 ms, sys: 3.31 ms, total: 5.62 ms
    Wall time: 3.7 s
    
    df.select("A").rdd.max()[0]
    CPU times: user 23.2 ms, sys: 13.9 ms, total: 37.1 ms
    Wall time: 10.3 s
    
    df.agg({"A": "max"}).collect()[0][0]
    CPU times: user 0 ns, sys: 4.77 ms, total: 4.77 ms
    Wall time: 3.75 s
    

    All of them give the same answer.
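
    For reference, here is a minimal self-contained sketch of the three approaches above, assuming a local SparkSession and the toy dataframe from the question:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as f

    # Assumption: a local Spark session and a small dataframe with numeric column "A".
    spark = SparkSession.builder.master("local[*]").getOrCreate()
    df = spark.createDataFrame([(1., 4.), (2., 5.), (3., 6.)], ["A", "B"])

    # 1. Aggregate with functions.max and read the aliased field from the Row.
    max_agg = df.select(f.max(f.col("A")).alias("MAX")).limit(1).collect()[0].MAX

    # 2. Drop to the RDD of Rows and take the max (select a single column first,
    #    since Rows are compared element-wise).
    max_rdd = df.select("A").rdd.max()[0]

    # 3. Use the dict form of agg and index into the resulting Row.
    max_dict = df.agg({"A": "max"}).collect()[0][0]

    print(max_agg, max_rdd, max_dict)  # all three print 3.0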
