Iterate across columns in spark dataframe and calculate min max value

痴心易碎 submitted on 2019-12-12 04:12:15

Question


I want to iterate across the columns of a DataFrame in my Spark program and calculate the min and max value of each. I'm new to Spark and Scala, and I can't work out how to iterate over the columns once the data is loaded into a DataFrame.

I have tried running the code below, but it needs a column number to be passed to it. How do I fetch the columns from the DataFrame, pass them in dynamically, and store the result in a collection?

val parquetRDD = spark.read.parquet("filename.parquet")

parquetRDD.collect.foreach ({ i => parquetRDD_subset.agg(max(parquetRDD(parquetRDD.columns(2))), min(parquetRDD(parquetRDD.columns(2)))).show()})

Appreciate any help on this.


Answer 1:


You should not iterate over rows or records. Use an aggregation function instead:

import org.apache.spark.sql.functions._
val df = spark.read.parquet("filename.parquet")
val aggCol = col(df.columns(2))
df.agg(min(aggCol), max(aggCol)).show()

First, spark.read.parquet returns a DataFrame. Next, we define the column we want to work on using the col function, which translates a column name into a Column. You could instead use df("name"), where name is the name of the column.

The agg function takes aggregation columns. min and max are aggregation functions: each takes a column and returns a column holding the aggregated value.
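To use the values rather than just print them with show(), the one-row result of agg can be brought back to the driver. The following is a minimal sketch, not part of the original answer: the helper name minMaxOfDouble, the aliases min_value and max_value, and the sample data are all made up for illustration, and the column is assumed to be of type Double.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, max, min}

object SingleColumnMinMax {
  // Hypothetical helper: min and max of one column, assumed to be DoubleType.
  // An empty DataFrame would yield nulls here, which this sketch does not handle.
  def minMaxOfDouble(df: DataFrame, name: String): (Double, Double) = {
    val c = col(name)
    // agg yields a one-row DataFrame; head() brings that row to the driver.
    val row = df.agg(min(c).as("min_value"), max(c).as("max_value")).head()
    (row.getAs[Double]("min_value"), row.getAs[Double]("max_value"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("single-column-min-max")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for filename.parquet.
    val df = Seq((1, "a", 10.5), (2, "b", 3.0), (3, "c", 7.25)).toDF("id", "label", "score")

    val (lowest, highest) = minMaxOfDouble(df, df.columns(2))
    println(s"min=$lowest max=$highest")
    spark.stop()
  }
}
```

Aliasing the aggregated columns makes it possible to read them back by name instead of relying on generated names such as min(score).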

Update

According to the comments, the goal is to have min and max for all columns. You can therefore do this:

val minColumns = df.columns.map(name => min(col(name)))
val maxColumns = df.columns.map(name => max(col(name)))
val allMinMax = minColumns ++ maxColumns
df.agg(allMinMax.head, allMinMax.tail: _*).show()
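The question also asks how to store the result in a collection. One possible way, sketched here under assumptions, is to pair each column's min and max in a single agg call and read them back into a Map keyed by column name. The helper name minMaxByColumn and the sample data are hypothetical; the sketch assumes the DataFrame has at least one column and that every column has an orderable type.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, max, min}

object AllColumnsMinMax {
  // Hypothetical helper: column name -> (min, max), computed in one aggregation.
  // Assumes at least one column; aggs.head fails on a DataFrame with no columns.
  def minMaxByColumn(df: DataFrame): Map[String, (Any, Any)] = {
    val names = df.columns
    // For each column emit its min followed by its max, so positions are 2*i and 2*i+1.
    val aggs = names.flatMap(n => Seq(min(col(n)).as(s"min_$n"), max(col(n)).as(s"max_$n")))
    val row = df.agg(aggs.head, aggs.tail: _*).head()
    names.zipWithIndex.map { case (n, i) =>
      (n, (row.get(2 * i), row.get(2 * i + 1)))
    }.toMap
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("all-columns-min-max")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for filename.parquet.
    val df = Seq((1, "a", 10.5), (2, "b", 3.0), (3, "c", 7.25)).toDF("id", "label", "score")

    minMaxByColumn(df).foreach { case (name, (lowest, highest)) =>
      println(s"$name: min=$lowest max=$highest")
    }
    spark.stop()
  }
}
```

The values are typed as Any because the columns may have different types; a caller that knows the schema can cast them as needed.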

You can also simply do:

df.describe().show()

which gives you statistics on all numeric and string columns: count, mean, stddev, min and max.
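If only the minimum and maximum are of interest, the describe() output can be narrowed by filtering on its summary column. This is a small sketch with a hypothetical helper name (minMaxSummary) and made-up sample data. Note that describe() returns its statistics as strings, so it is better suited to inspection than to further numeric computation.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object DescribeMinMax {
  // Hypothetical helper: keeps only the "min" and "max" rows of describe().
  def minMaxSummary(df: DataFrame): DataFrame =
    df.describe().filter(col("summary").isin("min", "max"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("describe-min-max")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data standing in for filename.parquet.
    val df = Seq((1, 10.5), (2, 3.0), (3, 7.25)).toDF("id", "score")

    minMaxSummary(df).show()
    spark.stop()
  }
}
```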



Source: https://stackoverflow.com/questions/45171920/iterate-across-columns-in-spark-dataframe-and-calculate-min-max-value
