Issue with approxQuantile of Spark, not recognizing List<String>

喜你入骨 submitted on 2020-03-12 05:32:48

Question


I am using spark-sql 2.4.1 in my project with Java 8.

I need to calculate quantiles on some of the (calculated) columns (i.e. con_dist_1, con_dist_2) of the dataframe df given below:

+----+---------+-------------+----------+-----------+
|  id|     date|   revenue   |con_dist_1| con_dist_2|
+----+---------+-------------+----------+-----------+
|  10|1/15/2018|  0.010680705|         6|0.019875458|
|  10|1/15/2018|  0.006628853|         4|0.816039063|
|  10|1/15/2018|   0.01378215|         4|0.082049528|
|  10|1/15/2018|  0.010680705|         6|0.019875458|
|  10|1/15/2018|  0.006628853|         4|0.816039063|
|  10|1/15/2018|   0.01378215|         4|0.082049528|
|  10|1/15/2018|  0.010680705|         6|0.019875458|
|  10|1/15/2018|  0.010680705|         6|0.019875458|
|  10|1/15/2018|  0.014933087|         5|0.034681906|
|  10|1/15/2018|  0.014448282|         3|0.082049528|
+----+---------+-------------+----------+-----------+

List<String> calcColmns = Arrays.asList("con_dist_1","con_dist_2");

When I try to use the first version of approxQuantile, i.e. approxQuantile(List<String>, List<Double>, double), as below:

List<List<Double>> quants = df.stat().approxQuantile(calcColmns , Array(0.0,0.1,0.5),0.0);

It gives the error:

The method approxQuantile(String, double[], double) in the type DataFrameStatFunctions is not applicable for the arguments (List, List, double)

What is wrong here? I'm doing this in the Eclipse IDE. Why is the List<String> version not invoked even though I'm passing a List<String>?

Really appreciate any help on this.

Added snapshot of the API:


Answer 1:


It looks like it could be due to the use of Array in the inputs to the approxQuantile function: Array(0.0,0.1,0.5) is Scala-style syntax, not a Java array or List. The simplest fix would be to use arrays for both the columns and the percentiles (this would use the third approxQuantile method in the API snapshot):

String[] calcColmns = {"con_dist_1", "con_dist_2"};
double[] percentiles = {0.0,0.1,0.5};

And then call the function:

double[][] quants = df.stat().approxQuantile(calcColmns, percentiles, 0.0);
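
If the column names and percentiles already live in java.util.List values, as in the question, they can be converted to arrays just before the call. The following is a minimal sketch in plain Java. The class and method names (QuantileArgs, toColumnArray, toProbabilityArray) are made up for illustration and are not part of Spark, and the Spark call itself is only shown in a comment because it assumes a Dataset named df like the one in the question.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Spark: converts java.util.List arguments
// into the array types used by the array-based approxQuantile call above.
final class QuantileArgs {

    private QuantileArgs() {}

    // List<String> -> String[]
    static String[] toColumnArray(List<String> columns) {
        return columns.toArray(new String[0]);
    }

    // List<Double> -> double[] (unboxes each element)
    static double[] toProbabilityArray(List<Double> probabilities) {
        return probabilities.stream().mapToDouble(Double::doubleValue).toArray();
    }

    public static void main(String[] args) {
        List<String> calcColmns = Arrays.asList("con_dist_1", "con_dist_2");
        List<Double> percentiles = Arrays.asList(0.0, 0.1, 0.5);

        String[] cols = toColumnArray(calcColmns);
        double[] probs = toProbabilityArray(percentiles);

        System.out.println(Arrays.toString(cols));
        System.out.println(Arrays.toString(probs));

        // With a Spark Dataset<Row> named df (assumed, as in the question):
        // double[][] quants = df.stat().approxQuantile(cols, probs, 0.0);
    }
}
```

The resulting cols and probs have the same types as calcColmns and percentiles in the answer, so they can be passed to approxQuantile in the same way.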


Source: https://stackoverflow.com/questions/60550152/issue-with-approxquantile-of-spark-not-recognizing-liststring
