Question
I am using spark-sql 2.4.1 in my project with Java 8.
I need to calculate quantiles on some of the (calculated) columns (i.e. con_dist_1, con_dist_2) of the dataframe df given below:
+----+---------+-------------+----------+-----------+
|  id|     date|      revenue|con_dist_1| con_dist_2|
+----+---------+-------------+----------+-----------+
|  10|1/15/2018|  0.010680705|         6|0.019875458|
|  10|1/15/2018|  0.006628853|         4|0.816039063|
|  10|1/15/2018|   0.01378215|         4|0.082049528|
|  10|1/15/2018|  0.010680705|         6|0.019875458|
|  10|1/15/2018|  0.006628853|         4|0.816039063|
|  10|1/15/2018|   0.01378215|         4|0.082049528|
|  10|1/15/2018|  0.010680705|         6|0.019875458|
|  10|1/15/2018|  0.010680705|         6|0.019875458|
|  10|1/15/2018|  0.014933087|         5|0.034681906|
|  10|1/15/2018|  0.014448282|         3|0.082049528|
+----+---------+-------------+----------+-----------+
List<String> calcColmns = Arrays.asList("con_dist_1", "con_dist_2");
When I try to use the first version of approxQuantile, i.e. approxQuantile(List<String>, List<Double>, double), as below:
List<List<Double>> quants = df.stat().approxQuantile(calcColmns , Array(0.0,0.1,0.5),0.0);
It gives the error:
The method approxQuantile(String, double[], double) in the type DataFrameStatFunctions is not applicable for the arguments (List, List, double)
What is wrong here? I'm doing this in the Eclipse IDE. Why is it not invoking the List<String> version even though I'm passing a List<String>?
Really appreciate any help on this.
Added snapshot of the API:
Answer 1:
It looks like it could be due to the use of Array in the inputs to the approxQuantile function. The simplest fix would be to use arrays for both the columns and the percentiles (this would use the third approxQuantile method in the API snapshot):
String[] calcColmns = {"con_dist_1", "con_dist_2"};
double[] percentiles = {0.0,0.1,0.5};
And then call the function:
double[][] quants = df.stat().approxQuantile(calcColmns, percentiles, 0.0);
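If the column names and probabilities already live in Java collections, as in the question, they can be converted to the array types this overload takes. The following is a minimal sketch using only the JDK. The class and method names (QuantileArgs, toColumnArray, toProbabilityArray) are hypothetical helpers, not part of Spark, and the Spark call itself appears only as a comment, assuming df is a Dataset<Row>.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Spark): converts Java collections into
// the array arguments taken by approxQuantile(String[], double[], double).
final class QuantileArgs {

    // List<String> -> String[]
    static String[] toColumnArray(List<String> columns) {
        return columns.toArray(new String[0]);
    }

    // List<Double> -> double[] (unboxes each element)
    static double[] toProbabilityArray(List<Double> probabilities) {
        return probabilities.stream().mapToDouble(Double::doubleValue).toArray();
    }

    public static void main(String[] args) {
        List<String> calcColmns = Arrays.asList("con_dist_1", "con_dist_2");
        List<Double> probs = Arrays.asList(0.0, 0.1, 0.5);

        String[] cols = toColumnArray(calcColmns);
        double[] percentiles = toProbabilityArray(probs);

        // Assuming df is a Dataset<Row>, the call would then be:
        // double[][] quants = df.stat().approxQuantile(cols, percentiles, 0.0);
        // quants[i][j] is the j-th requested quantile of column cols[i].

        System.out.println(Arrays.toString(cols));
        System.out.println(Arrays.toString(percentiles));
    }
}
```

Keeping the conversion in small helpers means the rest of the code can continue to work with List<String> and List<Double>, with the arrays built only at the point of the approxQuantile call.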
Source: https://stackoverflow.com/questions/60550152/issue-with-approxquantile-of-spark-not-recognizing-liststring