Question
I'm running the Spark SQL query below from IntelliJ (in a Maven project):
SELECT seq_no,
       amount,
       (SELECT max(b.amount)
          FROM premium b
         WHERE b.seq_no < a.seq_no) last_high_prem
  FROM premium a
I get the following error:
Exception in thread "main" org.apache.spark.sql.AnalysisException: The correlated scalar subquery can only contain equality predicates: (seq_no#11#32 < seq_no#11);
As I understand it, Spark SQL currently supports correlated scalar subqueries only when they use equality predicates. Is there a way to work around this?
I know this can be done in HiveQL, but that would require setting up Hadoop and Hive on my local machine. Please let me know how to work around the issue.
Answer 1:
I know next to nothing about Spark SQL, but it seems to me your issue is with the correlated subquery, which wouldn't be necessary for this query in most brands of SQL. Spark does accept the max function as a window function.
Can you do:
SELECT seq_no,
       amount,
       max(amount) OVER (ORDER BY seq_no ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS last_high_prem
  FROM premium
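To check that the window version returns the same rows as the correlated subquery, here is a minimal sketch that runs both against an in-memory SQLite database from Python. SQLite is only a stand-in for Spark (it needs version 3.25 or newer for window functions), and the sample rows are made up. The two queries agree as long as seq_no is unique; with duplicate seq_no values the ROWS frame and the strict < comparison can differ.

```python
import sqlite3

# SQLite stands in for Spark SQL here; this only illustrates the SQL semantics.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE premium (seq_no INTEGER, amount INTEGER)")
# Made-up sample rows, not data from the question.
conn.executemany(
    "INSERT INTO premium VALUES (?, ?)",
    [(1, 100), (2, 300), (3, 200), (4, 500), (5, 400)],
)

# The original correlated scalar subquery (rejected by Spark, accepted by SQLite).
correlated = """
    SELECT seq_no, amount,
           (SELECT max(b.amount) FROM premium b
             WHERE b.seq_no < a.seq_no) AS last_high_prem
      FROM premium a
     ORDER BY seq_no
"""

# The window-function rewrite: max over all rows strictly before the current one.
windowed = """
    SELECT seq_no, amount,
           max(amount) OVER (ORDER BY seq_no
                             ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
               AS last_high_prem
      FROM premium
     ORDER BY seq_no
"""

correlated_rows = conn.execute(correlated).fetchall()
windowed_rows = conn.execute(windowed).fetchall()

for row in windowed_rows:
    print(row)  # the first row has None (NULL): nothing precedes it
```

The first row gets NULL in both versions, because there is no earlier seq_no to take a maximum over.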
Note: in a real query you will probably also need a PARTITION BY clause, but not for the exact query you've presented.
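As a sketch of what that could look like, suppose the table also had a grouping column and the running maximum should restart for each group. The policy_id column below is hypothetical (it is not in the question), the rows are made up, and SQLite (3.25 or newer) is again used from Python only as a stand-in for Spark.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# policy_id is a hypothetical grouping column added for illustration.
conn.execute(
    "CREATE TABLE premium (policy_id TEXT, seq_no INTEGER, amount INTEGER)"
)
conn.executemany(
    "INSERT INTO premium VALUES (?, ?, ?)",
    [
        ("A", 1, 100), ("A", 2, 300), ("A", 3, 200),
        ("B", 1, 900), ("B", 2, 50), ("B", 3, 70),
    ],
)

# PARTITION BY makes the frame restart at the first row of each policy_id.
partitioned = """
    SELECT policy_id, seq_no, amount,
           max(amount) OVER (PARTITION BY policy_id
                             ORDER BY seq_no
                             ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
               AS last_high_prem
      FROM premium
     ORDER BY policy_id, seq_no
"""

partitioned_rows = conn.execute(partitioned).fetchall()
for row in partitioned_rows:
    print(row)
```

Without the PARTITION BY, the first row of group B would see the amounts from group A as "previous" premiums.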
Source: https://stackoverflow.com/questions/52256127/spark-sql-error-when-running-correlated-subquery