Spark SQL error when running correlated subquery

Question

I'm running the Spark SQL query below from a Maven project in IntelliJ:

SELECT seq_no, 
       amount, 
       (select max(b.amount) 
        from premium b 
        where b.seq_no < a.seq_no) last_high_prem
FROM premium a

I got the error below:

Exception in thread "main" org.apache.spark.sql.AnalysisException: The correlated scalar subquery can only contain equality predicates: (seq_no#11#32 < seq_no#11);

I understand that Spark SQL currently supports correlated scalar subqueries only when the correlation predicate uses an equality operator. Is there any way to work around this limitation?

I know this can be done in HiveQL, but that would require setting up Hadoop and Hive on my local machine. Please let me know how to work around the issue.

Answer 1:

I know next to nothing about Spark SQL, but it seems to me your issue is with the correlated subquery, which most SQL dialects would not need for this query. Spark does accept max as a window function.

Can you try this instead:

SELECT seq_no, 
       amount, 
       max(amount) OVER (ORDER BY seq_no ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS last_high_prem
FROM premium
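
To check that the window form gives the same answer as the correlated subquery, here is a minimal sketch that runs both against a tiny in-memory table. It uses Python's built-in sqlite3 module as a stand-in for Spark (an assumption made so it runs without a cluster; window functions need SQLite 3.25 or later), and the sample rows and the function name last_high_premiums are made up for illustration.

```python
import sqlite3


def last_high_premiums(rows):
    """Run the correlated-subquery form and the window-function form.

    rows is a list of (seq_no, amount) tuples. SQLite stands in for
    Spark here; the sample data is hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE premium (seq_no INTEGER, amount INTEGER)")
    conn.executemany("INSERT INTO premium VALUES (?, ?)", rows)

    # Original form: highest amount among rows with a smaller seq_no.
    correlated = conn.execute("""
        SELECT seq_no,
               amount,
               (SELECT max(b.amount)
                FROM premium b
                WHERE b.seq_no < a.seq_no) AS last_high_prem
        FROM premium a
        ORDER BY seq_no
    """).fetchall()

    # Window form: the frame covers every earlier row, excluding the current one.
    windowed = conn.execute("""
        SELECT seq_no,
               amount,
               max(amount) OVER (ORDER BY seq_no
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                   AS last_high_prem
        FROM premium
        ORDER BY seq_no
    """).fetchall()

    conn.close()
    return correlated, windowed


if __name__ == "__main__":
    correlated, windowed = last_high_premiums(
        [(1, 100), (2, 300), (3, 200), (4, 400)]
    )
    for row in windowed:
        print(row)
    print("same result:", correlated == windowed)
```

The two forms agree only when seq_no is unique. With duplicate seq_no values, the ROWS frame can include a row with the same seq_no, which the strict `<` comparison in the subquery would exclude.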

Note: you probably also need a PARTITION BY clause in your real data, but not for the exact query you've presented.
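
As a rough illustration of the PARTITION BY variant, the sketch below restarts the running maximum for each group. The grouping column policy_id is hypothetical (the question does not name one), the function name and sample rows are made up, and SQLite via Python's sqlite3 module again stands in for Spark.

```python
import sqlite3


def last_high_prem_per_group(rows):
    """Previous highest amount per group.

    rows is a list of (policy_id, seq_no, amount) tuples; policy_id is a
    hypothetical grouping column, not one taken from the question.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE premium (policy_id TEXT, seq_no INTEGER, amount INTEGER)"
    )
    conn.executemany("INSERT INTO premium VALUES (?, ?, ?)", rows)

    # PARTITION BY makes the frame start over for every policy_id.
    result = conn.execute("""
        SELECT policy_id,
               seq_no,
               amount,
               max(amount) OVER (PARTITION BY policy_id
                                 ORDER BY seq_no
                                 ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING)
                   AS last_high_prem
        FROM premium
        ORDER BY policy_id, seq_no
    """).fetchall()

    conn.close()
    return result


if __name__ == "__main__":
    sample = [
        ("A", 1, 100), ("A", 2, 50),
        ("B", 1, 500), ("B", 2, 700), ("B", 3, 600),
    ]
    for row in last_high_prem_per_group(sample):
        print(row)
```

The first row of each group gets NULL (None in Python) because no earlier row exists inside its partition.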

Source: https://stackoverflow.com/questions/52256127/spark-sql-error-when-running-correlated-subquery

Tags

sql

apache-spark

apache-spark-sql

correlated-subquery
