Dynamically bind variable/parameter in Spark SQL?


Question


How do you bind a variable in Apache Spark SQL? For example:

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("SELECT * FROM src WHERE col1 = ${VAL1}").collect().foreach(println)

Answer 1:


Spark SQL (as of the 1.6 release) does not support bind variables.

P.S. What Ashrith is suggesting is not a bind variable: you're constructing a new string every time, so Spark parses the query and creates an execution plan again on each call. The purpose of bind variables (in RDBMS systems, for example) is to cut the time spent creating the execution plan, which can be costly when there are a lot of joins, etc. Spark would need a special API to "parse" a query once and then "bind" variables into it. Spark does not have this functionality (as of today, the Spark 1.6 release).
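
For reference, a minimal sketch of that string-construction approach (assuming a src table with a numeric col1 column, and VAL1 as a hypothetical value): Scala's s-interpolator builds a brand-new SQL string on every call, which Spark then parses and plans from scratch:

val VAL1 = 42 // hypothetical value for illustration
// A new query string is built on each call; Spark re-parses and
// re-plans it every time, unlike a true bind variable.
sqlContext.sql(s"SELECT * FROM src WHERE col1 = $VAL1").collect().foreach(println)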

Update (August 2018): as of Spark 2.3, there are still no bind variables in Spark.




Answer 2:


I verified this in both the Spark 2.x shell and the Thrift server (beeline). I was able to bind a variable in a Spark SQL query with the set command.

Query without a bind variable:

select count(1) from mytable;

Query with a bind variable (parameterized):

1. Spark SQL shell

 set key_tbl=mytable; -- assign mytable to key_tbl so it can be referenced as ${key_tbl}
 select count(1) from ${key_tbl};

2. Spark shell

spark.sql("set key_tbl=mytable")
spark.sql("select count(1) from ${key_tbl}").collect()

With and without the bind parameter, the query returns an identical result.

Note: don't put quotes around the value of the key, since it is a table name here.

Let me know if there are any questions.




Answer 3:


PySpark

sqlContext.sql("SELECT * FROM src WHERE col1 = {1} and col2 = {2}".format(VAL1,VAL2).collect().foreach(println)



Answer 4:


Try this:

sqlContext.sql(s"SELECT * FROM src WHERE col1 = '${VAL1}'").collect().foreach(println)


Source: https://stackoverflow.com/questions/26755230/dynamically-bind-variable-parameter-in-spark-sql
