How do I bind a variable in Apache Spark SQL? For example:
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("SELECT * FROM src WHE
Try these:
sqlContext.sql(s"SELECT * FROM src WHERE col1 = '${VAL1}'").collect().foreach(println)
PySpark
rows = sqlContext.sql("SELECT * FROM src WHERE col1 = {0} and col2 = {1}".format(VAL1, VAL2)).collect()
for row in rows: print(row)
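To see exactly what Spark receives with this approach, it can help to build the query string separately and print it before running it. A minimal sketch: `build_query` is a hypothetical helper (not part of Spark), and the single quotes around the placeholders assume both columns hold strings. The values are pasted in verbatim, so nothing is escaped for you.

```python
def build_query(val1, val2):
    # str.format positional fields are zero-based: {0} is val1, {1} is val2.
    # The quotes assume string columns; drop them for numeric columns.
    template = "SELECT * FROM src WHERE col1 = '{0}' AND col2 = '{1}'"
    return template.format(val1, val2)


query = build_query("a", "b")
print(query)  # SELECT * FROM src WHERE col1 = 'a' AND col2 = 'b'

# With a live session you would then run (sqlContext assumed to exist):
# rows = sqlContext.sql(query).collect()
```

Printing the string first makes it obvious that this is plain text substitution done in Python before Spark ever sees the query.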
I verified this in both the Spark 2.x shell and Thrift (beeline). I was able to bind a variable in a Spark SQL query with the set command.
Query without bind variable:
select count(1) from mytable;
Query with bind variable (parameterized):
1. Spark SQL shell
set key_tbl=mytable; -- bind mytable to key_tbl, referenced below as ${key_tbl}
select count(1) from ${key_tbl};
2. Spark shell
spark.sql("set key_tbl=mytable")
spark.sql("select count(1) from ${key_tbl}").collect()
With or without the bind parameter, the query returns an identical result.
Note: Don't put quotes around the value of the key, since it is a table name here.
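The same set/${...} pattern can be sketched from PySpark. This assumes `spark` is an existing SparkSession; `count_with_variable` is a hypothetical helper name. Python does not interpolate `${key_tbl}` in an ordinary string, so the placeholder reaches Spark unchanged and Spark does the substitution.

```python
def count_with_variable(spark, table_name):
    """Set key_tbl on the session, then reference it as ${key_tbl}.

    `spark` is assumed to be a SparkSession (or anything exposing .sql()).
    """
    # No quotes around the value: it is used as a table name.
    spark.sql("set key_tbl={0}".format(table_name))
    # "${key_tbl}" is a literal here; Python leaves it untouched.
    return spark.sql("select count(1) from ${key_tbl}").collect()


# Usage with a live session (assumed):
# count_with_variable(spark, "mytable")
```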
Let me know if there are any questions.
Spark SQL (as of 1.6 release) does not support bind variables.
P.S. What Ashrith is suggesting is not a bind variable. You're constructing a string every time, so every time Spark will parse the query, create an execution plan, etc. The purpose of bind variables (in RDBMS systems, for example) is to cut the time spent creating an execution plan (which can be costly where there are a lot of joins, etc.). Spark would need a special API to "parse" a query and then "bind" variables. Spark does not have this functionality (as of today, the Spark 1.6 release).
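To make the distinction concrete, here is a small sketch that uses Python's built-in sqlite3 module as a stand-in RDBMS (it is not Spark). The table `src` and its rows are made up for illustration. With string building, every value produces a different query text; with a bind variable, the query text stays the same and only the bound value changes.

```python
import sqlite3

# In-memory database with a hypothetical src table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (col1 TEXT, col2 INTEGER)")
conn.executemany("INSERT INTO src VALUES (?, ?)", [("a", 1), ("b", 2), ("a", 3)])


def rows_by_string_building(value):
    # A new query text for every value, so it is parsed from scratch each time.
    query = "SELECT col2 FROM src WHERE col1 = '{0}' ORDER BY col2".format(value)
    return conn.execute(query).fetchall()


# One fixed query text; the "?" marker is filled in at execution time.
BOUND_QUERY = "SELECT col2 FROM src WHERE col1 = ? ORDER BY col2"


def rows_by_binding(value):
    return conn.execute(BOUND_QUERY, (value,)).fetchall()


print(rows_by_string_building("a"))  # [(1,), (3,)]
print(rows_by_binding("a"))          # [(1,), (3,)]
```

Both calls return the same rows; the difference is only in whether the database sees one statement or many.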
Update 8/2018: as of Spark 2.3 there are (still) no bind variables in Spark.
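For readers on much newer releases: to my knowledge, PySpark 3.4 and later accept an `args` argument on `spark.sql` for named parameter markers such as `:val1`. The sketch below assumes PySpark 3.5 or later, where the values can be plain Python objects (as far as I recall, 3.4 expected SQL literal strings instead). `select_by_col1` is a hypothetical helper, and `spark` is assumed to be an existing SparkSession. Check the documentation for your version before relying on this.

```python
def select_by_col1(spark, value):
    """Run a query with a named parameter marker instead of string building.

    Assumes PySpark 3.5+, where spark.sql(query, args=...) takes a dict
    mapping marker names to Python values.
    """
    return spark.sql(
        "SELECT * FROM src WHERE col1 = :val1",
        args={"val1": value},
    ).collect()


# Usage with a live session (assumed):
# select_by_col1(spark, "a")
```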