dynamically bind variable/parameter in Spark SQL?

情深已故 2020-12-31 05:38

How do I bind a variable in Apache Spark SQL? For example:

val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("SELECT * FROM src WHE


        
4 Answers
  • 2020-12-31 05:54

    Try this:

    sqlContext.sql(s"SELECT * FROM src WHERE col1 = '${VAL1}'").collect().foreach(println)
    
  • 2020-12-31 05:55

    PySpark:

    for row in sqlContext.sql("SELECT * FROM src WHERE col1 = {0} AND col2 = {1}".format(VAL1, VAL2)).collect():
        print(row)
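
One thing to watch with string formatting: the values land in the query as raw text, so string values need quoting and escaping first. Below is a minimal sketch of that step. `sql_literal` and `build_query` are hypothetical helper names, not part of PySpark, and the escaping assumes Spark's default parser settings, where a backslash escapes characters inside a single-quoted literal.

```python
def sql_literal(value):
    """Render a Python value as SQL literal text (hypothetical helper)."""
    if value is None:
        return "NULL"
    # Check bool before int, because bool is a subclass of int in Python.
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    # Assumption: default Spark parser settings, where backslash is the
    # escape character inside a single-quoted string literal.
    escaped = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return "'" + escaped + "'"


def build_query(val1, val2):
    """Build the query text from the answer with both values rendered as literals."""
    return "SELECT * FROM src WHERE col1 = {0} AND col2 = {1}".format(
        sql_literal(val1), sql_literal(val2)
    )
```

The resulting string can then be passed to `sqlContext.sql(...)` as before. This is still string construction, not a bind variable.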
    
  • 2020-12-31 05:59

    I verified this in both the Spark 2.x shell and Thrift (beeline). I was able to bind a variable in a Spark SQL query with the set command.

    Query without bind variable:

    select count(1) from mytable; 
    

    Query with bind variable (parameterized):

    1. Spark SQL shell

     set key_tbl=mytable; -- setting mytable to key_tbl to use as ${key_tbl}
     select count(1) from ${key_tbl};
    

    2. Spark shell

    spark.sql("set key_tbl=mytable")
    spark.sql("select count(1) from ${key_tbl}").collect()
    

    With or without bind parameters, the query returns an identical result.

    Note: Don't put quotes around the value of the key, because it is a table name here.
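
The note about quotes makes sense if you think of `${key_tbl}` as plain text replacement that happens before the query is parsed. Here is a rough picture of that, using Python's `string.Template` purely as a stand-in (it is not Spark's own code, and `substitute` is a hypothetical helper name):

```python
from string import Template


def substitute(query, settings):
    # Replace each ${key} with its value as plain text, before any parsing.
    return Template(query).substitute(settings)


# Value set without quotes: the table name lands in the query as-is.
unquoted = substitute("select count(1) from ${key_tbl}", {"key_tbl": "mytable"})

# Value set with quotes: the quotes become part of the query text too.
quoted = substitute("select count(1) from ${key_tbl}", {"key_tbl": "'mytable'"})
```

In the second case the query text ends up with a quoted string where a table name is expected, which is why the value should be set without quotes.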

    Let me know if there are any questions.

  • 2020-12-31 06:15

    Spark SQL (as of 1.6 release) does not support bind variables.

    P.S. What Ashrith is suggesting is not a bind variable: it constructs a new query string every time, so Spark parses the query and creates an execution plan on every call. The purpose of bind variables (in RDBMS systems, for example) is to cut the time spent creating the execution plan, which can be costly when there are many joins. To support that, Spark would need a dedicated API to "parse" a query once and then "bind" variables to it. Spark does not have this functionality (as of today, the Spark 1.6 release).

    Update 8/2018: as of Spark 2.3 there are (still) no bind variables in Spark.
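
For readers on much newer releases: PySpark's `spark.sql` later gained an `args` parameter for named parameter markers (`:name`). The sketch below assumes PySpark 3.5 or later and a `SparkSession` called `spark`; `rows_where_col1` is a hypothetical wrapper name. As far as I know, this passes the value separately from the query text, but it is not a prepared statement in the RDBMS sense described above, since the query is still parsed and planned on each call.

```python
def rows_where_col1(spark, val1):
    """Run the question's query with a named parameter marker.

    Assumption: `spark` is a SparkSession from PySpark 3.5 or later,
    where `sql` accepts an `args` dictionary of Python values.
    """
    return spark.sql(
        "SELECT * FROM src WHERE col1 = :val1",  # :val1 is the parameter marker
        args={"val1": val1},                     # value is passed separately from the text
    ).collect()
```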
