How to get strings separated by commas from a list to a query in PySpark?

Submitted by 三世轮回 on 2019-12-11 18:27:41

Question


I want to generate a query from a list in PySpark:

list = ["hi@gmail.com", "goodbye@gmail.com"]
query = "SELECT * FROM table WHERE email IN (" + list + ")"

This is my desired output:

query
SELECT * FROM table WHERE email IN ("hi@gmail.com", "goodbye@gmail.com")

Instead I'm getting: TypeError: cannot concatenate 'str' and 'list' objects

Can anyone help me achieve this? Thanks


Answer 1:


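If anyone else runs into the same issue: the query needs a quoted, comma-separated string rather than the list itself. With the list stored in emails, you can build that string with the following code:

"'"+"','".join(map(str, emails))+"'"

Concatenating that string into the query gives the following output:

SELECT * FROM table WHERE email IN ('hi@gmail.com','goodbye@gmail.com')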
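For completeness, here is a minimal sketch of the whole thing in plain Python, from list to finished query string. The helper name quote_for_in is hypothetical (it is not part of PySpark), and rejecting values that contain a quote or backslash is my own assumption about what is safe to paste into a literal, not something from the original answer.

```python
def quote_for_in(values):
    """Hypothetical helper: turn a list into 'a', 'b', ... for an IN clause."""
    quoted = []
    for v in values:
        s = str(v)
        # Assumption: refuse characters that could break out of the
        # single-quoted SQL literal instead of trying to escape them.
        if "'" in s or "\\" in s:
            raise ValueError("unsafe value for a SQL literal: {!r}".format(s))
        quoted.append("'" + s + "'")
    return ", ".join(quoted)


emails = ["hi@gmail.com", "goodbye@gmail.com"]
query = "SELECT * FROM table WHERE email IN (" + quote_for_in(emails) + ")"
print(query)
```

The original TypeError goes away because the list is turned into a single string before it is concatenated with the rest of the query.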




Answer 2:


Try this:

DataFrame-based approach:

from pyspark.sql.functions import col

df = spark.createDataFrame([(1, "hi@gmail.com"), (2, "goodbye@gmail.com"), (3, "abc@gmail.com"), (4, "xyz@gmail.com")], ['id', 'email_id'])

email_filter_list = ["hi@gmail.com", "goodbye@gmail.com"]

# Keep only the rows whose email_id appears in the list
df.where(col('email_id').isin(email_filter_list)).show()

Spark SQL-based approach:

df = spark.createDataFrame([(1, "hi@gmail.com"), (2, "goodbye@gmail.com"), (3, "abc@gmail.com"), (4, "xyz@gmail.com")], ['id', 'email_id'])
df.createOrReplaceTempView('t1')

# Reuses email_filter_list from above: quote each address and join with commas
sql_filter = ','.join(["'" + i + "'" for i in email_filter_list])

spark.sql("SELECT * FROM t1 WHERE email_id IN ({})".format(sql_filter)).show()
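If the addresses come from user input, pasting them into the SQL text is risky. A possible alternative, sketched below, is to generate one named parameter marker per value and pass the values separately. The helper build_parameterized_in is hypothetical, and the final call is left as a comment because it rests on an assumption: that you are on a recent PySpark (I believe 3.5 or later) where spark.sql accepts an args dictionary of Python values. Check the documentation for your version before relying on it.

```python
def build_parameterized_in(table, column, values):
    """Hypothetical helper: build an IN query with :v0, :v1, ... markers."""
    names = ["v{}".format(i) for i in range(len(values))]
    markers = ", ".join(":" + name for name in names)
    query = "SELECT * FROM {} WHERE {} IN ({})".format(table, column, markers)
    # Map each marker name to the value it stands for.
    args = dict(zip(names, values))
    return query, args


email_filter_list = ["hi@gmail.com", "goodbye@gmail.com"]
query, args = build_parameterized_in("t1", "email_id", email_filter_list)
print(query)
print(args)

# Assumed usage with an active SparkSession named spark:
# spark.sql(query, args=args).show()
```

This keeps the values out of the query text entirely, so no quoting is needed. The table and column names are still formatted into the string, so those should come from your own code and not from user input.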


Source: https://stackoverflow.com/questions/55288734/how-to-get-strings-separated-by-commas-from-a-list-to-a-query-in-pyspark
