Spark (Scala) execute dataframe within for loop

最后都变了 - posted on 2020-01-07 09:55:15

Question


I am using Spark 1.6.1. I have a requirement to execute a DataFrame query in a loop.

for ( i <- List ('a','b')){
 val i = sqlContext.sql("SELECT i, col1, col2 FROM DF1")}

I want this DataFrame query to be executed twice (once with i = a and once with i = b).


Answer 1:


Your code is almost correct, except for two things:

  • i is already the loop variable, so don't reuse the name in val i =
  • If you want to use the value of i inside a string, use string interpolation (the s"..." prefix with $i)

So your code should look like this:

for (i <- List ('a','b')) {
  val df = sqlContext.sql(s"SELECT $i, col1, col2 FROM DF1")
  df.show()
}
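
To check what the s interpolator actually hands to Spark, the query text can be built and printed without a SQLContext. The following is a minimal sketch in plain Scala; the object QuerySketch and both helper names are hypothetical and not part of the original answer. Note that $i inserts the character unquoted, so the loop above selects a column named a (or b). If a literal value was intended instead, which is only an assumption about the asker's intent, the value has to be quoted in the SQL text.

```scala
object QuerySketch {
  // Hypothetical helper: builds the same SQL text as the loop above.
  // The Char is inserted unquoted, so it is read as a column name.
  def buildQuery(i: Char): String =
    s"SELECT $i, col1, col2 FROM DF1"

  // Hypothetical variant (assumption about intent): emit the value as a
  // quoted SQL string literal and name the resulting column i.
  def buildLiteralQuery(i: Char): String =
    s"SELECT '$i' AS i, col1, col2 FROM DF1"

  def main(args: Array[String]): Unit = {
    // Print both forms of the query for each loop value.
    List('a', 'b').foreach { i =>
      println(buildQuery(i))
      println(buildLiteralQuery(i))
    }
  }
}
```

Printing the query before passing it to sqlContext.sql is a cheap way to confirm that the interpolated text is what you meant to run.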

EDIT after the author's comment:

To combine the results into a single DataFrame, you can use .map followed by .reduceLeft:

// All your dataframes
val dfs = Seq('a','b').map { i =>
  sqlContext.sql(s"SELECT $i, col1, col2 FROM DF1")
}

// Then you can reduce your dataframes into one.
// unionAll matches Spark 1.6; from Spark 2.0 on it is deprecated in favor of union.
val unionDF = dfs.reduceLeft((dfa, dfb) =>
  dfa.unionAll(dfb)
)
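
The map-then-reduceLeft pattern is ordinary Scala collections code, so its behavior can be tried out without Spark. Below is a rough sketch that uses plain lists as stand-ins for DataFrames and ++ as a stand-in for unionAll; the object ReduceSketch, the Rows alias and both helpers are hypothetical names chosen for illustration. It also shows one caveat: reduceLeft throws on an empty sequence, whereas reduceLeftOption returns None.

```scala
object ReduceSketch {
  // Stand-in for a DataFrame: a list of rows (illustration only).
  type Rows = List[(Char, Int)]

  // Stand-in for sqlContext.sql(...): one "result set" per loop value.
  def rowsFor(i: Char): Rows = List((i, 1), (i, 2))

  // Mirrors dfs.reduceLeft(_ unionAll _), with ++ playing the role of unionAll.
  // reduceLeftOption is used so an empty input yields None instead of throwing.
  def unionAllOf(keys: Seq[Char]): Option[Rows] =
    keys.map(rowsFor).reduceLeftOption(_ ++ _)

  def main(args: Array[String]): Unit = {
    println(unionAllOf(Seq('a', 'b')))
    println(unionAllOf(Seq.empty))
  }
}
```

reduceLeft combines the elements pairwise from left to right, so with three inputs the result is built as (first union second) union third.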


Source: https://stackoverflow.com/questions/42900556/spark-scala-execute-dataframe-within-for-loop
