How to Join Multiple Columns in Spark SQL using Java for filtering in DataFrame

Asked by 梦谈多话 on 2021-02-08 19:07
  • DataFrame a contains columns x, y, z, k
  • DataFrame b contains columns x, y, a

    How can I join a and b on both x and y from Java? I have got as far as:

    a.join(b,

2 Answers
  • 2021-02-08 19:57

    If you want to join on multiple columns, you can do something like this:

    a.join(b, scalaSeq, joinType)
    

    You can store your column names in a Java List and convert that List to a Scala Seq. The conversion of a Java List to a Scala Seq looks like this:

    scalaSeq = JavaConverters.asScalaIteratorConverter(list.iterator()).asScala().toSeq();
    

    Example: a = a.join(b, scalaSeq, "inner");

    Note: this approach also makes it easy to support a dynamic list of join columns.
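
    Putting the pieces together, a minimal sketch in Java might look like the following (assuming Spark 2.x with the scala.collection.JavaConverters API; the class and method names are only illustrative, while a, b and the join columns x, y come from the question):

    import java.util.Arrays;
    import java.util.List;

    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    import scala.collection.JavaConverters;
    import scala.collection.Seq;

    public class MultiColumnJoin {

        // Illustrative helper: equi-join a and b on the columns shared by both
        static Dataset<Row> joinOnXAndY(Dataset<Row> a, Dataset<Row> b) {
            // Column names that exist in BOTH DataFrames
            List<String> joinColumns = Arrays.asList("x", "y");

            // Convert the Java List to the Scala Seq expected by Dataset.join
            Seq<String> joinSeq = JavaConverters
                    .asScalaIteratorConverter(joinColumns.iterator())
                    .asScala()
                    .toSeq();

            // With the Seq-of-names variant, x and y appear only once in the result
            return a.join(b, joinSeq, "inner");
        }
    }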

  • 2021-02-08 19:59

    Spark SQL provides a group of methods on Column, marked as java_expr_ops, that are designed for Java interoperability. It includes the and method (see also or), which can be used here:

    a.col("x").equalTo(b.col("x")).and(a.col("y").equalTo(b.col("y"))
    
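    As a fuller sketch, the condition-based join in Java could look like the following (a, b, x and y are from the question; the class and method names are only illustrative). Unlike joining on a Seq of column names, this form keeps both copies of x and y in the result:

    import org.apache.spark.sql.Column;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    public class ConditionJoin {

        // Illustrative helper: join a and b on x and y using Column expressions
        static Dataset<Row> joinOnXAndY(Dataset<Row> a, Dataset<Row> b) {
            // equalTo / and are the Java-friendly counterparts of Scala's === and &&
            Column condition = a.col("x").equalTo(b.col("x"))
                    .and(a.col("y").equalTo(b.col("y")));

            // Both a's and b's x and y columns remain in the output;
            // drop or rename the duplicates afterwards if needed
            return a.join(b, condition, "inner");
        }
    }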