DataFrame a = contains columns x, y, z, k
DataFrame b = contains columns x, y, a
a.join(b,
If you want to join on multiple columns, you can do something like this:
a.join(b,scalaSeq, joinType)
You can store your column names in a Java List and convert that List to a Scala Seq. Conversion of a Java List to a Scala Seq:
scalaSeq = JavaConverters.asScalaIteratorConverter(list.iterator()).asScala().toSeq();
Example: a = a.join(b, scalaSeq, "inner");
Note: Because the join columns are passed as a list, this approach makes it easy to support a dynamic set of join columns.
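As an illustration of building the column list dynamically, here is a minimal sketch that assumes the join keys are simply the columns whose names appear in both DataFrames. The class JoinColumns and the method commonColumns are hypothetical helpers, not part of Spark. The plain String arrays stand in for what a.columns() and b.columns() would return.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper (not a Spark API): picks the join columns dynamically.
final class JoinColumns {
    private JoinColumns() {}

    // Returns the column names present in both arrays,
    // in the order they appear in `left`, without duplicates.
    static List<String> commonColumns(String[] left, String[] right) {
        Set<String> rightNames = new HashSet<>(Arrays.asList(right));
        List<String> shared = new ArrayList<>();
        for (String name : left) {
            if (rightNames.contains(name) && !shared.contains(name)) {
                shared.add(name);
            }
        }
        return shared;
    }
}
```

For the two DataFrames above, commonColumns(a.columns(), b.columns()) would give the list [x, y], which can then be converted to a Scala Seq as shown earlier and passed to join.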
Spark SQL provides a group of methods on Column, marked as java_expr_ops, which are designed for Java interoperability. The group includes the and method (see also or), which can be used here:
a.col("x").equalTo(b.col("x")).and(a.col("y").equalTo(b.col("y")))