There are two tables: Table 1 holds pairs of IDs, and Table 2 holds the attributes for each ID.
Table 1
Table 2
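If the two IDs in the same row of Table 1 have the same value for an attribute in Table 2, the output for that attribute should be 1; otherwise it should be 0.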
This should do the trick:
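import org.apache.spark.sql.functions.when
import org.apache.spark.sql.types.IntegerType
import spark.implicits._

// Table 1: pairs of IDs to compare
val t1 = List(
  ("id1","id2"),
  ("id1","id3"),
  ("id2","id3")
).toDF("id_x", "id_y")

// Table 2: attributes for each ID
val t2 = List(
  ("id1","blue","m"),
  ("id2","red","s"),
  ("id3","blue","s")
).toDF("id", "color", "size")

// Join Table 2 twice (once per ID column), then flag each attribute:
// 1 if both IDs share the same value, 0 otherwise
t1
  .join(t2.as("x"), $"id_x" === $"x.id", "inner")
  .join(t2.as("y"), $"id_y" === $"y.id", "inner")
  .select(
    $"id_x",
    $"id_y",
    when($"x.color" === $"y.color", 1).otherwise(0).cast(IntegerType).alias("color"),
    when($"x.size" === $"y.size", 1).otherwise(0).cast(IntegerType).alias("size")
  )
  .show()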
Resulting in:
+----+----+-----+----+
|id_x|id_y|color|size|
+----+----+-----+----+
| id1| id2|    0|   0|
| id1| id3|    1|   0|
| id2| id3|    0|   1|
+----+----+-----+----+
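If Table 2 has more attribute columns than just color and size, the comparison columns can probably be generated instead of written out by hand. The following is only a sketch under a few assumptions: every column of t2 other than "id" is treated as an attribute, the names attrs, flags and result are made up for illustration, and a local SparkSession is created here (in spark-shell, spark already exists and that line can be dropped). It casts the boolean comparison directly to an integer rather than using when/otherwise, so a null attribute would yield null instead of 0.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session for experimenting; skip this in spark-shell
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("attribute-match")
  .getOrCreate()
import spark.implicits._

val t1 = List(
  ("id1","id2"),
  ("id1","id3"),
  ("id2","id3")
).toDF("id_x", "id_y")

val t2 = List(
  ("id1","blue","m"),
  ("id2","red","s"),
  ("id3","blue","s")
).toDF("id", "color", "size")

// Assumption: every column except "id" is an attribute to compare
val attrs = t2.columns.filter(_ != "id")

// One 0/1 column per attribute: boolean equality cast to int
val flags = attrs.map(c => (col(s"x.$c") === col(s"y.$c")).cast("int").alias(c))

val result = t1
  .join(t2.as("x"), $"id_x" === $"x.id", "inner")
  .join(t2.as("y"), $"id_y" === $"y.id", "inner")
  .select((col("id_x") +: col("id_y") +: flags): _*)

result.show()
```

For the sample data this should give the same rows as the hand-written version, and adding a new attribute column to t2 would then need no change to the query.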