I am using spark-sql 2.4.1. How can I do various joins depending on the value of a column?
I need to get multiple lookup values from the map_val
column for the given value columns.
Try this:
Build a lookup map per id before the join, then use it to replace the values.
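import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

// Collect every (map_code, map_val) pair of an id into one map column.
val newRateDS = rateDs.withColumn("lookUpMap",
  map_from_entries(collect_list(struct(col("map_code"), col("map_val"))).over(Window.partitionBy("id")))
)
newRateDS.show(false)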
/**
 * +---+----------+----------+--------+-------+------------------+
 * |id |start_date|end_date  |map_code|map_val|lookUpMap         |
 * +---+----------+----------+--------+-------+------------------+
 * |21 |2018-01-31|2018-06-31|12      |C      |[12 -> C, 13 -> D]|
 * |21 |2018-01-31|2018-06-31|13      |D      |[12 -> C, 13 -> D]|
 * +---+----------+----------+--------+-------+------------------+
 */
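To make the idea concrete, here is a minimal plain-Scala sketch (no Spark involved) of what the map column holds and how a lookup with fallback behaves. The `Rate` case class, the `LookupSketch` object and the `replaceValue` helper are hypothetical names used only for illustration; the sample rows are copied from the output above, with the codes assumed to be strings.

```scala
// Hypothetical row type mirroring the rate columns used for the lookup.
final case class Rate(id: Int, mapCode: String, mapVal: String)

object LookupSketch {
  // Sample rows copied from the newRateDS output (codes assumed to be strings).
  val rates: Seq[Rate] = Seq(Rate(21, "12", "C"), Rate(21, "13", "D"))

  // Plain-Scala counterpart of
  // map_from_entries(collect_list(struct(map_code, map_val))) per id.
  val lookUpById: Map[Int, Map[String, String]] =
    rates.groupBy(_.id).map { case (id, rs) =>
      id -> rs.map(r => r.mapCode -> r.mapVal).toMap
    }

  // Lookup with fallback: keep the original value when the id has no map
  // or the code is not one of the map's keys.
  def replaceValue(id: Int, value: String): String =
    lookUpById.get(id).flatMap(_.get(value)).getOrElse(value)
}
```

For example, `LookupSketch.replaceValue(21, "13")` resolves through the map of id 21, while an id without any rate rows just gets its input value back.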
// Left-join each "rate" row to the rate rows of the same id whose date range contains its date.
val resultDs = df.filter(col("code").equalTo(lit("rate"))).join(broadcast(newRateDS),
  newRateDS("id") === df("id") && df("date").between(newRateDS("start_date"), newRateDS("end_date"))
  //.and(rateDs.col("mapping_value").equalTo(df.col("mean")))
  , "left"
)
// Replace value1/value2 with the mapped value; keep the original when the map has no entry.
resultDs.withColumn("value1", expr("coalesce(lookUpMap[value1], value1)"))
  .withColumn("value2", expr("coalesce(lookUpMap[value2], value2)"))
  .show(false)
/**
 * +---+----+------+----------+------+------+----+----------+----------+--------+-------+------------------+
 * |id |code|entity|date      |value1|value2|id  |start_date|end_date  |map_code|map_val|lookUpMap         |
 * +---+----+------+----------+------+------+----+----------+----------+--------+-------+------------------+
 * |22 |rate|school|2018-03-31|11    |14    |null|null      |null      |null    |null   |null              |
 * |21 |rate|school|2018-03-31|D     |C     |21  |2018-01-31|2018-06-31|13      |D      |[12 -> C, 13 -> D]|
 * |21 |rate|school|2018-03-31|D     |C     |21  |2018-01-31|2018-06-31|12      |C      |[12 -> C, 13 -> D]|
 * +---+----+------+----------+------+------+----+----------+----------+--------+-------+------------------+
 */
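Note that the id 21 row comes out once per matching rate row, since the window version keeps every rate row and each one carries the full map. If one output row per input row is wanted, one possible variant (not the code above, just a sketch) is to aggregate the rate table with `groupBy` instead of a window, so each id and date range contributes a single row. The sample `df` and `rateDs` below are assumptions reconstructed from the printed outputs (the inputs `13` and `12` are inferred from the results `D` and `C`, and the dates are kept as strings), and `LookupJoinSketch` is a hypothetical name.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object LookupJoinSketch {
  val spark: SparkSession = SparkSession.builder()
    .master("local[1]").appName("lookup-join-sketch").getOrCreate()
  import spark.implicits._

  // Assumed inputs, reconstructed from the outputs shown in the answer.
  val rateDs = Seq(
    (21, "2018-01-31", "2018-06-31", "12", "C"),
    (21, "2018-01-31", "2018-06-31", "13", "D")
  ).toDF("id", "start_date", "end_date", "map_code", "map_val")

  val df = Seq(
    (22, "rate", "school", "2018-03-31", "11", "14"),
    (21, "rate", "school", "2018-03-31", "13", "12")
  ).toDF("id", "code", "entity", "date", "value1", "value2")

  // One row per (id, start_date, end_date), carrying the whole lookup map.
  val rateMaps = rateDs
    .groupBy("id", "start_date", "end_date")
    .agg(map_from_entries(collect_list(struct(col("map_code"), col("map_val")))).as("lookUpMap"))

  // Same left join and coalesce lookup as above, but against the aggregated
  // table; only the columns of df are kept in the result.
  val result = df.filter(col("code") === "rate")
    .join(broadcast(rateMaps),
      rateMaps("id") === df("id") &&
        df("date").between(rateMaps("start_date"), rateMaps("end_date")),
      "left")
    .select(
      df("id"), df("code"), df("entity"), df("date"),
      expr("coalesce(lookUpMap[value1], value1)").as("value1"),
      expr("coalesce(lookUpMap[value2], value2)").as("value2"))
}
```

Calling `LookupJoinSketch.result.show(false)` prints the result. If several date ranges of the same id overlap the same date, this variant would still produce one row per matching range.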