I am using spark-sql 2.4.1 and Java 8.
val country_df = Seq(
  ("us",2001),
  ("fr",2002),
  ("jp",2002),
  ("in",2001),
  ("fr",2003),
scala> val colname = col_df.rdd.collect.toList.map(x => x(0).toString).toSeq
scala> data_df.select(colname.head, colname.tail: _*).show()
+----------+----------+
|        us|        in|
+----------+----------+
|us_state_1|in_state_1|
|us_state_2|in_state_2|
|us_state_3|in_state_3|
+----------+----------+
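You can try the code below.
Select the column names from the first dataset:
// Needs: import static org.apache.spark.sql.functions.col;
// The $"..." and === syntax is Scala-only; in Java use col(...).equalTo(...).
List<String> columns = country_df.where(col("data_yr").equalTo(2001)).select("country").as(Encoders.STRING()).collectAsList();
Use those column names with selectExpr on the second dataset: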
public static Seq<String> convertListToSeq(List<String> inputList) {
    return JavaConverters.asScalaIteratorConverter(inputList.iterator()).asScala().toSeq();
}
// using selectExpr
data_df.selectExpr(convertListToSeq(columns)).show(true);
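One thing to watch with this approach: selectExpr parses each string as a SQL expression, so a collected value that contains spaces or other special characters, or that is not a column of data_df at all, would fail at analysis time. Below is a minimal plain-Java sketch of a guard for that. The class ColumnNames and its methods quote and toSelectExprs are hypothetical helpers, not part of the Spark API, and the sketch assumes a case-sensitive match on column names is acceptable.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class; nothing here comes from Spark itself.
class ColumnNames {

    // Wrap a name in backticks so the SQL parser behind selectExpr reads it
    // as a single identifier. An embedded backtick is escaped by doubling it.
    static String quote(String name) {
        return "`" + name.replace("`", "``") + "`";
    }

    // Keep only the requested names that exist in 'available'
    // (for example the array returned by data_df.columns()),
    // preserving the requested order, and quote each survivor.
    // Assumption: exact, case-sensitive name matching.
    static List<String> toSelectExprs(List<String> requested, String[] available) {
        Set<String> known = new HashSet<>(Arrays.asList(available));
        List<String> result = new ArrayList<>();
        for (String name : requested) {
            if (known.contains(name)) {
                result.add(quote(name));
            }
        }
        return result;
    }
}
```

With these helpers the call above might become data_df.selectExpr(convertListToSeq(ColumnNames.toSelectExprs(columns, data_df.columns()))).show(true);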
Using pivot you can get the values as column names directly like this:
val selectCols = col_df.groupBy().pivot($"country").agg(lit(null)).columns
data_df.select(selectCols.head, selectCols.tail: _*)