I am starting to use Spark DataFrames and I need to be able to pivot the data to create multiple columns out of 1 column with multiple rows. There is built in functionality
There is simple and elegant solution.
scala> spark.sql("select * from k_tags limit 10").show()
+---------------+-------------+------+
| imsi| name| value|
+---------------+-------------+------+
|246021000000000| age| 37|
|246021000000000| gender|Female|
|246021000000000| arpu| 22|
|246021000000000| DeviceType| Phone|
|246021000000000|DataAllowance| 6GB|
+---------------+-------------+------+
scala> spark.sql("select * from k_tags limit 10").groupBy($"imsi").pivot("name").agg(min($"value")).show()
+---------------+-------------+----------+---+----+------+
| imsi|DataAllowance|DeviceType|age|arpu|gender|
+---------------+-------------+----------+---+----+------+
|246021000000000| 6GB| Phone| 37| 22|Female|
|246021000000001| 1GB| Phone| 72| 10| Male|
+---------------+-------------+----------+---+----+------+