I recently heard this lecture, which suggests (@19:07) using Spark SQL nested/complex types in order to reduce shuffles. However, the talk does not dive much deeper into this to