I would like to perform an action on a single column. Unfortunately, after I transform that column, it is no longer part of the DataFrame it came from; it is a Column object instead.
Spark >= 2.0
Starting from Spark 2.0.0, you need to call .rdd explicitly in order to use flatMap:
df.select("array").rdd.flatMap(lambda x: x).collect()
Spark < 2.0
Just select and flatMap:
df.select("array").flatMap(lambda x: x).collect() ## [[1, 2, 3]]
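If it is unclear why the result is a list containing one list rather than a flat [1, 2, 3], here is a rough pure-Python sketch of what flatMap(lambda x: x) does to the selected rows. It does not use Spark at all: the Row namedtuple and the flat_map helper are hypothetical stand-ins I made up for illustration (the real pyspark.sql.Row is also a tuple subclass, so iterating over it yields its field values).

```python
from collections import namedtuple
from itertools import chain

# Hypothetical stand-in for pyspark.sql.Row (assumption: one field named "array").
Row = namedtuple("Row", ["array"])


def flat_map(func, rows):
    """Apply func to every row, then flatten the results by one level.

    This only mimics the semantics of RDD.flatMap on a local list.
    """
    return list(chain.from_iterable(func(row) for row in rows))


# Roughly what df.select("array") holds: one row with a single array field.
rows = [Row(array=[1, 2, 3])]

# The identity function returns the row itself, so flattening one level
# unpacks the row into its field values, not into the array's elements.
result = flat_map(lambda x: x, rows)
print(result)  # [[1, 2, 3]]
```

So each row is unpacked into its fields, and with a single selected column you end up with one value per row, which is usually what you want when pulling a column out of a DataFrame.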