Spark flattening out dataframes


Question


Getting started with Spark, I would like to know how to flatMap or explode a DataFrame.

It was created using df.groupBy("columName").count() and, if I collect it, has the following structure:

 [[Key1, count], [Key2, count2]] 

But I would rather have something like:

Map(bar -> 1, foo -> 1, awesome -> 1)

What is the right tool to achieve something like this? flatMap, explode, or something else?

Context: I want to use spark-jobserver. It only seems to provide meaningful results (e.g. a working JSON serialization) if I supply the data in the latter form.
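
For illustration, a minimal sketch of the setup described above (the sample data and the app name are hypothetical; assumes Spark 2.x, where groupBy(...).count() names its count column "count"):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("flatten-sketch").getOrCreate()
import spark.implicits._

// hypothetical input: one word per row
val df = Seq("bar", "foo", "awesome").toDF("columName")
val counted = df.groupBy("columName").count()

// collect() returns an Array[Row], e.g. Array([bar,1], [foo,1], [awesome,1]) --
// the nested structure described above, not a Map
counted.collect()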


Answer 1:


I'm assuming you're calling collect or collectAsList on the DataFrame? That would return an Array[Row] / List[Row].

If so, the easiest way to transform these into a map is to use the underlying RDD, map its records into key-value tuples, and call collectAsMap:

val counted = df.groupBy("columName").count()
// obviously, replace "keyColumn" and "valueColumn" with your actual column names
val result = counted.rdd
  .map(r => (r.getAs[String]("keyColumn"), r.getAs[Long]("valueColumn")))
  .collectAsMap()

The result has type Map[String, Long], as expected.
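
Applied to the hypothetical DataFrame sketched in the question, that would look like this (again assuming Spark 2.x, where the count column produced by groupBy(...).count() is named "count"):

// continuing from the sketch above: counted = df.groupBy("columName").count()
val result: scala.collection.Map[String, Long] =
  counted.rdd
    .map(r => (r.getAs[String]("columName"), r.getAs[Long]("count")))
    .collectAsMap()
// result: Map(bar -> 1, foo -> 1, awesome -> 1)

On Spark 2.x a typed Dataset is another option, since import spark.implicits._ supplies an encoder for tuples: counted.as[(String, Long)].collect().toMap produces the same map on the driver.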



Source: https://stackoverflow.com/questions/36541565/spark-flattening-out-dataframes
