How do I implement a fastutils map in a Spark UDAF?

Posted by ╄→гoц情女王★ on 2019-12-11 17:43:33

Question


I'm building a Spark UDAF that stores its intermediate data in a fastutil map. The buffer schema looks like this:

def bufferSchema = new StructType().add("my_map_col", MapType(StringType, IntegerType))

I initialize with no problem:

def initialize(buffer: MutableAggregationBuffer) = {
   buffer(0) = new Object2IntOpenHashMap[String]()
}

The problem comes when I try to update:

def update(buffer: MutableAggregationBuffer, input: Row) = { 
  val myMap = buffer.getAs[Object2IntOpenHashMap[String]](0)
  myMap.put(input.getAs[String](0), 1)
  buffer(0) = myMap
}

Getting the following error:

Caused by: java.lang.ClassCastException: scala.collection.immutable.Map$EmptyMap$ cannot be cast to it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap

Any way I can make this work?


Answer 1:


Any way I can make this work?

Not really. This

buffer.getAs[Object2IntOpenHashMap[String]](0)

is equivalent to

buffer.get(0).asInstanceOf[Object2IntOpenHashMap[String]]

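and the external type for MapType is scala.collection.Map. Whatever you store in the buffer is converted to Spark's internal representation and comes back as a Scala map, so the cast to Object2IntOpenHashMap fails.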
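If staying with the UDAF API is acceptable, one way to avoid the exception is to work with the external type directly. The following is only a minimal sketch under that assumption: the wrapper object MapBufferSketch and the helper withKey are hypothetical names, and the logic mirrors the put(key, 1) from the question. It does not avoid the copying cost of the buffer.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.MutableAggregationBuffer

// Hypothetical wrapper; in practice these methods would live in the UDAF class.
object MapBufferSketch {

  // Hypothetical helper: returns a new immutable map with `key` set to 1.
  def withKey(current: scala.collection.Map[String, Int], key: String): Map[String, Int] =
    current.toMap + (key -> 1)

  def initialize(buffer: MutableAggregationBuffer): Unit = {
    // Store a Scala map, which matches the external type of MapType.
    buffer(0) = Map.empty[String, Int]
  }

  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    // getMap returns scala.collection.Map, so no cast to a fastutil type is involved.
    val current = buffer.getMap[String, Int](0)
    buffer(0) = withKey(current, input.getAs[String](0))
  }
}
```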

In practice it is a dead end anyway: UserDefinedAggregateFunction makes a full copy of the buffer data on each call. You might have better luck with Aggregator (as in the linked question).
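A rough sketch of what the Aggregator route could look like is shown here. The object name SeenKeys, the choice of String input, and the use of Encoders.kryo for both the buffer and the output are assumptions made for illustration, not something the linked question is known to prescribe. With a Kryo output encoder the result column is binary, so a different output encoder may be preferable in practice.

```scala
import it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator
import scala.collection.JavaConverters._

// Hypothetical aggregator: keeps the fastutil map as the buffer object itself.
object SeenKeys extends Aggregator[String, Object2IntOpenHashMap[String], Map[String, Int]] {
  type Buf = Object2IntOpenHashMap[String]

  // Empty buffer for each group.
  def zero: Buf = new Object2IntOpenHashMap[String]()

  // Mutate the buffer in place and return it, as in the question's update.
  def reduce(buf: Buf, key: String): Buf = { buf.put(key, 1); buf }

  // Combine two partial buffers.
  def merge(a: Buf, b: Buf): Buf = { a.putAll(b); a }

  // Convert to an immutable Scala map only once, at the end.
  def finish(buf: Buf): Map[String, Int] =
    buf.keySet().asScala.iterator.map(k => k -> buf.getInt(k)).toMap

  // Assumption: Kryo can serialize the fastutil map; Encoders.javaSerialization is an alternative.
  def bufferEncoder: Encoder[Buf] = Encoders.kryo[Buf]
  def outputEncoder: Encoder[Map[String, Int]] = Encoders.kryo[Map[String, Int]]
}
```

Given a Dataset[String] named ds (hypothetical), something like ds.select(SeenKeys.toColumn) should apply it.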



Source: https://stackoverflow.com/questions/54544773/how-do-i-implement-a-fastutils-map-in-a-spark-udaf
