Question
I'm building a Spark UDAF that stores its intermediate data in a fastutil map. The buffer schema looks like this:
def bufferSchema = new StructType().add("my_map_col", MapType(StringType, IntegerType))
Initialization works without a problem:
def initialize(buffer: MutableAggregationBuffer) = {
buffer(0) = new Object2IntOpenHashMap[String]()
}
The problem comes when I try to update:
def update(buffer: MutableAggregationBuffer, input: Row) = {
val myMap = buffer.getAs[Object2IntOpenHashMap[String]](0)
myMap.put(input.getAs[String](0), 1)
buffer(0) = myMap
}
I get the following error:
Caused by: java.lang.ClassCastException: scala.collection.immutable.Map$EmptyMap$ cannot be cast to it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap
Is there any way I can make this work?
Answer 1:
Any way I can make this work?
Not really. This
buffer.getAs[Object2IntOpenHashMap[String]](0)
is equivalent to
buffer.get(0).asInstanceOf[Object2IntOpenHashMap[String]]
and the external type for MapType is scala.collection.Map, so the cast fails at runtime.
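The cast failure can be reproduced without Spark. The following is a minimal sketch, assuming a plain Scala Map stands in for the value Spark returns for a MapType column and java.util.HashMap stands in for the fastutil map (CastDemo is a hypothetical name):

```scala
object CastDemo {
  // Stand-in for what the buffer hands back for a MapType column: a Scala Map.
  val stored: Any = Map.empty[String, Int]

  // getAs[T](i) amounts to get(i).asInstanceOf[T], so casting to an
  // unrelated Java map class throws ClassCastException at runtime.
  val castFailed: Boolean =
    try {
      stored.asInstanceOf[java.util.HashMap[String, Integer]].size()
      false
    } catch {
      case _: ClassCastException => true
    }

  // Reading the value as the external type of MapType works.
  val asScalaMap: scala.collection.Map[String, Int] =
    stored.asInstanceOf[scala.collection.Map[String, Int]]
}
```

Whatever object is assigned in initialize, the buffer converts it to match the schema, so update always sees a scala.collection.Map.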
In practice it is a dead end anyway: UserDefinedAggregateFunction
makes a full copy of the data on each call. You might have better luck with Aggregator
(as in the linked question).
Source: https://stackoverflow.com/questions/54544773/how-do-i-implement-a-fastutils-map-in-a-spark-udaf
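A rough sketch of what the Aggregator route might look like, assuming Spark 2.x or later and fastutil on the classpath. The name CountByKey is hypothetical, the Kryo buffer encoder and the ExpressionEncoder output encoder are assumptions rather than anything from the linked question, and the reduce step simply mirrors the put(key, 1) from the update method above:

```scala
import it.unimi.dsi.fastutil.objects.Object2IntOpenHashMap
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical aggregator: the buffer is the fastutil map itself,
// so there is no MapType column to cast from.
object CountByKey
    extends Aggregator[String, Object2IntOpenHashMap[String], Map[String, Int]] {

  def zero: Object2IntOpenHashMap[String] = new Object2IntOpenHashMap[String]()

  // Same logic as the original update: store 1 for every key seen.
  def reduce(buf: Object2IntOpenHashMap[String], key: String): Object2IntOpenHashMap[String] = {
    buf.put(key, 1)
    buf
  }

  // Combine two partial buffers by copying b's entries into a.
  def merge(a: Object2IntOpenHashMap[String],
            b: Object2IntOpenHashMap[String]): Object2IntOpenHashMap[String] = {
    a.putAll(b)
    a
  }

  // Convert to an immutable Scala Map for the final result.
  def finish(buf: Object2IntOpenHashMap[String]): Map[String, Int] = {
    val out = Map.newBuilder[String, Int]
    val it = buf.object2IntEntrySet().iterator()
    while (it.hasNext) {
      val e = it.next()
      out += e.getKey -> e.getIntValue
    }
    out.result()
  }

  // Assumption: serialize the buffer as an opaque binary blob with Kryo.
  def bufferEncoder: Encoder[Object2IntOpenHashMap[String]] =
    Encoders.kryo[Object2IntOpenHashMap[String]]

  // Assumption: derive the output encoder from the Scala type.
  def outputEncoder: Encoder[Map[String, Int]] = ExpressionEncoder[Map[String, Int]]()
}
```

With a Dataset[String] named ds, this would be applied as ds.select(CountByKey.toColumn).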