Question
Please help to understand how Kryo serializer allocates memory for its buffer.
My Spark app fails on a collect step when it tries to collect about 122Mb of data to a driver from workers.
com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 0, required: 57197
at com.esotericsoftware.kryo.io.Output.require(Output.java:138)
at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:220)
at com.esotericsoftware.kryo.io.Output.writeBytes(Output.java:206)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:29)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ByteArraySerializer.write(DefaultArraySerializers.java:18)
at com.esotericsoftware.kryo.Kryo.writeObjectOrNull(Kryo.java:549)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:312)
at com.esotericsoftware.kryo.serializers.DefaultArraySerializers$ObjectArraySerializer.write(DefaultArraySerializers.java:293)
at com.esotericsoftware.kryo.Kryo.writeClassAndObject(Kryo.java:568)
at org.apache.spark.serializer.KryoSerializerInstance.serialize(KryoSerializer.scala:161)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
This exception appears after I've increased the driver memory to 3GB and the executor memory to 4GB, and increased the buffer size for the Kryo serializer (I'm using Spark 1.3):
conf.set('spark.kryoserializer.buffer.mb', '256')
conf.set('spark.kryoserializer.buffer.max', '512')
I think I've set the buffer to be big enough, but my Spark app keeps crashing. How can I check which objects are using the Kryo buffer on an executor? Is there a way to clean it up?
Answer 1:
In my case, the problem was using the wrong property name for the max buffer size.
Up to Spark version 1.3 the property name is spark.kryoserializer.buffer.max.mb - note the ".mb" at the end. But I used the property name from the Spark 1.4 docs - spark.kryoserializer.buffer.max.
As a result the Spark app was using the default value of 64mb, which was not enough for the amount of data I was processing.
After I changed the property name to spark.kryoserializer.buffer.max.mb, my app worked fine.
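For reference, a minimal PySpark sketch of what the corrected Spark 1.3 configuration might look like; the specific buffer sizes are illustrative values, not taken from the answer:

from pyspark import SparkConf, SparkContext

conf = SparkConf()
# Enable Kryo serialization (the question implies it is already enabled)
conf.set('spark.serializer', 'org.apache.spark.serializer.KryoSerializer')
# Spark 1.3 property names end in ".mb" and take values in megabytes
conf.set('spark.kryoserializer.buffer.mb', '64')        # initial buffer size (example value)
conf.set('spark.kryoserializer.buffer.max.mb', '256')   # maximum buffer size (example value)
sc = SparkContext(conf=conf)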
Answer 2:
The solution is to set spark.kryoserializer.buffer.max to 1g in spark-defaults.conf and restart the Spark services.
This at least worked for me.
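A sketch of the corresponding entry in spark-defaults.conf, assuming the Spark 1.4+ property name used in this answer:

# spark-defaults.conf
spark.kryoserializer.buffer.max   1g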
Answer 3:
Use conf.set('spark.kryoserializer.buffer.max.mb', 'val') to set the Kryo serializer buffer, and keep in mind that val should be less than 2048; otherwise you will get another error indicating that the buffer should be less than 2048MB.
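For example, with an illustrative value under the 2048 limit:

conf.set('spark.kryoserializer.buffer.max.mb', '1024')  # must stay below 2048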
Answer 4:
I am using Spark 1.5.2 and I had the same issue. Setting spark.kryoserializer.buffer.max.mb to 256 fixed it.
Answer 5:
Now spark.kryoserializer.buffer.max.mb is deprecated:
WARN spark.SparkConf: The configuration key 'spark.kryoserializer.buffer.max.mb' has been deprecated as of Spark 1.4 and may be removed in the future. Please use the new key 'spark.kryoserializer.buffer.max' instead.
You should rather use:
import org.apache.spark.SparkConf
val conf = new SparkConf()
conf.set("spark.kryoserializer.buffer.max", "val")
Source: https://stackoverflow.com/questions/31947335/how-kryo-serializer-allocates-buffer-in-spark