Question
Which serialization is used in which case?
The Spark documentation says:
It provides two serialization libraries:
1. Java (default)
2. Kryo
So where do Encoders come from, and why are they not mentioned in the docs?
Databricks also says that Encoders perform faster for Datasets. What about RDDs, and how do all of these map together?
In which case should we use which serializer?
Answer 1:
Encoders are used in Dataset only. Kryo is used internally in Spark. Kryo and Java serialization are the two options available for you to choose from when your data is shuffled.
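A minimal sketch of the RDD side, assuming Scala with Spark on the classpath and a local master. It switches spark.serializer from the default Java serializer to Kryo and registers a class with SparkConf.registerKryoClasses, then runs a groupByKey so the records actually pass through the serializer during a shuffle. The Click case class and the KryoRddSketch object are hypothetical names used only for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical record type, used only for illustration.
case class Click(userId: Long, url: String)

object KryoRddSketch {
  // Builds a local SparkContext that uses Kryo instead of the default Java serializer.
  def kryoContext(): SparkContext = {
    val conf = new SparkConf()
      .setAppName("kryo-rdd-sketch")
      .setMaster("local[*]")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    // Registered classes are written as a small numeric ID instead of the full class name.
    conf.registerKryoClasses(Array(classOf[Click]))
    new SparkContext(conf)
  }

  // Counts clicks per user; groupByKey forces a shuffle, so each Click is serialized with Kryo.
  def clicksPerUser(sc: SparkContext, clicks: Seq[Click]): Map[Long, Int] =
    sc.parallelize(clicks)
      .map(c => (c.userId, c))
      .groupByKey()
      .mapValues(_.size)
      .collect()
      .toMap

  def main(args: Array[String]): Unit = {
    val sc = kryoContext()
    try println(clicksPerUser(sc, Seq(Click(1L, "/a"), Click(2L, "/b"), Click(1L, "/c"))))
    finally sc.stop()
  }
}
```

Leaving spark.serializer unset keeps the Java serializer, which works for any java.io.Serializable class but is usually slower and produces larger output.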
As to which you should use: Kryo is your best option if you don't use Dataset. Otherwise, you don't actually have any options.
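For the Dataset side, a hedged sketch (again Scala, with Spark SQL on the classpath) of where Encoders come in: within a Dataset, serialization always goes through an Encoder rather than spark.serializer. Encoders.product derives a column-per-field encoder for a case class, while Encoders.kryo is a fallback for types without a built-in encoder and stores each object as a single binary column. Click, LegacyEvent and EncoderSketch are hypothetical names used only for illustration.

```scala
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

// Hypothetical case class: Spark can derive a field-by-field encoder for it.
case class Click(userId: Long, url: String)

// Hypothetical non-case class: there is no built-in encoder for it.
class LegacyEvent(val id: Int) extends Serializable

object EncoderSketch {
  def session(): SparkSession =
    SparkSession.builder().appName("encoder-sketch").master("local[*]").getOrCreate()

  // Encoders.product maps each field of the case class to its own typed column.
  def clicks(spark: SparkSession): Dataset[Click] =
    spark.createDataset(Seq(Click(1L, "/a"), Click(2L, "/b")))(Encoders.product[Click])

  // Encoders.kryo serializes each whole object with Kryo into one binary column.
  def legacyEvents(spark: SparkSession): Dataset[LegacyEvent] =
    spark.createDataset(Seq(new LegacyEvent(1), new LegacyEvent(2)))(Encoders.kryo[LegacyEvent])

  def main(args: Array[String]): Unit = {
    val spark = session()
    try {
      clicks(spark).printSchema()       // two columns: userId (long) and url (string)
      legacyEvents(spark).printSchema() // one column: value (binary)
    } finally spark.stop()
  }
}
```

With the Kryo-based encoder Spark only sees an opaque binary blob, so it cannot optimize on individual fields; this is one reason the derived encoders are generally preferred when they are available.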
Source: https://stackoverflow.com/questions/59298413/kryo-vs-encoder-vs-java-serialization-in-spark