kryo

How to register byte[][] using kryo serialization for spark

Submitted by  ̄綄美尐妖づ on 2019-12-19 07:52:31
Question: I am trying to fully utilize Kryo serialization for Spark. Setting .set("spark.kryo.registrationRequired", "true") lets me know which classes need to be registered. I have registered about 40 classes, some of my own and some of Spark's. I followed the "Require kryo serialization in Spark (Scala)" post to register/set everything up. I am now running into the following and cannot figure out how to register it in Scala. Has anyone solved this issue? I have tried a bunch of
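A minimal registration sketch that typically clears this particular error, assuming a standard Scala SparkConf setup (the conf value name is illustrative). The byte[][] in the error message is Array[Array[Byte]] in Scala, so classOf can name it directly:

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.kryo.registrationRequired", "true")
  // byte[][] from the error message is Array[Array[Byte]] in Scala
  .registerKryoClasses(Array(classOf[Array[Array[Byte]]]))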

Scala pickling: Simple custom pickler for my own class?

Submitted by 荒凉一梦 on 2019-12-14 02:18:38
Question: I am trying to pickle some relatively-simple-structured but large-and-slow-to-create classes in a Scala NLP (natural language processing) app of mine. Because there's lots of data, it needs to pickle and especially unpickle quickly and without bloat. Java serialization evidently sucks in this regard. I know about Kryo but I've never used it. I've also run into Apache Avro, which seems similar, although I'm not quite sure why it's not normally mentioned as a suitable solution. Neither is Scala
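For comparison, a minimal Kryo round trip via Twitter's chill library, which layers Scala-aware registrations on top of Kryo; the Doc case class is a made-up stand-in for the poster's NLP classes:

import com.twitter.chill.{KryoPool, ScalaKryoInstantiator}

case class Doc(tokens: Vector[String], counts: Map[String, Int])

// A pool of 4 Kryo instances preconfigured for Scala collections and classes
val pool = KryoPool.withByteArrayOutputStream(4, new ScalaKryoInstantiator)

val bytes = pool.toBytesWithClass(Doc(Vector("the", "cat"), Map("the" -> 1)))
val doc   = pool.fromBytes(bytes).asInstanceOf[Doc]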

Deserializing an array that contains some non-deserializable objects using Kryo (salvaging the deserializable parts)

Submitted by 爱⌒轻易说出口 on 2019-12-13 20:12:35
Question: Background: I am attempting to write Kryo deserialization in such a way that if an array of objects contains some objects that (due to a code change) can't be deserialized, then those references in the array become null rather than throwing an exception, allowing the remainder of the object to be salvaged. I had previously been using Java's built-in serialization, and within that I was able to achieve this by writing a "known good" integer between each item in the array and then
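A sketch of that length-prefix idea ported to Kryo (the method and buffer names are mine, not from the question): each element is serialized to its own byte chunk, so the reader can skip any chunk that fails to decode and substitute null.

import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.io.{Input, Output}

// Writing: length-prefix each element so a reader can skip ones it cannot decode
def writeSalvageable(kryo: Kryo, output: Output, items: Array[AnyRef]): Unit = {
  output.writeInt(items.length)
  items.foreach { item =>
    val buf = new Output(4096, -1)   // scratch buffer that grows as needed
    kryo.writeClassAndObject(buf, item)
    val bytes = buf.toBytes
    output.writeInt(bytes.length)    // the length acts as the "known good" marker
    output.writeBytes(bytes)
  }
}

// Reading: a failed element becomes null instead of aborting the whole array
def readSalvageable(kryo: Kryo, input: Input): Array[AnyRef] = {
  Array.fill(input.readInt()) {
    val bytes = input.readBytes(input.readInt())
    try kryo.readClassAndObject(new Input(bytes))
    catch { case _: Exception => null }
  }
}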

How to register kryo serializer instances in Storm?

Submitted by 空扰寡人 on 2019-12-12 16:13:04
Question: I'm desperately trying to configure serializer instances to use in my Storm topology. The Storm documentation states there are 2 ways to register serializers: 1. The name of a class to register. In this case, Storm will use Kryo's FieldsSerializer to serialize the class. This may or may not be optimal for the class; see the Kryo docs for more details. 2. A map from the name of a class to register to an implementation of com.esotericsoftware.kryo.Serializer. I want to use 2. -> Map<String,
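One detail worth knowing here: Storm's config API registers serializer classes, not instances, because each worker constructs its own serializer. A hedged sketch against org.apache.storm.Config (backtype.storm.Config on pre-1.0 versions); Point and PointSerializer are made up for illustration:

import com.esotericsoftware.kryo.{Kryo, Serializer}
import com.esotericsoftware.kryo.io.{Input, Output}
import org.apache.storm.Config

class Point(val x: Int, val y: Int)

class PointSerializer extends Serializer[Point] {
  override def write(kryo: Kryo, output: Output, p: Point): Unit = {
    output.writeInt(p.x)
    output.writeInt(p.y)
  }
  override def read(kryo: Kryo, input: Input, tpe: Class[Point]): Point =
    new Point(input.readInt(), input.readInt())
}

val conf = new Config()
// Storm instantiates the serializer itself on each worker, which is why the
// config API takes a serializer class rather than a pre-built instance
conf.registerSerialization(classOf[Point], classOf[PointSerializer])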

How to set Unmodifiable collection serializer of Kryo in Spark code

Submitted by 回眸只為那壹抹淺笑 on 2019-12-12 13:31:54
Question: I am using Kryo serialization in Spark (v1.6.1) in Java, and while serializing a class which has a collection in its field, it throws the following error:

Caused by: java.lang.UnsupportedOperationException
at java.util.Collections$UnmodifiableCollection.add(Collections.java:1055)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:102)
at com.esotericsoftware.kryo.serializers.CollectionSerializer.read(CollectionSerializer.java:18)
at com
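The usual fix is the UnmodifiableCollectionsSerializer from the de.javakaffee:kryo-serializers artifact, registered through a custom Spark KryoRegistrator; a sketch (the class name MyRegistrator is illustrative):

import com.esotericsoftware.kryo.Kryo
import de.javakaffee.kryoserializers.UnmodifiableCollectionsSerializer
import org.apache.spark.serializer.KryoRegistrator

class MyRegistrator extends KryoRegistrator {
  override def registerClasses(kryo: Kryo): Unit = {
    // Installs serializers for the java.util.Collections$Unmodifiable* wrappers,
    // which Kryo's default CollectionSerializer cannot rebuild (it calls add())
    UnmodifiableCollectionsSerializer.registerSerializers(kryo)
  }
}

// wired in with: .set("spark.kryo.registrator", classOf[MyRegistrator].getName)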

Do you benefit from the Kryo serializer when you use Pyspark?

Submitted by [亡魂溺海] on 2019-12-12 08:27:50
Question: I read that the Kryo serializer can provide faster serialization when used in Apache Spark. However, I'm using Spark through Python. Do I still get notable benefits from switching to the Kryo serializer? Answer 1: Kryo won't make a major impact on PySpark because it just stores data as byte[] objects, which are fast to serialize even with Java. But it may be worth a try: you would just set the spark.serializer configuration and try not to register any classes. What might make more impact is
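Should you want to try it anyway, the switch is pure configuration and works the same from PySpark; a sketch of the relevant spark-defaults.conf entries (equivalently passed as --conf flags; the buffer size below is an arbitrary illustrative value):

spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer.max  512m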

Mapping values returns nothing in Scala Flink

Submitted by 杀马特。学长 韩版系。学妹 on 2019-12-11 14:10:09
Question: I am developing a discretization algorithm in Flink, but I am having problems applying a map function. The discretization is stored in V, which is a

private[this] val V = Vector.tabulate(nAttrs)(i => IntervalHeap(nBins, i, s))

This Vector is updated in the following method:

private[this] def updateSamples(v: LabeledVector): Vector[IntervalHeap] = {
  val attrs = v.vector.map(_._2)
  // TODO: Check for missing values
  attrs
    .zipWithIndex
    .foreach { case (attr, i) =>
      if (V(i).nInstances < s) {
        V(i)
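Without the full question it is hard to be definitive, but two common causes fit this symptom: Flink transformations are lazy (nothing runs until the job is triggered), and operator closures are serialized to the workers, so mutating driver-side state such as V from inside an operator never changes the driver's copy. A minimal illustration of the laziness point (the pipeline is hypothetical):

import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
val doubled = env.fromElements(1.0, 2.0, 3.0).map(_ * 2) // nothing executes yet
println(doubled.collect()) // collect() triggers execution and returns Seq(2.0, 4.0, 6.0)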

Kryo Serialization with nested HashMap with custom class

Submitted by 自古美人都是妖i on 2019-12-11 11:53:50
Question: I am trying to use Kryo to serialize a custom class which itself contains some custom objects, more specifically a HashMap of custom objects. I was wondering about the proper way to handle something like this. Below are the class I am trying to serialize (Data), the classes which are nested, and my current Kryo implementation. Is this the right approach?

public class Data {
  int id;
  int name;
  ItemList items;
}

public class ItemList {
  HashMap<String, Item> items;
}

public class Item {
  String itemId;
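With registration in place, Kryo's default FieldSerializer walks nested fields automatically, so it is usually enough to register every class reachable from the object graph; a sketch against the classes above (assuming they are on the classpath):

import com.esotericsoftware.kryo.Kryo

val kryo = new Kryo()
// Register the whole reachable graph, not just the root class;
// FieldSerializer then recurses through the nested fields on its own
kryo.register(classOf[Data])
kryo.register(classOf[ItemList])
kryo.register(classOf[Item])
kryo.register(classOf[java.util.HashMap[_, _]])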

An Introduction to Flink's Type and Serialization Mechanisms

Submitted by 只谈情不闲聊 on 2019-12-11 11:24:57
When writing processing logic with Flink, newcomers are easily confused by the profusion of concepts: Why does Flink have so many ways to declare types? What is the difference between BasicTypeInfo.STRING_TYPE_INFO, Types.STRING, and Types.STRING()? What is a TypeInfoFactory? How are TypeInformation.of and TypeHint used? This article unpacks Flink's type and serialization mechanisms step by step.

Flink's type classification

Figure 1: Flink type classification

The source of Flink's type system lives in the org.apache.flink.api.common.typeinfo package. Following Figure 1 deeper, we can look at the class inheritance diagram:

Figure 2: TypeInformation class inheritance diagram

As you can see, Figure 1 and Figure 2 correspond one to one. TypeInformation is the common base class that describes every type; it and all of its subclasses must be serializable (Serializable), because type information accompanies a Flink job at submission and is shipped to every execution node.

Since Flink manages its own memory using a very compact storage format (see the official blog post), type information is critical metadata throughout the entire data-processing pipeline.

TypeExtractor type extraction

Flink
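To make the declaration styles above concrete, a small sketch using the Java-API classes in org.apache.flink.api.common.typeinfo (the parenthesized Types.STRING() belongs to the Table API's own Types class):

import org.apache.flink.api.common.typeinfo.{BasicTypeInfo, TypeHint, TypeInformation, Types}

// Both constants describe the same basic String type:
val s1: TypeInformation[String] = BasicTypeInfo.STRING_TYPE_INFO
val s2: TypeInformation[String] = Types.STRING

// For generic types, erasure drops the parameters; an anonymous TypeHint
// subclass captures them so TypeInformation.of can rebuild the full type:
val listInfo: TypeInformation[java.util.List[String]] =
  TypeInformation.of(new TypeHint[java.util.List[String]]() {})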

Hazelcast Spring Session SubZero(Kryo) EntryBackupProcessorImpl NullPointerException issue

Submitted by 被刻印的时光 ゝ on 2019-12-11 10:01:40
Question: I am using hazelcast-3.11.2 and SubZero-0.9 as the global serializer. I am trying to configure Spring Session using this example. When I have more than one node in the cluster, I get the following exception when trying to get the session id:

2019-03-20 15:01:59.088 ERROR 13635 --- [ration.thread-3] c.h.m.i.operation.EntryBackupOperation : [x.x.x.x]:5701 [hazelcast-group] [3.11.2] null
java.lang.NullPointerException: null
at com.hazelcast.map.AbstractEntryProcessor$EntryBackupProcessorImpl.processBackup
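For reference, a minimal SubZero wiring sketch, assuming SubZero's documented entry point (the Spring Session specifics from the question are omitted):

import com.hazelcast.config.Config
import info.jerrinot.subzero.SubZero

val config = new Config()
// Installs SubZero's Kryo-based serializer as the Hazelcast global serializer,
// so individual classes need no per-class serialization config
SubZero.useAsGlobalSerializer(config)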