It seems to me that a org.apache.hadoop.io.serializer.Serialization
could be written to serialize the java types directly in the same format that the wrapper cl
There is nothing stopping you changing the serialization to use a different mechanism such as java Serializable interface or something like thrift, protocol buffers etc.
In fact, Hadoop comes with an (experimental) Serialization implementation for Java Serializable objects - just configure the serialization factory to use it. The default serialization mechanism is WritableSerialization
, but this can be changed by setting the following configuration property:
io.serializations=org.apache.hadoop.io.serializer.JavaSerialization
Bear in mind however that anything that expects a Writable (Input/Output formats, partitioners, comparators) etc will need to be replaced by versions that can be passed a Serializable
instance rather than a Writable
instance.
Some more links for the curious reader: