Why does specifying Map's initial capacity cause subsequent serializations to give different results?

Question

I am trying to compare two byte[] arrays that both result from serializing the same object:

The first byte[] is created by serializing the object.
The second is created by deserializing the first byte[] and then serializing the result again.

I do not understand how these two arrays can differ. Deserializing the first byte[] should reconstruct the original object, and serializing that reconstructed object should be the same as serializing the original one, so the two arrays should be identical. Apparently, under certain circumstances, they are not.

The object I am serializing (State) holds a list of another object (MapWrapper) which in turn holds a single collection. Depending on the collection, I get different results from my comparison code.

Here is the MCVE:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Test {

    public static void main(String[] args) {

        State state = new State();
        state.maps.add(new MapWrapper());

        byte[] pBA = stateToByteArray(state);
        State pC = byteArrayToState(pBA);
        byte[] zero = stateToByteArray(pC);
        System.out.println(Arrays.equals(pBA, zero)); // see output below
        State pC2 = byteArrayToState(pBA);
        byte[] zero2 = stateToByteArray(pC2);
        System.out.println(Arrays.equals(zero2, zero)); // always true
    }

    public static byte[] stateToByteArray(State s) {

        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            ObjectOutputStream oos = new ObjectOutputStream(bos);
            oos.writeObject(s);
            return bos.toByteArray();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    public static State byteArrayToState(byte[] bytes) {

        ObjectInputStream ois;
        try {
            ois = new ObjectInputStream(new ByteArrayInputStream(bytes));
            return (State) ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            e.printStackTrace();
        }
        return null;
    }
}

class State implements Serializable {

    private static final long serialVersionUID = 1L;

    List<MapWrapper> maps = new ArrayList<>();
}

class MapWrapper implements Serializable {

    private static final long serialVersionUID = 1L;

    // Uncomment exactly one field; the trailing comment is the first line printed by main.
//  List<Integer> ints = new ArrayList<>();       // true
//  List<Integer> ints = new ArrayList<>(3);      // true
//  Map<String, Integer> map = new HashMap<>();   // true
//  Map<String, Integer> map = new HashMap<>(2);  // false
}

For some reason, if MapWrapper contains a HashMap (or LinkedHashMap) that was created with an explicit initial capacity, serializing the object gives a different result from serializing, deserializing, and serializing again.

I added a second deserialize-then-serialize round and compared its output with that of the first round. Those two are always equal, so the difference appears only in the first round trip.

Note that I must create a MapWrapper and add it to the list in State, as done in the start of main, to cause this.

As far as I know, the initial capacity is purely a performance parameter. Using the default or a specified value should not change behavior or functionality.
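To illustrate that expectation, here is a minimal sketch (the class name CapacityEquality and the helper filled are made up for this example) showing that two HashMaps holding the same entries are equal regardless of the initial capacity they were constructed with:

```java
import java.util.HashMap;
import java.util.Map;

class CapacityEquality {

    // Hypothetical helper: puts the same three entries into whatever map it is given.
    static Map<String, Integer> filled(Map<String, Integer> m) {
        m.put("a", 1);
        m.put("b", 2);
        m.put("c", 3);
        return m;
    }

    public static void main(String[] args) {
        Map<String, Integer> defaultCapacity = filled(new HashMap<>());
        Map<String, Integer> smallCapacity = filled(new HashMap<>(2));

        // equals() and hashCode() depend only on the entries, not on the bucket count.
        System.out.println(defaultCapacity.equals(smallCapacity));                  // true
        System.out.println(defaultCapacity.hashCode() == smallCapacity.hashCode()); // true
    }
}
```

So at the level of the Map contract the capacity really is invisible; whatever differs must be in how the map is written to the stream.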

I am using JDK 1.8.0_25 on Windows 7.

Why does this happen?

Answer 1:

The following line, together with its comment, in the source code of HashMap's readObject method explains the difference:

s.readInt();                // Read and ignore number of buckets

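Indeed, looking at the hex dump of the two arrays, the difference is between the number 2 (your configured number of buckets) and the number 16 (the default number of buckets). Because readObject discards the bucket count it reads, an empty map comes back with the default capacity, and that default is what gets written the next time it is serialized. I haven't verified that this is what this particular byte means, but it would be quite a coincidence if it were something else, considering that it is the only difference.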

<snip> 08 00 00 00 02 00 00 00 00 78 78   // Original
<snip> 08 00 00 00 10 00 00 00 00 78 78   // Deserialized+serialized.
                   ^
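The comparison can be reproduced with a minimal sketch along the following lines. The class name BucketCountDiff and its helpers serialize, deserialize, and diff are made up for this example, and it serializes an empty HashMap directly rather than going through State and MapWrapper. It assumes a JDK whose HashMap behaves as described above; the exact offsets, and whether other bytes (such as the serialized threshold field) also differ, may vary by JDK version, so no output is shown.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

class BucketCountDiff {

    // Serializes any Serializable object into a byte array.
    static byte[] serialize(Object o) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        return bos.toByteArray();
    }

    // Reads a single object back from a byte array.
    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    // Returns one "offset N: xx -> yy" line per position where the arrays differ.
    static List<String> diff(byte[] a, byte[] b) {
        List<String> out = new ArrayList<>();
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            if (a[i] != b[i]) {
                out.add(String.format("offset %d: %02x -> %02x", i, a[i], b[i]));
            }
        }
        if (a.length != b.length) {
            out.add("lengths differ: " + a.length + " vs " + b.length);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Empty map with an explicit initial capacity of 2, as in the question.
        byte[] first = serialize(new HashMap<String, Integer>(2));
        // Round trip: deserialize, then serialize again.
        byte[] second = serialize(deserialize(first));
        diff(first, second).forEach(System.out::println);
    }
}
```

Among the printed lines there should be one reading `02 -> 10`, which is the bucket count changing from 2 to 16 (hex 10).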

Source: https://stackoverflow.com/questions/38635375/why-does-specifying-maps-initial-capacity-cause-subsequent-serializations-to-gi

Tags

java

serialization

hashmap

linkedhashmap