Serialization/Deserialization of a Vector of Integers in C++

Submitted by 吃可爱长大的小学妹 on 2020-03-16 08:24:13

Question


Task to be Accomplished

I'm trying to serialize a vector of integers into a string so that it can be stored into a file. The approach used is to copy the integers byte-by-byte into a buffer. For this I used the std::copy_n function.

To deserialize, I've done the same thing in reverse i.e. copied byte-by-byte into an integer from the buffer and appended those integers to a vector.

I'm not sure if this is the best/fastest way to achieve this.

Code

Serialize function

char *serialize(vector <int> nums)
{
    char *buffer = (char *)malloc(sizeof(int)*nums.size());
    vector <int>::iterator i;
    int j;
    for(i = nums.begin(), j = 0; i != nums.end(); i++, j += 4) {
        copy_n(i, 4, buffer+j);
    }
    return buffer;
}

Deserialize function

vector <int> deserialize(char *str, int len)
{
    int num;
    vector <int> ret;
    for(int j = 0; j < len; j+=4) {
        copy_n(str+j, 4, &num);
        ret.push_back(num);
    }
    return ret;
}

Any input on how I can improve this bit of code would be really helpful. I would also love to know other approaches to achieve the same.


Answer 1:


Your approach has a number of problems.

char *serialize(vector <int> nums)
{
    char *buffer = (char *)malloc(sizeof(int)*nums.size());
    vector <int>::iterator i;
    int j;
    for(i = nums.begin(), j = 0; i != nums.end(); i++, j += 4) {
        copy_n(i, 4, buffer+j);
    }
    return buffer;
}

1) It allocates memory manually, which is dangerous and rarely necessary.

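2) It doesn't do what you think it does. copy_n(i, 4, buffer+j) does not copy the four bytes of one int. It copies four consecutive int elements starting at i and tries to stuff each one into a single char. So the data gets corrupted if any value does not fit in a char (above 255 for an unsigned char, above 127 for a signed one), and near the end of the vector the call reads past the last element.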
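For comparison, a byte-wise copy that keeps every value intact could look roughly like the sketch below. The names serialize_bytes and deserialize_bytes are made up for this illustration, and the result is still tied to the int size and byte order of the machine that produced it.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: copy the raw bytes of every int into a std::string,
// which owns its memory, so no manual malloc/free is needed.
std::string serialize_bytes(const std::vector<int>& nums)
{
    std::string buffer(nums.size() * sizeof(int), '\0');
    if (!nums.empty())
        std::memcpy(&buffer[0], nums.data(), buffer.size());
    return buffer;
}

// Hypothetical helper: rebuild the ints from a buffer made by serialize_bytes.
std::vector<int> deserialize_bytes(const std::string& buffer)
{
    // The buffer must hold a whole number of ints.
    assert(buffer.size() % sizeof(int) == 0);
    std::vector<int> nums(buffer.size() / sizeof(int));
    if (!nums.empty())
        std::memcpy(nums.data(), buffer.data(), buffer.size());
    return nums;
}
```

Because std::string stores its length, the caller no longer has to pass a separate len argument around.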

If you are looking for efficiency then I would think the best way would be to write the data directly to the output stream rather than converting it to a string first.

Bear in mind, writing out binary data like this is not portable. I would only use this for serializing/deserializing local data, ideally within a single session. Beyond that you have to start thinking about making the output format itself portable (fixed integer widths, a defined byte order) and it gets more complicated. Personally I would avoid the binary approach altogether unless absolutely necessary.

If you must do it, I would probably do something more like this:

#include <istream>
#include <ostream>
#include <type_traits>
#include <vector>

template<typename POD>
std::ostream& serialize(std::ostream& os, std::vector<POD> const& v)
{
    // this only works on trivial, standard-layout data types (PODs)
    static_assert(std::is_trivial<POD>::value && std::is_standard_layout<POD>::value,
        "Can only serialize POD types with this function");

    // write the element count first so deserialize knows how much to read
    auto size = v.size();
    os.write(reinterpret_cast<char const*>(&size), sizeof(size));
    os.write(reinterpret_cast<char const*>(v.data()), v.size() * sizeof(POD));
    return os;
}

template<typename POD>
std::istream& deserialize(std::istream& is, std::vector<POD>& v)
{
    static_assert(std::is_trivial<POD>::value && std::is_standard_layout<POD>::value,
        "Can only deserialize POD types with this function");

    decltype(v.size()) size;
    // stop if the count could not be read: size would be indeterminate
    if(!is.read(reinterpret_cast<char*>(&size), sizeof(size)))
        return is;
    v.resize(size);
    is.read(reinterpret_cast<char*>(v.data()), v.size() * sizeof(POD));
    return is;
}

The interface to these functions follows the convention set in the Standard Library, and it is flexible enough that you can use it to serialize to files (using std::fstream) or strings (using std::stringstream).

std::vector<int> v = {1, 2, 3, 500, 900};

std::stringstream oss; // this could just as well be a `std::fstream` 

if(serialize(oss, v))
{
    std::vector<int> n;
    if(deserialize(oss, n))
    {
        for(auto i: n)
            std::cout << i << '\n';
    }
}

Output:

1
2
3
500
900



Answer 2:


but I'm not sure if this is the best/fastest way to achieve this.

Deep breath...

The simplest questions have the most complex answers.

Arguably the simplest way to achieve this is to simply stream the integers as decimal digits. This is "best" if human readability of the file is important to you.
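A minimal sketch of that decimal-text approach, assuming one number per line is an acceptable file format. The helper names to_text and from_text are invented for this example.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: write the ints as decimal text, one per line.
std::string to_text(const std::vector<int>& v)
{
    std::ostringstream os;
    for (int n : v)
        os << n << '\n';
    return os.str();
}

// Hypothetical helper: read whitespace-separated decimal ints
// until extraction fails (end of input or a non-numeric token).
std::vector<int> from_text(const std::string& text)
{
    std::istringstream is(text);
    std::vector<int> v;
    for (int n; is >> n; )
        v.push_back(n);
    return v;
}
```

The same two loops work unchanged on a std::ofstream or std::ifstream, and the resulting file can be inspected or edited in any text editor.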

The method you have employed is the simplest from the point of view of the programmer, but it makes no attempt to cater for the different bit-representations of integers on different systems. Therefore it's simple until you want to read that file back in on a different machine, at which point it becomes a headache.
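One common way around that headache is to fix both the width and the byte order in the file format itself. The sketch below assumes the values fit in 32 bits and stores each one as four bytes, least significant byte first. The names encode_le32 and decode_le32 are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: store each value as exactly 4 bytes, low byte first,
// regardless of the byte order of the machine running the code.
std::string encode_le32(const std::vector<std::int32_t>& v)
{
    std::string out;
    out.reserve(v.size() * 4);
    for (std::int32_t n : v) {
        // Signed-to-unsigned conversion is well defined (modulo 2^32).
        std::uint32_t u = static_cast<std::uint32_t>(n);
        for (int shift = 0; shift < 32; shift += 8)
            out.push_back(static_cast<char>((u >> shift) & 0xFFu));
    }
    return out;
}

// Hypothetical helper: inverse of encode_le32.
std::vector<std::int32_t> decode_le32(const std::string& in)
{
    assert(in.size() % 4 == 0); // the buffer must hold whole 4-byte values
    std::vector<std::int32_t> v;
    v.reserve(in.size() / 4);
    for (std::size_t i = 0; i < in.size(); i += 4) {
        std::uint32_t u = 0;
        for (int b = 0; b < 4; ++b)
            u |= static_cast<std::uint32_t>(static_cast<unsigned char>(in[i + b])) << (8 * b);
        // Assumes two's complement, which C++20 guarantees and earlier
        // mainstream compilers use in practice.
        v.push_back(static_cast<std::int32_t>(u));
    }
    return v;
}
```

Because the bytes are produced with shifts rather than by copying memory, a file written this way reads back the same on little-endian and big-endian machines.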

Also note that neither of the above approaches is space-efficient. When it comes to I/O, shorter is usually faster, and any time spent manipulating data before writing and after reading tends to be dwarfed by the time taken to transmit or store it.

If real I/O performance and portability are important to you (they probably should be) then you may want to consider an appropriate encoding scheme.

Zig-Zag encoding (usually paired with a variable-length integer encoding, as in Protocol Buffers) is one scheme that is both portable and efficient. It works on the basis that most integers we encounter in life tend to be closer to zero than they are to INT_MAX.
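To make the idea concrete, here is a rough sketch of zig-zag mapping followed by a base-128 variable-length encoding, in the style Protocol Buffers describes. It assumes 32-bit values, and all function names are invented for this example rather than taken from any library.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Zig-zag: map signed to unsigned so that small magnitudes get small codes
// (0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...).
std::uint32_t zigzag_encode(std::int32_t n)
{
    return (static_cast<std::uint32_t>(n) << 1) ^ (n < 0 ? 0xFFFFFFFFu : 0u);
}

std::int32_t zigzag_decode(std::uint32_t u)
{
    return static_cast<std::int32_t>((u >> 1) ^ (std::uint32_t{0} - (u & 1u)));
}

// Varint: 7 payload bits per byte, low bits first; the top bit of each byte
// says whether another byte follows.
std::string encode_varints(const std::vector<std::int32_t>& v)
{
    std::string out;
    for (std::int32_t n : v) {
        std::uint32_t u = zigzag_encode(n);
        while (u >= 0x80u) {
            out.push_back(static_cast<char>((u & 0x7Fu) | 0x80u));
            u >>= 7;
        }
        out.push_back(static_cast<char>(u));
    }
    return out;
}

std::vector<std::int32_t> decode_varints(const std::string& in)
{
    std::vector<std::int32_t> v;
    std::uint32_t u = 0;
    int shift = 0;
    for (char c : in) {
        unsigned char byte = static_cast<unsigned char>(c);
        assert(shift < 35); // a 32-bit value needs at most 5 bytes
        u |= static_cast<std::uint32_t>(byte & 0x7Fu) << shift;
        if (byte & 0x80u) {
            shift += 7;     // continuation bit set: more bytes follow
        } else {
            v.push_back(zigzag_decode(u));
            u = 0;
            shift = 0;
        }
    }
    assert(shift == 0); // input must not end in the middle of a value
    return v;
}
```

With this scheme any value from -64 to 63 takes a single byte and only values near the limits of the 32-bit range take five, and since the output is defined byte by byte it does not depend on the host's byte order.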

Some links to get you started:

https://gist.github.com/mfuerstenau/ba870a29e16536fdbaba

https://developers.google.com/protocol-buffers/docs/encoding



Source: https://stackoverflow.com/questions/51230764/serialization-deserialization-of-a-vector-of-integers-in-c
