Binary version of iostream

Submitted by 时间秒杀一切 on 2019-12-04 07:55:36

Question


I've been writing a binary version of iostreams. It essentially allows you to write binary files, but gives you a great deal of control over the format of the file. Example usage:

my_file << binary::u32le << my_int << binary::u16le << my_string;

This would write my_int as an unsigned 32-bit integer, and my_string as a length-prefixed string (where the prefix is u16le). To read the file back, you flip the arrows. Works great. However, I hit a bump in the design, and I'm still on the fence about it, so it's time to ask SO. (We make a couple of assumptions at the moment, such as 8-bit bytes, two's-complement ints, and IEEE floats.)

iostreams, under the hood, use streambufs. It's a fantastic design, really: the iostreams code handles serialization of an int into text, and lets the underlying streambuf handle the rest. Thus you get cout, fstreams, stringstreams, etc. All of these, both the iostreams and the streambufs, are templated, usually on char, but sometimes on wchar_t. My data, however, is a byte stream, which is best represented by unsigned char.

My first attempts were to template the classes based on unsigned char. std::basic_string templates well enough, but streambuf does not. I ran into several problems with a class named codecvt, which I could never get to follow the unsigned char theme. This raises two questions:

1) Why is a streambuf responsible for such things? Code conversion seems to lie well outside a streambuf's responsibilities: a streambuf should take a stream and buffer data to/from it, nothing more. Something as high-level as code conversion feels like it belongs in iostreams.

Since I couldn't get the templated streambufs to work with unsigned char, I went back to char and simply cast data between char and unsigned char. I tried to minimize the number of casts, for obvious reasons. Most of the data ultimately winds up in a read() or write() function, which invokes the underlying streambuf (and uses a cast in the process). The read function is basically:

size_t read(unsigned char *buffer, size_t size)
{
    size_t ret = stream()->sgetn(reinterpret_cast<char *>(buffer), size);
    // deal with ret for return size, eof, errors, etc.
    ...
}

Good solution, bad solution?


The first two questions indicated that more info is needed. First, projects such as boost::serialization were looked at, but they exist at a higher level, in that they define their own binary format. This library is for reading and writing at a lower level, where you want to define the format yourself, the format is already defined, or the bulk metadata is not required or desired.

Second, some have asked about the binary::u32le modifier. It is an instance of a class that holds the desired endianness and width, and perhaps signed-ness in the future. The stream holds a copy of the last-passed instance of that class and uses it during serialization. This was a bit of a workaround; I originally tried overloading the << operator as follows:

bostream &operator << (uint8_t n);
bostream &operator << (uint16_t n);
bostream &operator << (uint32_t n);
bostream &operator << (uint64_t n);

However, at the time this didn't seem to work: I had several problems with ambiguous function calls. This was especially true with constants, although you could, as one poster suggested, cast them or declare them as const <type>. I seem to remember there was some other, larger problem, however.


Answer 1:


I agree with legalize. I needed to do almost exactly what you're doing, and looked at overloading << / >>, but came to the conclusion that iostream was just not designed to accommodate it. For one thing, I didn't want to have to subclass the stream classes to be able to define my overloads.

My solution (which only needed to serialize data temporarily on a single machine, and therefore did not need to address endianness) was based on this pattern:

// deducible template argument read
template <class T>
void read_raw(std::istream& stream, T& value,
    typename boost::enable_if< boost::is_pod<T> >::type* dummy = 0)
{
    stream.read(reinterpret_cast<char*>(&value), sizeof(value));
}

// explicit template argument read
template <class T>
T read_raw(std::istream& stream)
{
    T value;
    read_raw(stream, value);
    return value;
}

template <class T>
void write_raw(std::ostream& stream, const T& value,
    typename boost::enable_if< boost::is_pod<T> >::type* dummy = 0)
{
    stream.write(reinterpret_cast<const char*>(&value), sizeof(value));
}

I then further overloaded read_raw/write_raw for any non-POD types (e.g. strings). Note that only the first version of read_raw need be overloaded; if you use ADL correctly, the second (1-arg) version can call 2-arg overloads defined later and in other namespaces.

Write example:

int32_t x;
int64_t y;
int8_t z;
write_raw(os, x);
write_raw(os, y);
write_raw<int16_t>(os, z); // explicitly write int8_t as int16_t

Read example:

int32_t x = read_raw<int32_t>(is); // explicit form
int64_t y;
read_raw(is, y); // implicit form
int8_t z = numeric_cast<int8_t>(read_raw<int16_t>(is));

It's not as sexy as overloaded operators, and things don't fit on one line as easily (which I tend to avoid anyway, since debug breakpoints are line-oriented), but I think it turned out simpler, more obvious, and not much more verbose.




Answer 2:


As I understand it, the stream properties that you're using to specify types would be more appropriate for specifying endian-ness, packing, or other "meta-data" values. The handling of types themselves should be done by the compiler. At least, that's the way the STL seems to be designed.

If you use overloads to separate the types automatically, you would need to specify the type only when it was different from the declared type of the variable:

Stream& operator<<(int8_t);
Stream& operator<<(uint8_t);
Stream& operator<<(int16_t);
Stream& operator<<(uint16_t);
etc.

uint32_t x;
stream << x << (uint16_t)x;

Reading types other than the declared type would be a little messier. In general, though, I think reading into or writing from a variable whose type differs from the serialized type should be avoided.

I believe the default version of std::codecvt does nothing, returning "noconv" for everything. It only really does anything when using the "wide" character streams. Can't you set up a similar definition for codecvt? If, for some reason, it's impractical to define a no-op codecvt for your stream, then I don't see any problem with your casting solution, especially since it's isolated to one location.

Finally, are you sure you wouldn't be better off using some standard serialization code, like Boost, rather than rolling your own?




Answer 3:


We needed to do something similar to what you are doing, but we followed another path. I am interested in how you have defined your interface. Part of what I don't see is how you handle the manipulators you have defined (binary::u32le, binary::u16le).

With basic_streams, a manipulator controls how all following elements will be read or written, but in your case that probably does not make sense, since the size (part of your manipulator's information) interacts with the type of the variable passed in or out.

binary_istream in;
int i;
int i2;
short s;
in >> binary::u16le >> i >> binary::u32le >> i2 >> s;

In the code above, it can make sense to decide that, even though i is 32 bits (assuming int is 32 bits), you want to extract only 16 bits from the serialized stream, while extracting the full 32 bits into i2. After that, either the user is forced to introduce a manipulator for each and every value that follows, or the manipulator stays in effect, so that when the short is passed in, 32 bits are read with a possible overflow. Either way, the user will probably get unexpected results.

Size does not seem, in my opinion, to belong in manipulators.

Just as a side note: in our case we had additional constraints, such as runtime definition of types, so we ended up building our own meta-type system to construct types at runtime (a kind of variant), and then implemented de/serialization for those types (Boost-style). As a result, our serializers don't work with basic C++ types, but with serialization/data pairs.




Answer 4:


I wouldn't use operator<<, as it's too intimately associated with formatted text I/O.

I wouldn't use an operator overload at all for this, actually. I'd find another idiom.



Source: https://stackoverflow.com/questions/1150843/binary-version-of-iostream
