What are the best ways to use decimals and datetimes with Protocol Buffers?

后悔当初 2021-02-08 14:56

I would like to find out the best way to store some common data types that are not included in the list of types supported by Protocol Buffers.

  • datetime (secon
4 answers
  •  梦毁少年i
    2021-02-08 15:19

    Here are some ideas based on my experience with a wire protocol similar to Protocol Buffers.

    datetime (seconds precision)

    datetime (milliseconds precision)

    I think the answer to these two would be the same; you would just typically be dealing with a smaller range of numbers in the case of seconds precision.

    Use a sint64/sfixed64 to store the offset in seconds/milliseconds from some well-known epoch, such as midnight UTC on January 1, 1970 (the Unix epoch). This is how Date objects are represented internally in Java (as milliseconds since that epoch). I'm sure there are analogs in Python and C++.

    If you need time zone information, pass your date/times around in UTC and model the pertinent time zone as a separate string field. For that, you can use the identifiers from the Olson zoneinfo database (now maintained as the IANA Time Zone Database), e.g. America/New_York, since that has become somewhat standard.

    This way you have a canonical representation for date/time, but you can also localize to whatever time zone is pertinent.
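    As a rough illustration of the scheme above, here is a minimal Python sketch. The function names (to_epoch_millis, from_epoch_millis, localize) are made up for this example and are not part of any Protocol Buffers API; the idea is that the integer goes into a sint64/sfixed64 field and the zone identifier into a separate string field. It assumes Python 3.9+ for zoneinfo and that time zone data is available on the system.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; needs tz data on the system

# The well-known epoch: midnight UTC, January 1, 1970.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(dt: datetime) -> int:
    """Encode a timezone-aware datetime as integer milliseconds since EPOCH."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    # Dividing one timedelta by another with // yields an exact int, so no
    # float rounding is involved; sub-millisecond precision is floored away.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_epoch_millis(millis: int) -> datetime:
    """Decode integer milliseconds since EPOCH into a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


def localize(millis: int, zone_id: str) -> datetime:
    """Show the canonical UTC instant in the zone named by the string field."""
    return from_epoch_millis(millis).astimezone(ZoneInfo(zone_id))
```

    For seconds precision the same approach should work with timedelta(seconds=1) in place of timedelta(milliseconds=1).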

    decimals with fixed precision

    My first thought is to use a string, similar to how one constructs Decimal objects with Python's decimal module. I suppose that could be inefficient relative to a numerical representation.

    There may be better solutions depending on what domain you're working with. For example, if you're modeling a monetary value, maybe you can get away with using a uint32/64 to communicate the value in cents as opposed to fractional dollar amounts.

    There are also some useful suggestions in this thread.
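    The two options mentioned above (a string field, or an integer count of the smallest unit such as cents) might look roughly like this in Python. This is only a sketch: to_minor_units and from_minor_units are hypothetical helper names, and the scale of 2 decimal places is an assumption that both sides would have to agree on out of band.

```python
from decimal import Decimal


def to_minor_units(amount: Decimal, scale: int = 2) -> int:
    """Encode a fixed-precision decimal as an integer, e.g. dollars -> cents."""
    scaled = amount.scaleb(scale)  # shift the decimal point right by `scale`
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{amount} has more than {scale} decimal places")
    return int(scaled)


def from_minor_units(units: int, scale: int = 2) -> Decimal:
    """Decode the integer back into a decimal with `scale` fractional digits."""
    return Decimal(units).scaleb(-scale)


# Integer option: the value travels as a plain integer field.
cents = to_minor_units(Decimal("19.99"))
restored = from_minor_units(cents)

# String option: the value travels as text and is parsed on arrival.
wire_text = str(Decimal("0.10"))
parsed = Decimal(wire_text)
```

    If amounts can be negative, a signed field type (sint32/sint64) would presumably be a better fit than uint32/uint64.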

    decimals with variable precision

    Doesn't Protocol Buffers already support this with float/double scalar types? Maybe I've misunderstood this bullet point.

    Anyway, if you needed to work around those scalar types, you could store the IEEE-754 bit pattern in a uint32 or uint64 (for float and double, respectively). For example, Java lets you convert between Float/Double values and their IEEE-754 bit representations (Float.floatToIntBits, Double.doubleToLongBits, and their inverses). There are analogous mechanisms in C++/Python.
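    In Python, one analogous mechanism is the standard struct module. The sketch below reinterprets a float's bytes as an unsigned integer and back; the helper names are invented for this example.

```python
import struct


def double_to_bits(value: float) -> int:
    """Return the IEEE-754 binary64 bit pattern of `value` as an unsigned int."""
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def bits_to_double(bits: int) -> float:
    """Rebuild a double from its 64-bit IEEE-754 bit pattern."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def float_to_bits(value: float) -> int:
    """Return the IEEE-754 binary32 bit pattern (value is narrowed to 32 bits)."""
    return struct.unpack("<I", struct.pack("<f", value))[0]


def bits_to_float(bits: int) -> float:
    """Rebuild a single-precision value from its 32-bit IEEE-754 bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

    Note that Python floats are doubles, so float_to_bits loses precision for values that are not exactly representable in 32 bits.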

    lots of bool values (if you have lots of them, it looks like you'll have 1-2 bytes of overhead for each of them due to their tags)

    If you are concerned about wasted bytes on the wire, you could use bit-masking techniques to compress many booleans into a single uint32 or uint64.
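    A minimal Python sketch of that bit-masking idea follows. pack_flags and unpack_flags are hypothetical names, and the convention that flag i lives in bit i is an assumption that sender and receiver would need to share.

```python
from typing import List, Sequence


def pack_flags(flags: Sequence[bool]) -> int:
    """Pack up to 64 booleans into one integer; flag i is stored in bit i."""
    if len(flags) > 64:
        raise ValueError("at most 64 flags fit in a uint64")
    mask = 0
    for i, flag in enumerate(flags):
        if flag:
            mask |= 1 << i
    return mask


def unpack_flags(mask: int, count: int) -> List[bool]:
    """Recover the first `count` booleans from the mask."""
    return [bool((mask >> i) & 1) for i in range(count)]
```

    The receiver has to know how many flags were packed and what each bit position means, since none of that is carried in the integer itself.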

    Because there isn't first-class support in Protocol Buffers, all of these techniques require a bit of a gentlemen's agreement between agents. Perhaps using a naming convention on your fields, like "_dttm" or "_mask", would help communicate when a given field has additional encoding semantics above and beyond the default behavior of Protocol Buffers.
