Go deserialization when type is not known

Question

I'm writing a package in Go to send messages between services, using a specific type of transport.

I'd like the package to not understand the type of messages being sent. My first thought is to serialize the message object into JSON, send that, deserialize on the receiving end, and pass the Go object (as an interface{}) to the subscribing code.

The serialization isn't a problem, but I don't see how the generic package code can deserialize the message since it doesn't know the type. I thought of using reflect.TypeOf() and passing that value as part of the message. But I don't see how to accomplish this, since reflect.Type is an interface and the implementing rtype is not exported.

If the receiving app gets an interface{}, it is going to have to check the type anyway, so maybe it should just do the deserialization. Or the receiver could provide a reflect.Type so the package can deserialize?

Or it could give the receiver a map of field name to value, but I'd prefer the actual type.

Any suggestions?

Let me add an example:

I have a Go channel for sending change notifications of different types of objects. Since Go doesn't support tagged unions, I define the channel type as:

type UpdateInfo struct {
    UpdateType UpdateType
    OldObject interface{}
    NewObject interface{}
}

The receiving end of the channel gets an UpdateInfo with OldObject and NewObject as the actual concrete object types that were sent.

I want to extend this to work between applications, where the transport will be via a message queue to support pub/sub, multiple consumers, etc.

Answer 1:

TL;DR

Just use json.Unmarshal. Wrap it lightly around your transport: fetch the JSON bytes, then call json.Unmarshal (or, with a json.Decoder instance d, call d.Decode) on those bytes, passing along the v interface{} argument supplied by your caller.

Somewhat longer, with an example

Consider how json.Unmarshal does its own magic. Its first argument is the JSON (data []byte), but its second argument is of type interface{}:

func Unmarshal(data []byte, v interface{}) error

As the documentation goes on to say, if v really is just an interface{}:

To unmarshal JSON into an interface value, Unmarshal stores one of these in the interface value:
bool, for JSON booleans
float64, for JSON numbers
string, for JSON strings
[]interface{}, for JSON arrays
map[string]interface{}, for JSON objects
nil for JSON null

but if v has an underlying concrete type, such as type myData struct { ... }, it's much fancier. It only does the above if v's underlying type is interface{}.

Its actual implementation is particularly complex because it's optimized to do the de-JSON-ification and the assignment into the target object at the same time. In principle, though, it's mostly a big type-switch on the underlying (concrete) type of the interface value.

Meanwhile, what you are describing in your question is that you will first deserialize into generic JSON—which really means a variable of type interface{}—and then do your own assignment out of this pre-decoded JSON into another variable of type interface{}, where the type signature of your own decoder would be:

func xxxDecode(/* maybe some args here, */ v interface{}) error {
    var predecoded interface{}

    // get some json bytes from somewhere into variable `data`
    if err := json.Unmarshal(data, &predecoded); err != nil {
        return err
    }

    // now emulate json.Unmarshal by getting field names and assigning
    // ... this is the hard part ...
    return nil
}

and you would then call this code by writing:

type myData struct {
    Field1 int    `xxx:"Field1"`
    Field2 string `xxx:"Field2"`
}

so that you know that JSON object key "Field1" should fill in your Field1 field with an integer, and JSON object key "Field2" should fill in your Field2 field with a string:

func whatever() {
    var x myData
    err := xxxDecode(..., &x)
    if err != nil { ... handle error ... }
    ... use x.Field1 and x.Field2 ...
}

But this is silly. You can just write:

type myData struct {
    Field1 int    `json:"Field1"`
    Field2 string `json:"Field2"`
}

(or even omit the tags, since the field names are the default JSON keys), and then do this:

func xxxDecode(..., v interface{}) error {
    ... get data bytes as before ...
    return json.Unmarshal(data, v)
}

In other words, just let json.Unmarshal do all the work by providing json tags in the data structures in question. The JSON bytes that cross your special transport are produced by json.Marshal on the sending side and consumed by json.Unmarshal on the receiving side. You do the transmitting and receiving; json.Marshal and json.Unmarshal do all the hard work, and you don't have to touch it!

It's still fun to see how json.Unmarshal works

Jump down to around line 660 of encoding/json/decode.go, where you will find the thing that handles a JSON "object" ({ followed by either } or a string that represents a key), for instance:

func (d *decodeState) object(v reflect.Value) error {

There are some mechanics to handle corner cases (including the fact that v might not be settable and/or might be a pointer that should be followed), then it makes sure that v is either a map[T1]T2 or struct, and if it is a map, that it's suitable—that both T1 and T2 will work when decoding the "key":value items in the object.

If all goes well, it gets into the JSON key-and-value scanning loop starting at line 720 (for {, which will break or return as appropriate). On each trip through this loop, the code reads the JSON key first, leaving the : and value part for later.

If we're decoding into a struct, the decoder now uses the struct's fields—names and json:"..." tags—to find a reflect.Value that we'll use to store right into the field.¹ This is subv, found by calling v.Field(i) for the right i, with some slightly complicated goo to handle embedded anonymous structs and pointer-following. The core of this is just subv = v.Field(i), though, where i is whichever field this key names, within the struct. So subv is now a reflect.Value that represents the actual struct instance's value, which we should set once we've decoded the value part of the JSON key-value pair.

If we're decoding into a map, we will decode the value into a temporary first, then store it into the map after decoding. It would be nice to share this with the struct-field storing, but we need a different reflect function to do the store into the map: v.SetMapIndex, where v is the reflect.Value of the map. That's why for a map, subv points to a temporary Elem.
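The struct-versus-map distinction can be sketched with the reflect package directly. This is not the code from encoding/json, only a simplified illustration; storeInto and point are hypothetical names, and the real decoder looks fields up by index and json tag rather than with FieldByName.

```go
package main

import (
	"fmt"
	"reflect"
)

// point is a hypothetical target type for the illustration.
type point struct{ X, Y int }

// storeInto stores val under key in target, which must be either a pointer to
// a struct or a non-nil map with string keys.
func storeInto(target interface{}, key string, val interface{}) error {
	rv := reflect.ValueOf(val)
	if !rv.IsValid() {
		return fmt.Errorf("nil values are not handled in this sketch")
	}
	v := reflect.ValueOf(target)
	if v.Kind() == reflect.Ptr {
		v = v.Elem() // follow the pointer, as the decoder does
	}
	switch v.Kind() {
	case reflect.Struct:
		// A struct field reached through a pointer is addressable, so we
		// can set it in place.
		subv := v.FieldByName(key)
		if !subv.IsValid() || !subv.CanSet() {
			return fmt.Errorf("no settable field %q", key)
		}
		if !rv.Type().AssignableTo(subv.Type()) {
			return fmt.Errorf("cannot assign %s to field %q", rv.Type(), key)
		}
		subv.Set(rv)
	case reflect.Map:
		if v.IsNil() || v.Type().Key().Kind() != reflect.String {
			return fmt.Errorf("need a non-nil map with string keys")
		}
		if !rv.Type().AssignableTo(v.Type().Elem()) {
			return fmt.Errorf("cannot assign %s to map element", rv.Type())
		}
		// Map elements are not addressable, so build a temporary element
		// and store it with SetMapIndex.
		elem := reflect.New(v.Type().Elem()).Elem()
		elem.Set(rv)
		v.SetMapIndex(reflect.ValueOf(key).Convert(v.Type().Key()), elem)
	default:
		return fmt.Errorf("unsupported target kind %s", v.Kind())
	}
	return nil
}

func main() {
	p := point{}
	if err := storeInto(&p, "X", 3); err != nil {
		panic(err)
	}
	m := map[string]int{}
	if err := storeInto(m, "a", 7); err != nil {
		panic(err)
	}
	fmt.Println(p.X, m["a"])
}
```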

We're now ready to convert the actual value to the target type, so we go back to the JSON bytes and consume the colon : character and read the JSON value. We get the value and store it into our storage location (subv). This is the code starting at line 809 (if destring {). The actual assigning is done through the decoder functions (d.literalStore at line 908, or d.value at line 412) and these actually decode the JSON value while doing the storing. Note that only d.literalStore really stores the value—d.value calls on d.array, d.object, or d.literalStore to do the work recursively if needed.

d.literalStore therefore contains many switch v.Kind()s: it parses a null or a true or false or an integer or a string or an array, then makes sure it can store the resulting value into v.Kind(), and chooses how to store that resulting value into v.Kind() based on the combination of what it just decoded, and the actual v.Kind(). So there's a bit of a combinatorial explosion here, but it gets the job done.

If all that worked, and we're decoding to a map, we may now need to massage the type of the temporary, find the real key, and store the converted value into the map. That's what lines 830 (if v.Kind() == reflect.Map {) through the final close brace at 867 are about.

¹To find fields, we first look over at encoding/json/encode.go to find cachedTypeFields. It is a caching version of typeFields. This is where the json tags are found and put into a slice. The result is cached via cachedTypeFields in a map indexed by the reflect-type value of the struct type. So what we get is a slow lookup the first time we use a struct type, then a fast lookup afterwards, to get a slice of information about how to do the decoding. This slice-of-information maps from json-tag-or-field name to: field; type; whether it's a sub-field of an anonymous structure; and so on: everything we will need to know to decode it properly—or to encode it, on the encoding side. (I didn't really look closely at this code.)

Answer 2:

You can encode/decode several messages on the same buffer, whether that be a "gob" or "json" or some other encoding.

Assuming there's a limited set of concrete types that you want to support, you can always encode a type tag as the first thing, then encode the actual object. This way the decoder can decode the type tag first and, depending on its value, decide how to decode the next item.

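// encoder side

enc := json.NewEncoder(buffer) // or gob.NewEncoder(buffer)
if err := enc.Encode("player"); err != nil { // type tag first
    // handle error
}
if err := enc.Encode(playerInstance); err != nil { // then the object itself
    // handle error
}


// decoder side

dec := json.NewDecoder(buffer) // or gob.NewDecoder(buffer)
var tag string
if err := dec.Decode(&tag); err != nil {
    // handle error
}
switch tag {
case "player":
    var playerInstance Player
    if err := dec.Decode(&playerInstance); err != nil {
        // handle error
    }
    // do something with playerInstance
case "somethingelse":
    // decode something else
}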

Source: https://stackoverflow.com/questions/59062330/go-deserialization-when-type-is-not-known

Tags

serialization