The following question is more complex than it may first seem.
Assume that I've got an arbitrary JSON object, one that may contain any amount of data, including other nested JSON objects. What is a reliable way to hash (or sign) it?
JSON-LD can do normalization.
You will have to define your context.
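As a rough sketch (assuming the Python pyld library; the @context below is just a placeholder for your own vocabulary), you would normalize the document to canonical N-Quads and hash that:

import hashlib
from pyld import jsonld

doc = {
    "@context": {"Name1": "http://example.org/Name1", "Name2": "http://example.org/Name2"},
    "Name1": "Value1",
    "Name2": "Value2",
}

# URDNA2015 produces a deterministic serialization of the underlying graph,
# so logically equivalent documents normalize to the same string.
normalized = jsonld.normalize(doc, {"algorithm": "URDNA2015", "format": "application/n-quads"})
digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()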
We encountered a simple issue with hashing JSON-encoded payloads. In our case we use the following methodology:
Advantages of using this solution:
Disadvantages:
RFC 7638 (JSON Web Key (JWK) Thumbprint) defines a form of canonicalization. Although RFC 7638 expects a limited set of members, the same calculation can be applied to any set of members.
https://tools.ietf.org/html/rfc7638#section-3
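As a sketch of what that looks like in practice (the member selection and placeholder values below are illustrative, not taken from the RFC): serialize the chosen members with lexicographically ordered keys and no insignificant whitespace, then hash the UTF-8 bytes.

import hashlib
import json

def thumbprint(obj, members=None):
    # Optionally restrict to the required members, as RFC 7638 does for JWKs;
    # for an arbitrary object, pass members=None and hash everything.
    if members is not None:
        obj = {k: obj[k] for k in members}
    # Lexicographic key order and no insignificant whitespace, per the RFC.
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

jwk = {"kty": "EC", "crv": "P-256", "x": "...", "y": "..."}   # placeholder values
print(thumbprint(jwk, members=["crv", "kty", "x", "y"]))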
This is the same issue that causes problems with S/MIME signatures and XML signatures: there are multiple equivalent representations of the data to be signed.
For example in JSON:
{ "Name1": "Value1", "Name2": "Value2" }
vs.
{
"Name1": "Value\u0031",
"Name2": "Value\u0032"
}
Or depending on your application, this may even be equivalent:
{
"Name1": "Value\u0031",
"Name2": "Value\u0032",
"Optional": null
}
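To see the problem concretely: the first two texts above decode to the same logical object, yet their bytes (and therefore any hash or signature computed over them) differ. A quick check in Python, for example:

import hashlib
import json

a = '{ "Name1": "Value1", "Name2": "Value2" }'
b = '{\n"Name1": "Value\\u0031",\n"Name2": "Value\\u0032"\n}'

assert json.loads(a) == json.loads(b)            # logically equivalent
print(hashlib.sha256(a.encode()).hexdigest())    # ...but the raw bytes differ,
print(hashlib.sha256(b.encode()).hexdigest())    # so the digests do not match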
Canonicalization could solve that problem, but it's a problem you don't need to have at all.
The easy solution, if you have control over the specification, is to wrap the object in some sort of container that protects it from being transformed into an "equivalent" but different representation.
I.e. avoid the problem by not signing the "logical" object but signing a particular serialized representation of it instead.
For example: JSON object -> UTF-8 text -> bytes. Sign the bytes as bytes, then transmit them as bytes, e.g. base64-encoded. Since you are signing the bytes, differences like whitespace become part of what is signed.
Instead of trying to do this:
{
"JSONContent": { "Name1": "Value1", "Name2": "Value2" },
"Signature": "asdflkajsdrliuejadceaageaetge="
}
Just do this:
{
"Base64JSONContent": "eyAgIk5hbWUxIjogIlZhbHVlMSIsICJOYW1lMiI6ICJWYWx1ZTIiIH0s",
"Signature": "asdflkajsdrliuejadceaageaetge="
}
I.e. don't sign the JSON, sign the bytes of the encoded JSON.
Yes, it means the signature is no longer transparent.
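As a sketch of that flow (HMAC-SHA256 stands in here for whatever signature scheme you actually use, and the key is a placeholder):

import base64
import hashlib
import hmac
import json

key = b"signing-key-placeholder"
payload = {"Name1": "Value1", "Name2": "Value2"}

# Serialize exactly once; from here on the content is just bytes.
raw = json.dumps(payload).encode("utf-8")

envelope = {
    "Base64JSONContent": base64.b64encode(raw).decode("ascii"),
    "Signature": base64.b64encode(hmac.new(key, raw, hashlib.sha256).digest()).decode("ascii"),
}
print(json.dumps(envelope))

# The verifier base64-decodes Base64JSONContent, verifies the signature over
# those exact bytes, and only then parses the JSON they contain.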
I would process all fields in a well-defined order (alphabetically, for example). Why does arbitrary data make a difference? You can just iterate over the properties (via reflection, for instance).
Alternatively, I would look into converting the raw JSON string into some well-defined canonical form (removing all superfluous formatting) and hashing that, as sketched below.
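A sketch of that approach in Python (it handles key order and whitespace, but not subtler issues such as number formatting or Unicode normalization):

import hashlib
import json

def canonical_hash(obj):
    # Sorted keys plus compact separators remove ordering and whitespace
    # differences; this works recursively for nested objects and lists.
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

assert canonical_hash({"a": 1, "b": [2, 3]}) == canonical_hash({"b": [2, 3], "a": 1})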
Instead of inventing your own JSON normalization/canonicalization, you may want to use bencode. Semantically it covers much the same ground as JSON (a composition of numbers, strings, lists and dicts), but with the unambiguous encoding that cryptographic hashing requires.
bencode is used as the torrent file format; every BitTorrent client contains an implementation.
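For illustration, the encoding is simple enough to sketch by hand (real libraries exist; note that bencode has only integers, byte strings, lists and dicts, so there is no direct representation for floats, booleans or null):

import hashlib

def bencode(value):
    # Integers: i<n>e, byte strings: <len>:<bytes>, lists: l...e,
    # dicts: d...e with keys emitted in sorted order -- hence deterministic.
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        encoded_keys = sorted((k.encode("utf-8"), v) for k, v in value.items())
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in encoded_keys) + b"e"
    raise TypeError("no bencode representation for %r" % type(value))

data = {"Name1": "Value1", "Name2": "Value2", "Numbers": [1, 2, 3]}
print(hashlib.sha256(bencode(data)).hexdigest())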