dotnet core System.Text.Json unescape unicode string

抹茶落季 2020-11-29 10:56

Using JsonSerializer.Serialize(obj) produces a string in which non-ASCII characters are escaped as \uXXXX sequences, but I want the unescaped version.
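A minimal sketch of the behavior being asked about, assuming an anonymous object with a single Name property as a stand-in for the asker's own type (the type and the value are illustrative, not taken from the question):

```csharp
using System;
using System.Text.Json;

// Top-level statements (C# 9+); wrap in a Main method on older compilers.
// Stand-in for the asker's object (an assumption for illustration).
var obj = new { Name = "你好" };

// No options passed: the default encoder escapes everything outside Basic Latin.
string json = JsonSerializer.Serialize(obj);

Console.WriteLine(json); // {"Name":"\u4F60\u597D"}
```

The desired output would instead be {"Name":"你好"}.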



        
2 answers
  • 2020-11-29 11:37

    To change the escaping behavior of JsonSerializer, pass in a custom JavaScriptEncoder by setting the Encoder property on JsonSerializerOptions.

    https://docs.microsoft.com/en-us/dotnet/api/system.text.json.jsonserializeroptions.encoder?view=netcore-3.0#System_Text_Json_JsonSerializerOptions_Encoder

    The default behavior is designed with security in mind and the JsonSerializer over-escapes for defense-in-depth.

    If all you need is to leave the "alphanumeric" characters of a specific non-Latin language unescaped, I would recommend creating a JavaScriptEncoder with the Create factory method rather than using the UnsafeRelaxedJsonEscaping encoder.

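    using System;
    using System.Text.Encodings.Web;
    using System.Text.Json;
    using System.Text.Unicode;

    // Let Basic Latin and CJK Unified Ideographs pass through unescaped.
    JsonSerializerOptions options = new JsonSerializerOptions
    {
        Encoder = JavaScriptEncoder.Create(UnicodeRanges.BasicLatin, UnicodeRanges.CjkUnifiedIdeographs)
    };

    // A is the class from the question (assumed to expose a string Name property).
    var a = new A { Name = "你好" };
    var s = JsonSerializer.Serialize(a, options);
    Console.WriteLine(s);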
    

    Doing so keeps certain safeguards; for instance, HTML-sensitive characters will continue to be escaped.
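    To illustrate those safeguards, here is a minimal sketch under assumed inputs: the payload (an anonymous object with hypothetical Name and Kana properties) mixes CJK ideographs, an HTML tag, and a hiragana character that falls outside the two allowed ranges.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

// Top-level statements (C# 9+); wrap in a Main method on older compilers.
var options = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.Create(UnicodeRanges.BasicLatin, UnicodeRanges.CjkUnifiedIdeographs)
};

// Hypothetical payload: ideographs, an HTML tag, and one hiragana character.
var payload = new { Name = "你好 <b>", Kana = "こ" };
string json = JsonSerializer.Serialize(payload, options);

// The ideographs stay readable; '<', '>' and the hiragana (not in an allowed range) are escaped.
Console.WriteLine(json); // {"Name":"你好 \u003Cb\u003E","Kana":"\u3053"}
```

    Only characters in the listed ranges are exempt from escaping, and HTML-sensitive characters are escaped even though they belong to Basic Latin.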

    I would caution against using System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping lightly, since it does minimal escaping (which is why it has "unsafe" in the name). If the JSON you are creating is written to a UTF-8 encoded file on disk, or if it is part of a web request that explicitly sets the charset to utf-8 (and is not going to be embedded within an HTML component as is), then it is probably OK to use.

    See the remarks section within the API docs: https://docs.microsoft.com/en-us/dotnet/api/system.text.encodings.web.javascriptencoder.unsaferelaxedjsonescaping?view=netcore-3.0#remarks
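    As a rough sketch of the "UTF-8 file on disk" case, the following serializes straight to UTF-8 bytes with the relaxed encoder and writes them to a file. The file name and the anonymous payload are hypothetical, chosen only for illustration.

```csharp
using System;
using System.IO;
using System.Text;
using System.Text.Encodings.Web;
using System.Text.Json;

// Top-level statements (C# 9+); wrap in a Main method on older compilers.
var options = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
};

// SerializeToUtf8Bytes produces UTF-8 directly, so no intermediate string is needed.
byte[] utf8 = JsonSerializer.SerializeToUtf8Bytes(new { Name = "你好" }, options);

// Hypothetical output path in the temp directory.
string path = Path.Combine(Path.GetTempPath(), "relaxed-escaping-demo.json");
File.WriteAllBytes(path, utf8);

// Read the file back as UTF-8 to confirm the characters are stored unescaped.
string roundTrip = File.ReadAllText(path, Encoding.UTF8);
Console.WriteLine(roundTrip); // {"Name":"你好"}
```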

    You could also consider specifying UnicodeRanges.All if you expect/need all languages to remain un-escaped. This still escapes certain ASCII characters that are prone to security vulnerabilities.

    JsonSerializerOptions options = new JsonSerializerOptions
    {
        Encoder = JavaScriptEncoder.Create(UnicodeRanges.All)
    };
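    The difference between UnicodeRanges.All and UnsafeRelaxedJsonEscaping can be seen side by side. This is a minimal sketch; the payload and the variable names are assumptions made for illustration.

```csharp
using System;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

// Top-level statements (C# 9+); wrap in a Main method on older compilers.
var allRanges = new JsonSerializerOptions { Encoder = JavaScriptEncoder.Create(UnicodeRanges.All) };
var relaxed = new JsonSerializerOptions { Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping };

// Hypothetical payload containing both CJK text and an HTML tag.
var payload = new { Name = "你好 <b>" };

string strict = JsonSerializer.Serialize(payload, allRanges);
string loose = JsonSerializer.Serialize(payload, relaxed);

Console.WriteLine(strict); // {"Name":"你好 \u003Cb\u003E"}
Console.WriteLine(loose);  // {"Name":"你好 <b>"}
```

    Both keep the Chinese text readable; only the encoder built with Create continues to escape the angle brackets.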
    

    For more information and code samples, see: https://docs.microsoft.com/en-us/dotnet/standard/serialization/system-text-json-how-to?view=netcore-3.0#customize-character-encoding

    In particular, see the Caution note in that section.

  • 2020-11-29 11:39

    You need to configure JsonSerializerOptions so that those characters are not escaped.

    JsonSerializerOptions jso = new JsonSerializerOptions();
    jso.Encoder = System.Text.Encodings.Web.JavaScriptEncoder.UnsafeRelaxedJsonEscaping;
    

    Then pass these options when you call the Serialize method.

    var s = JsonSerializer.Serialize(a, jso);        
    

    Full code:

    using System;
    using System.Text.Encodings.Web;
    using System.Text.Json;

    // Minimal escaping: non-ASCII characters are written as-is.
    JsonSerializerOptions jso = new JsonSerializerOptions();
    jso.Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping;

    var a = new A { Name = "你好" };
    var s = JsonSerializer.Serialize(a, jso);
    Console.WriteLine(s);
    

    Result: {"Name":"你好"}

    If you need to print the result to the console, you may need to install an additional language. Please refer here.
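    Separately from installing a language, a console may also need its output encoding switched to UTF-8 before the characters display correctly. This is a different step from the one described above, sketched here under the assumption that the terminal font can render the glyphs:

```csharp
using System;
using System.Text;
using System.Text.Encodings.Web;
using System.Text.Json;

// Top-level statements (C# 9+); wrap in a Main method on older compilers.
// Ask the console to emit UTF-8 instead of the system's legacy code page.
Console.OutputEncoding = Encoding.UTF8;

var jso = new JsonSerializerOptions { Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping };

// Stand-in for the question's object (an assumption for illustration).
string s = JsonSerializer.Serialize(new { Name = "你好" }, jso);
Console.WriteLine(s);
```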
