JSON parsing with Unicode characters

太阳男子 2020-12-21 09:22

I have a JSON file with Unicode characters, and I'm having trouble parsing it. I've tried it in Flash CS5 with the JSON library, and I have also tried it on http://json.parser.online

6 answers
  • 2020-12-21 09:50

    In ASP.NET you would think you could use System.Text.Encoding to convert a string like "Paul\u0027s" back to "Paul's", but I tried for hours and found nothing that worked.

    The trouble is that hardcoding a string like the one above in source code already decodes the escape, as you will see if you put a breakpoint on it. In the end I wrote code that converts the hex escape (hex 27) into an HTML numeric entity (decimal 39), so that I ended up with HTML encoding, and then decoded that.

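    // HTML holds text containing literal \uXXXX escapes, e.g. "Paul\u0027s".
    // Turn each escape for U+0001..U+00FF into an HTML numeric entity,
    // then let HtmlDecode turn the entities back into characters.
    for (int f = 1; f <= 0xFF; f++)
    {
        string Dec = "&#" + f + ";";                         // 0x27 -> "&#39;"
        HTML = HTML.Replace("\\u" + f.ToString("x4"), Dec);  // lowercase hex, e.g. \u00e9
        HTML = HTML.Replace("\\u" + f.ToString("X4"), Dec);  // uppercase hex, e.g. \u00E9
    }
    HTML = System.Web.HttpUtility.HtmlDecode(HTML);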
    

    Ugly as sin, I know, but without the latest framework (which isn't available on my ISP's server) it was the best I could do. Someone must know a better solution.
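    For comparison, a sketch in Python rather than the answerer's ASP.NET setup: when the text really is a JSON string literal, the JSON parser decodes \uXXXX escapes itself; when the escapes are embedded in other text (such as an HTML fragment), a regular expression can map every four-hex-digit escape, including ones with the digits a-f. The helper names `decode_json_string` and `unescape_u` are made up for this example.

```python
import json
import re


def decode_json_string(literal: str) -> str:
    """Decode a complete JSON string literal such as '"Paul\\u0027s"'."""
    # The JSON parser handles \uXXXX escapes (including surrogate pairs) itself.
    return json.loads(literal)


# Matches a literal backslash, 'u', and exactly four hex digits (either case).
_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def unescape_u(text: str) -> str:
    """Replace each \\uXXXX escape embedded in arbitrary text with its character."""
    # Simplification: surrogate pairs such as \ud83d\ude00 are not combined here,
    # whereas json.loads does combine them.
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


print(decode_json_string(r'"Paul\u0027s"'))         # Paul's
print(unescape_u(r"<p>Paul\u0027s caf\u00e9</p>"))  # <p>Paul's café</p>
```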

  • 2020-12-21 09:53

    I had the same problem with Twitter JSON files. I was parsing them in Python with json.loads(tweet), but it failed for half of the records.

    I switched to Python 3 and it works well now.
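    The answer doesn't say why the Python 2 version failed. A minimal Python 3 sketch, assuming the tweets are stored one JSON object per line in UTF-8 (an assumption; the file layout isn't described): decode the raw bytes explicitly instead of relying on the platform's default encoding, then parse each line. `parse_json_lines` is a hypothetical helper name.

```python
import json


def parse_json_lines(data: bytes, encoding: str = "utf-8") -> list:
    """Parse newline-delimited JSON records (one object per line) from raw bytes."""
    # Decode explicitly so the result doesn't depend on the locale's default encoding.
    text = data.decode(encoding)
    # Skip blank lines; every other line must be a complete JSON value.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


sample = '{"text": "café ☕"}\n{"text": "日本語"}\n'.encode("utf-8")
records = parse_json_lines(sample)
print([r["text"] for r in records])  # ['café ☕', '日本語']
```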

  • Quoth the RFC:

    JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.

    So correctly encoded Unicode characters should not be a problem, which leads me to believe that your file is not correctly encoded (maybe it uses Latin-1 instead of UTF-8). How did you create the file? In a text editor?
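    The quoted sentence is from RFC 4627; its successor, RFC 8259, requires UTF-8 for JSON exchanged between systems that are not part of a closed ecosystem. One way to test the "not correctly encoded" theory is to decode the raw bytes as UTF-8 yourself before parsing. This Python sketch (the helper `load_json_bytes` is hypothetical) shows that the same text saved as Latin-1 fails at the first non-ASCII byte:

```python
import json


def load_json_bytes(data: bytes):
    """Parse JSON from raw bytes, reporting clearly if they are not valid UTF-8."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Typical symptom of a file saved as Latin-1/windows-1252 instead of UTF-8.
        bad_byte = data[exc.start:exc.start + 1]
        raise ValueError(f"not valid UTF-8 at byte {exc.start}: {bad_byte!r}") from exc
    return json.loads(text)


good = '{"name": "Müller"}'.encode("utf-8")
bad = '{"name": "Müller"}'.encode("latin-1")  # ü becomes the single byte 0xFC

print(load_json_bytes(good))  # {'name': 'Müller'}
try:
    load_json_bytes(bad)
except ValueError as err:
    print(err)  # not valid UTF-8 at byte 11: b'\xfc'
```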

  • 2020-12-21 10:02

    I had the same problem; I just changed the file encoding from Mac-Roman/windows-1252 to UTF-8, and it worked.
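    The same conversion can be scripted. A Python sketch, assuming you know the file's real source encoding (Python calls these codecs `cp1252` and `mac_roman`); `transcode_to_utf8` and the file names are made up for illustration. Pick the source encoding carefully: decoding with the wrong one usually doesn't fail, it just produces the wrong characters.

```python
import json
import tempfile
from pathlib import Path


def transcode_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Re-save a text file as UTF-8; source_encoding must be the file's real encoding."""
    text = src.read_text(encoding=source_encoding)  # e.g. "cp1252" or "mac_roman"
    dst.write_text(text, encoding="utf-8")


# Demonstration with a throwaway file saved in windows-1252.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, "data-cp1252.json")
    dst = Path(tmp, "data-utf8.json")
    src.write_bytes('{"city": "Zürich", "price": "5 €"}'.encode("cp1252"))
    transcode_to_utf8(src, dst)
    print(json.loads(dst.read_text(encoding="utf-8")))  # {'city': 'Zürich', 'price': '5 €'}
```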

  • 2020-12-21 10:05

    There might be an obscure Unicode whitespace character hidden in your string.

    This URL contains more detail:

    http://timelessrepo.com/json-isnt-a-javascript-subset
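    The linked article is about U+2028 and U+2029, which JSON allows inside strings but older JavaScript did not allow in string literals. Between tokens, JSON only accepts space, tab, line feed and carriage return as whitespace, so a no-break space there, or a byte order mark at the start of an already decoded string, makes Python's json.loads fail. A Python sketch for locating such characters (the helper name `find_hidden_chars` is hypothetical; it only reports candidates and does not decide whether each one is legal where it appears):

```python
import json
import unicodedata
from typing import List, Tuple

# The only characters JSON treats as whitespace between tokens.
_JSON_WS = {" ", "\t", "\n", "\r"}
# Zs: space separators, Zl/Zp: line/paragraph separators, Cf: format chars (BOM, ZWSP, ...).
_SUSPECT_CATEGORIES = {"Zs", "Zl", "Zp", "Cf"}


def find_hidden_chars(text: str) -> List[Tuple[int, str]]:
    """Return (index, Unicode name) for whitespace-like characters other than JSON's own."""
    hits = []
    for i, ch in enumerate(text):
        if ch not in _JSON_WS and unicodedata.category(ch) in _SUSPECT_CATEGORIES:
            hits.append((i, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return hits


doc = '\ufeff{"a":\u00a01}'
print(find_hidden_chars(doc))  # [(0, 'ZERO WIDTH NO-BREAK SPACE'), (6, 'NO-BREAK SPACE')]
```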

  • 2020-12-21 10:06

    If you have trouble with the encoding of a JSON file generated by Python with json.dumps() (i.e. escape codes such as \u00fc aren't displayed correctly regardless of your editor's encoding setting), note that by default (ensure_ascii=True) it produces ASCII-only output and escapes all non-ASCII characters! See "python json unicode - how do I eval using javascript", "python: json.dumps can't handle utf-8?" and "Why does json.dumps escape non-ascii characters with "\uxxxx"".
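    A short Python illustration of the default behaviour and of `ensure_ascii=False`, which keeps the characters as they are; the file then has to be written as UTF-8 explicitly. `write_json_utf8` and the temporary file exist only for this demonstration.

```python
import json
import tempfile
from pathlib import Path

data = {"name": "Müller", "city": "Zürich"}

# Default ensure_ascii=True: non-ASCII characters become \uXXXX escapes.
print(json.dumps(data))                      # {"name": "M\u00fcller", "city": "Z\u00fcrich"}

# ensure_ascii=False keeps the characters themselves.
print(json.dumps(data, ensure_ascii=False))  # {"name": "Müller", "city": "Zürich"}


def write_json_utf8(obj, path: Path) -> None:
    """Write obj as JSON with unescaped characters, encoded as UTF-8."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(obj, f, ensure_ascii=False)


with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp, "out.json")
    write_json_utf8(data, out)
    print(out.read_bytes())  # b'{"name": "M\xc3\xbcller", "city": "Z\xc3\xbcrich"}'
```

    Both forms decode to the same data with json.loads; the difference is only whether the file stays readable in an editor.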
