Convert a string containing ASCII to Unicode

心在旅途 2021-01-25 14:28

My Java HttpServlet receives a string from my HTML page. In the request I get ASCII codes that display Chinese characters:

\"& #21487;& #20197;& #21578;&

2 answers
  •  执笔经年
    2021-01-25 14:55

    There is no such thing as ASCII codes that display Chinese characters. ASCII does not represent Chinese characters.

    If you already have a Java string, it is already Unicode internally (a sequence of UTF-16 code units), so it can hold any character (US, Latin, Chinese). You can then encode that string into bytes using UTF-8 or UTF-16:

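    String s = "可以告诉我"; // will not display correctly on systems without fonts for Chinese characters

    // Equivalent, written with Unicode escapes:
    String s = "\u53ef\u4ee5\u544a\u8bc9\u6211";
    // Encode to UTF-8 bytes (StandardCharsets is in java.nio.charset)
    byte[] utfString = s.getBytes(StandardCharsets.UTF_8);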
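
    To make the difference between the string and its encoded bytes concrete, here is a minimal sketch using only the JDK. The class name EncodeDemo and its helper methods are made up for illustration; they are not part of any library.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: encodes a Java String to bytes and back.
class EncodeDemo {
    // Encode the string as UTF-8 bytes.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Encode the string as big-endian UTF-16 bytes (no byte order mark).
    static byte[] toUtf16(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    // Decode UTF-8 bytes back into a Java String.
    static String fromUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "\u53ef\u4ee5\u544a\u8bc9\u6211";
        System.out.println(toUtf8(s).length);            // 15 (3 bytes per character here)
        System.out.println(toUtf16(s).length);           // 10 (2 bytes per character here)
        System.out.println(fromUtf8(toUtf8(s)).equals(s)); // true
    }
}
```

    The string itself never changes; only the byte representation depends on the charset you pick.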
    

    Now that I look at your updated question, you might be looking for the StringEscapeUtils class from Apache Commons Text, which unescapes HTML entities into a Java string. In Commons Text the method is called unescapeHtml4 (unescapeHtml is the older Commons Lang 2.x name):

    String s = StringEscapeUtils.unescapeHtml4("&#21487;&#20197;&#21578;&#35785;&#25105;"); // s is now "可以告诉我"
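
    If you would rather not add a dependency, numeric character references can also be decoded by hand. The following is a rough JDK-only sketch, not what StringEscapeUtils does internally; the class NumericEntityDecoder and its decode method are hypothetical names, and it handles only numeric references (decimal and hex), not named entities such as &amp;.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decodes numeric HTML character references only.
class NumericEntityDecoder {
    // Group 1: hex digits (after x or X); group 2: decimal digits.
    private static final Pattern REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    static String decode(String input) {
        Matcher m = REF.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous match and this one.
            out.append(input, last, m.start());
            int codePoint;
            try {
                codePoint = (m.group(1) != null)
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2));
            } catch (NumberFormatException e) {
                codePoint = -1; // too large to be a code point
            }
            if (Character.isValidCodePoint(codePoint)) {
                out.appendCodePoint(codePoint);
            } else {
                out.append(m.group()); // leave invalid references untouched
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("&#21487;&#20197;&#21578;&#35785;&#25105;");
        System.out.println(decoded.equals("\u53ef\u4ee5\u544a\u8bc9\u6211")); // true
    }
}
```

    For real HTML input a library is usually the safer choice, since it also covers named entities and edge cases this sketch ignores.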
    
