Question
I have a string that looks like this:
{\x22documentReferer\x22:\x22http:\x5C/\x5C/pikabu.ru\x5C/freshitems.php\x22}
How can I convert this into readable JSON?
I've found several solutions, such as regex-based ones, but they are slow.
I have already tried:
URL.decode
StringEscapeUtils
JSON.parse // from different libraries
For example, Python has a simple solution: decode with 'string_escape'.
The linked possible duplicate applies to Python; my question is about Java or Scala.
A working, but also very slow, solution I'm using now is:
def unescape(oldstr: String): String = {
  val newstr = new StringBuilder(oldstr.length)
  var saw_backslash = false
  var i = 0
  while (i < oldstr.length) {
    // charAt instead of codePointAt: the loop advances one char at a time,
    // so surrogate pairs are copied through unchanged
    val cp = oldstr.charAt(i)
    if (!saw_backslash) {
      if (cp == '\\') saw_backslash = true
      else newstr.append(cp)
    } else {
      if (cp == '\\') {
        // an escaped backslash is kept as it is
        saw_backslash = false
        newstr.append('\\')
        newstr.append('\\')
      } else {
        if (cp == 'x') {
          // i points at 'x'; two hex digits must follow at i + 1 and i + 2
          if (i + 3 > oldstr.length) die("string too short for \\x escape")
          i += 1
          var value = 0
          try
            value = Integer.parseInt(oldstr.substring(i, i + 2), 16)
          catch {
            case _: NumberFormatException =>
              die("invalid hex value for \\x escape")
          }
          newstr.append(value.toChar)
          i += 1
        }
        else {
          // any other escape is passed through untouched
          newstr.append('\\')
          newstr.append(cp)
        }
        saw_backslash = false
      }
    }
    i += 1
  }
  if (saw_backslash) newstr.append('\\')
  newstr.toString
}
private def die(msg: String): Nothing = {
  throw new IllegalArgumentException(msg)
}
Answer 1:
\x is used to escape ASCII characters in Python and other languages. In Scala and Java, you can use \u to escape Unicode characters. Since ASCII is a subset of Unicode, we can use the unescapeJava method (in StringEscapeUtils) together with a simple replacement that turns each \x into \u00, i.e. the \u escape prefix followed by two leading zeros:
import org.apache.commons.lang3.StringEscapeUtils
StringEscapeUtils.unescapeJava(x.replaceAll("""\\x""", """\\u00"""))
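To make the replacement step concrete, here is a minimal sketch of what the string looks like after replaceAll and before unescapeJava sees it. It uses only the standard library; the names sample and toUnicodeEscapes are made up for illustration and are not part of any library.

```scala
// Sample input from the question; triple quotes keep the backslashes literal.
val sample: String =
  """{\x22documentReferer\x22:\x22http:\x5C/\x5C/pikabu.ru\x5C/freshitems.php\x22}"""

// Hypothetical helper name. The regex matches a literal backslash followed
// by 'x'; the replacement emits a literal backslash followed by "u00".
def toUnicodeEscapes(s: String): String =
  s.replaceAll("""\\x""", """\\u00""")
```

Calling toUnicodeEscapes(sample) turns every \x22 into \u0022 and every \x5C into \u005C, which is the form unescapeJava understands.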
You can also use regex to find the escape sequences and replace them with the appropriate ASCII character:
val pattern = """\\x([0-9A-F]{2})""".r
pattern.replaceAllIn(x, m => m.group(1) match {
  case "5C" => """\\""" // special case for backslash
  case hex => Integer.parseInt(hex, 16).toChar.toString
})
This appears to be faster and does not require an external library, although it may still be too slow for your needs. It probably does not cover every edge case, but it might be enough for simple inputs.
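Two of those edge cases are easy to hit: lower-case hex digits (such as \x5c) are not matched, and a decoded '$' (from \x24) would be misread by replaceAllIn as a group reference. A possible variant that handles both is sketched below. The names hexEscape and unescapeHex are my own, and I have not benchmarked it, so treat it as a starting point, not a tuned solution.

```scala
import scala.util.matching.Regex

// (?i) makes the match case-insensitive, so lower-case hex digits are accepted too.
val hexEscape: Regex = """(?i)\\x([0-9a-f]{2})""".r

// Hypothetical helper: decodes every backslash-x escape with two hex digits.
def unescapeHex(s: String): String =
  hexEscape.replaceAllIn(s, m => {
    val decoded = Integer.parseInt(m.group(1), 16).toChar.toString
    // quoteReplacement escapes '\' and '$', which replaceAllIn would
    // otherwise treat as special characters in the replacement string.
    Regex.quoteReplacement(decoded)
  })
```

Applied to the string from the question, this should give {"documentReferer":"http:\/\/pikabu.ru\/freshitems.php"}. The remaining \/ is a valid JSON escape for /, so a JSON parser can take it from there.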
I am definitely not an expert on this so there might be a better way to handle this.
Source: https://stackoverflow.com/questions/47032049/java-or-scala-how-to-convert-characters-like-x22-into-string