I am getting the following encoded HTML in a JSON response and have no idea how to decode it to a normal HTML string (it is an anchor tag, by the way).
x3ca hrefx3d
This works for me
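// Requires: import java.net.URLDecoder; import java.io.UnsupportedEncodingException;
public static String convertUTF8Units_version2(String input) throws UnsupportedEncodingException
{
    // Turn each literal "\x" into "%" so that "\x3c" becomes "%3c", then percent-decode as UTF-8.
    return URLDecoder.decode(input.replaceAll("\\\\x", "%"), "UTF-8");
}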
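One thing to watch with the URLDecoder trick: it also turns every '+' into a space and tries to interpret any '%' that is already in the text. A possible way around that, as a rough sketch only, is to percent-encode those two characters before swapping "\x" for "%". The class name SafeUrlStyleDecoder is made up for this example, and it still assumes every "\x" is followed by two hex digits (URLDecoder throws IllegalArgumentException otherwise).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper, not part of the original answer.
final class SafeUrlStyleDecoder {
    static String decode(String input) {
        // Protect characters that URLDecoder would otherwise reinterpret.
        // '%' must be handled first so the '%' in "%2B" is not escaped again.
        String guarded = input.replace("%", "%25").replace("+", "%2B");
        try {
            // "\x3c" -> "%3c", then percent-decode the whole string as UTF-8.
            return URLDecoder.decode(guarded.replaceAll("\\\\x", "%"), "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }
}
```

Because the bytes are decoded together as UTF-8, a multi-byte sequence such as \xc3\xa9 should come out as a single character.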
That's not an encoding I've seen before, but it looks like xYZ (where Y and Z are hex digits [0-9a-f]) means "the character whose ASCII code is 0xYZ". I'm not sure how the letter x itself would be encoded, so I would recommend trying to find out. But then you can just do a find and replace on the regex x([0-9a-f]{2}), by getting the integer represented by the two hex digits, and then casting it to a char (or something similar to that).
Then also, it looks like slashes (and other characters? See if you can find out...) always have a backslash in front of them, so do another find-and-replace for that.
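For what it's worth, here is a minimal sketch of that find-and-replace in Java. It assumes each escape is a literal backslash, then x, then two hex digits (the form the other answers here handle), and that the only other escaping is a backslash in front of a slash. The class name HexEscapeDecoder and the example.com URL are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper illustrating the two find-and-replace passes.
final class HexEscapeDecoder {
    // Literal backslash, 'x', then exactly two hex digits (captured).
    private static final Pattern HEX_ESCAPE = Pattern.compile("\\\\x([0-9a-fA-F]{2})");

    static String decode(String input) {
        Matcher m = HEX_ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // Parse the two hex digits and cast the value to a char.
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and '\' in the decoded text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        // Second pass: drop the backslash in front of an escaped slash.
        return out.toString().replace("\\/", "/");
    }

    public static void main(String[] args) {
        // Made-up sample input in the same style as the question.
        System.out.println(decode("\\x3ca href\\x3d\\x22http:\\/\\/example.com\\x22\\x3e"));
    }
}
```

With the sample input in main, this prints <a href="http://example.com">. Casting to char is only right for single-byte values; it does not reassemble multi-byte UTF-8 sequences.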
The term you are looking for is "UTF-8 code units". These code units are written as a backslash, followed by an "x" and a two-digit hex ASCII code. I wrote a little converter method for you:
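public static String convertUTF8Units(String input) {
    String part = "", output = input;
    // Slide a 4-character window over the input; "<=" lets the last window reach the end.
    for (int i = 0; i <= input.length() - 4; i++) {
        part = input.substring(i, i + 4);
        if (part.startsWith("\\x")) {
            // Parse the two hex digits after "\x" into a single byte.
            byte[] rawByte = new byte[1];
            rawByte[0] = (byte) (Integer.parseInt(part.substring(2), 16) & 0x000000FF);
            // Uses the platform default charset, so only ASCII values (below 0x80) are portable.
            String raw = new String(rawByte);
            output = output.replace(part, raw);
        }
    }
    return output;
}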
I know, it's a bit frowzy, but it works :)
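The converter above decodes each escape on its own, so a character that was escaped as several UTF-8 bytes (for example \xc3\xa9 for é) will not come back intact. A possible variant, as a sketch only: collect all the bytes first and decode them together as UTF-8. The class and method names here are hypothetical, and it assumes the escapes really are UTF-8 bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical variant that decodes all escaped bytes together as UTF-8.
final class Utf8EscapeDecoder {
    static String decode(String input) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        int i = 0;
        while (i < input.length()) {
            if (isHexEscapeAt(input, i)) {
                // "\xNN": store the byte value NN and skip the 4 characters.
                bytes.write(Integer.parseInt(input.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                // Ordinary text: store its UTF-8 bytes unchanged.
                int cp = input.codePointAt(i);
                byte[] raw = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                bytes.write(raw, 0, raw.length);
                i += Character.charCount(cp);
            }
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    // True if a backslash, 'x', and two hex digits start at index i.
    private static boolean isHexEscapeAt(String s, int i) {
        return i + 4 <= s.length()
            && s.charAt(i) == '\\'
            && s.charAt(i + 1) == 'x'
            && Character.digit(s.charAt(i + 2), 16) >= 0
            && Character.digit(s.charAt(i + 3), 16) >= 0;
    }
}
```

A "\x" that is not followed by two hex digits is left in the output as plain text instead of throwing.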
Thanks!!
Take care: in the for loop the operator must be "<=", otherwise an escape at the very end of the string can't be decoded.
for(int i=0;i<=input.length()-4;i++) {..}
Cheers!