Question
FeedBurner changed its blog service so that it now returns blocks of JavaScript similar to:
document.write("\x3cdiv class\x3d\x22feedburnerFeedBlock\x22 id\x3d\x22RitterInsuranceMarketingRSSv3iugf6igask14fl8ok645b6l0\x22\x3e"); document.write("\x3cul\x3e"); document.write("\x3cli\x3e\x3cspan class\x3d\x22headline\x22\x3e\x3ca href\x3d\x22
I want the raw HTML out of this. Previously I could simply use .Replace to strip out the document.write syntax, but I can't figure out what kind of encoding this is, or at least how to decode it with C#.
Edit: Well, this was a semi-nightmare to finally solve. Here's what I came up with, in case anyone has any improvements to offer:
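// Extension method: must be declared inside a static class.
public static char ConvertHexToASCII(this string hex)
{
    // Pass the parameter name, not the (null) value, to the exception.
    if (hex == null) throw new ArgumentNullException("hex");
    // Parse the two hex digits as a byte and cast to the matching character.
    return (char)Convert.ToByte(hex, 16);
}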
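private string DecodeFeedburnerHtml(string html)
{
    var builder = new StringBuilder(html.Length);
    // Holds a pending escape sequence: '\', 'x', then up to two hex digits.
    var stack = new Stack<char>(4);
    foreach (var chr in html)
    {
        switch (chr)
        {
            case '\\':
                if (stack.Count == 0)
                {
                    // Possible start of a \xNN escape.
                    stack.Push(chr);
                }
                else
                {
                    // Not a \xNN escape (e.g. "\\"): reset and emit the character.
                    stack.Clear();
                    builder.Append(chr);
                }
                break;
            case 'x':
                if (stack.Count == 1)
                {
                    // '\' followed by 'x': keep collecting.
                    stack.Push(chr);
                }
                else
                {
                    stack.Clear();
                    builder.Append(chr);
                }
                break;
            default:
                if (stack.Count >= 2)
                {
                    stack.Push(chr);
                    if (stack.Count == 4)
                    {
                        // The low digit is popped first, then the high digit;
                        // "{1}{0}" puts them back in reading order.
                        string hexString = string.Format("{1}{0}", stack.Pop(), stack.Pop());
                        builder.Append(hexString.ConvertHexToASCII());
                        stack.Clear();
                    }
                }
                else
                {
                    // Discard a pending '\' that was not followed by 'x' (e.g. \"),
                    // so it cannot corrupt a later "x" in the text.
                    stack.Clear();
                    builder.Append(chr);
                }
                break;
        }
    }
    return builder.ToString();
}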
Not sure what else I could do better. For some reason, code like this always feels really dirty to me even though it's a linear-time algorithm. I guess that's related to how long it has to be.
Answer 1:
Those look like ASCII values, encoded in hex. You could traverse the string, and whenever you find a \x followed by two hexadecimal digits (0-9, a-f), replace it with the corresponding ASCII character. If the string is long, it would be faster to save the result incrementally to a StringBuilder instead of using String.Replace().
I don't know the encoding specification, but there might be more rules to follow (for example, if \\ is an escape character for a literal \).
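As a rough illustration of the traversal described above, here is a minimal sketch that uses a regular expression instead of a hand-written loop. The class name JsEscapeDecoder and the method name Decode are made up for this example. It assumes the input contains only \xNN escapes and simple backslash escapes such as \\ or \"; other JavaScript escapes (\n, \uNNNN and so on) are not interpreted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class JsEscapeDecoder
{
    // Group 1: the two hex digits of a \xNN escape.
    // Group 2: any other single character that follows a backslash.
    private static readonly Regex Escape =
        new Regex(@"\\(?:x([0-9A-Fa-f]{2})|(.))", RegexOptions.Singleline);

    public static string Decode(string input)
    {
        if (input == null) throw new ArgumentNullException("input");
        return Escape.Replace(input, m =>
            m.Groups[1].Success
                // \xNN: convert the hex value to the matching character.
                ? ((char)Convert.ToInt32(m.Groups[1].Value, 16)).ToString()
                // \\ or \" etc.: drop the backslash, keep the character (a simplification).
                : m.Groups[2].Value);
    }
}
```

Because the regex consumes the character after every backslash, a literal \\ followed by x3c stays as \x3c rather than being decoded, which covers the escape-character concern raised above.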
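Answer 2:
In .NET Core you can use Uri.UnescapeDataString(originalString.Replace(@"\x", "%")) to convert it by turning it into a URL-encoded string first. Note the verbatim string @"\x": a plain "\x" is not a valid C# escape sequence and will not compile.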
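A small sketch of this percent-encoding trick, with one precaution added: any literal % already present in the text is escaped first so it is not mistaken for the start of an escape. The class name PercentTrick is hypothetical. It assumes every \x in the input really starts an escape (a literal \\x would be decoded incorrectly), and that values above 7F are acceptable to interpret as UTF-8 bytes, since that is how Uri.UnescapeDataString treats them.

```csharp
using System;

public static class PercentTrick
{
    public static string Decode(string input)
    {
        if (input == null) throw new ArgumentNullException("input");
        // Protect literal percent signs, then turn \xNN into %NN.
        string prepared = input.Replace("%", "%25").Replace(@"\x", "%");
        // Let the URL decoder do the hex-to-character conversion.
        return Uri.UnescapeDataString(prepared);
    }
}
```

This is shorter than a hand-written parser, at the cost of depending on URL-decoding rules that were not designed for JavaScript string escapes.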
Answer 3:
That is a PHP Twig encoding:
http://www.twig-project.org/
Since you are using C#, you will most likely have to create a dictionary to translate the symbols and then use a series of .Replace() string methods to convert those back to HTML characters.
Alternatively you can save that data to a file, run a Perl script to decode the text and then read from the file in C#, but that might be more costly.
Source: https://stackoverflow.com/questions/4454569/how-to-decode-feedburner-result-containing-x3c-and-so-on