So your file contains the verbatim string
\u5b89\u5fbd\u5b5f\u5143
in ASCII and not the string represented by those four Unicode codepoints in some given encoding?
As it happens, I just wrote some code in C# that can parse strings in this format for a JSON parser project -- here's a variant that only handles \uXXXX escapes:
private static string ReadSlashedString(TextReader reader) {
var sb = new StringBuilder(32);
bool q = false;
while (true) {
int chrR = reader.Read();
if (chrR == -1) break;
var chr = (char) chrR;
if (!q) {
if (chr == '\\') {
q = true;
continue;
}
sb.Append(chr);
}
else {
switch (chr) {
case 'u':
case 'U':
var hexb = new char[4];
reader.Read(hexb, 0, 4);
chr = (char) Convert.ToInt32(new string(hexb), 16);
sb.Append(chr);
break;
default:
throw new Exception("Invalid backslash escape (\\ + charcode " + (int) chr + ")");
}
q = false;
}
}
return sb.ToString();
}
and you could use it like
var str = ReadSlashedString(new StringReader("\\u5b89\\u5fbd\\u5b5f\\u5143"));
(or using a StreamReader
to read from a file).
Hope this helps!
EDIT: @Darin Dimitrov's regexp-utilizing answer is probably faster, but I happened to have this code at hand. :)