Question
I'm writing a file utility to strip all non-ASCII characters from files. I have this regex:
Regex rgx = new Regex(@"[^\u0000-\u007F]");
This works fine. Unfortunately, I've discovered that some silly people use right angles (¬) as delimiters in their files, so these get stripped out as well, and I need to keep them!
I'm pretty new to Regex, and I do understand the basics, but any help would be awesome!
Thanks in advance!
Answer 1:
You just need to include the code point for the right angle character in the negated set. Try this:
Regex rgx = new Regex(@"[^\uxxxx\u0000-\u007F]");
Or this:
Regex rgx = new Regex(@"[^\uxxxx-\uxxxx\u0000-\u007F]");
(Where xxxx is the Unicode code point for the character you want to preserve.)
I'm giving two options because I know you can specify multiple ranges within one negated character group, but I'm not sure whether you can mix individual characters with ranges in the same group.
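For what it's worth, a .NET character class does accept single characters alongside ranges, so the first form is enough. Here is a minimal sketch of that check, assuming the character to keep is ¬ (U+00AC, NOT SIGN) in place of xxxx; the class name `MixedClassDemo` and the sample strings are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

class MixedClassDemo
{
    static void Main()
    {
        // Negated class mixing one single code point (U+00AC, assumed to be the
        // delimiter) with the ASCII range U+0000-U+007F.
        var rgx = new Regex(@"[^\u00AC\u0000-\u007F]");

        // ASCII only: nothing for the pattern to match.
        Console.WriteLine(rgx.IsMatch("abc"));

        // ASCII plus the preserved character: still nothing to match.
        Console.WriteLine(rgx.IsMatch("a\u00ACb"));

        // U+00E9 (é) is neither ASCII nor U+00AC, so the pattern matches it.
        Console.WriteLine(rgx.IsMatch("caf\u00E9"));
    }
}
```

The range form `\u00AC-\u00AC` would behave the same way here; it is just a longer way of writing the single character.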
Answer 2:
Jon's answer is absolutely correct. You may be using the wrong code point for the character. Try the following for the similar-looking characters:
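Regex regex = new Regex(@"([^\u00ac\u0000-\u007F])"); // U+00AC NOT SIGN (¬)
Regex regex = new Regex(@"([^\u02fa\u0000-\u007F])"); // U+02FA MODIFIER LETTER END HIGH TONE
Regex regex = new Regex(@"([^\u031a\u0000-\u007F])"); // U+031A COMBINING LEFT ANGLE ABOVE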
The first one should work, I think.
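To tie this back to the original goal, here is a rough sketch of the stripping step using the first pattern. It assumes the delimiter really is U+00AC; the `AsciiFilter` class, its `Strip` method, and the sample line are hypothetical names and data, not something from the question.

```csharp
using System;
using System.Text.RegularExpressions;

static class AsciiFilter
{
    // Matches any character that is neither ASCII (U+0000-U+007F)
    // nor U+00AC (the assumed delimiter, ¬).
    private static readonly Regex NonAsciiExceptNotSign =
        new Regex(@"[^\u00AC\u0000-\u007F]", RegexOptions.Compiled);

    // Hypothetical helper: removes every match and leaves the rest untouched.
    public static string Strip(string input) =>
        NonAsciiExceptNotSign.Replace(input, string.Empty);
}

class Program
{
    static void Main()
    {
        // Sample line with two non-ASCII letters (ï, é) and two ¬ delimiters.
        string line = "na\u00EFve\u00ACcaf\u00E9\u00AC42";

        // Strip returns "nave¬caf¬42": the accented letters are removed,
        // the delimiters are kept.
        Console.WriteLine(AsciiFilter.Strip(line));
    }
}
```

If the delimiters still disappear with this pattern, the file probably contains one of the look-alike characters instead, and swapping the code point in the class is the only change needed.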
Source: https://stackoverflow.com/questions/4183766/replacing-all-non-ascii-characters-except-right-angle-character-in-c-sharp