Replacing all non-ASCII characters, except right angle character in C#

Posted by ♀尐吖头ヾ on 2019-12-08 08:09:47

Question


I'm writing a file utility to strip all non-ASCII characters out of files. I have this Regex:

Regex rgx = new Regex(@"[^\u0000-\u007F]");

This works fine. Unfortunately, I've discovered that some silly people use right angles (¬) as delimiters in their files, so these get stripped out as well, and I need to keep them!

I'm pretty new to Regex. I do understand the basics, but any help would be awesome!

Thanks in advance!


Answer 1:


You just need to add the code point of the right angle character to the negated set, so that it is excluded from the match:

Try this:

Regex rgx = new Regex(@"[^\uxxxx\u0000-\u007F]");

Or this:

Regex rgx = new Regex(@"[^\uxxxx-\uxxxx\u0000-\u007F]");

(Where xxxx is the Unicode code point for the character you want to preserve.)

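I give two options because I know you can specify multiple ranges within one negative character group, but I wasn't sure whether individual characters can be mixed with ranges. In .NET they can, so the first form is sufficient; the second (a range that starts and ends at the same code point) is equivalent.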
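To illustrate that a single character and a range can share one negative character group, here is a minimal sketch. It assumes the character to keep is U+00AC (NOT SIGN, the ¬ shown in the question); the class name `NegatedClassDemo` and the method name `Strip` are made up for this example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NegatedClassDemo
{
    // Hypothetical helper: removes every character that is neither
    // ASCII (U+0000-U+007F) nor U+00AC (assumed to be the delimiter).
    public static string Strip(string input)
    {
        // One single character (\u00AC) and one range (\u0000-\u007F)
        // inside the same negative character group.
        return Regex.Replace(input, @"[^\u00AC\u0000-\u007F]", "");
    }

    public static void Main()
    {
        // Escapes are used so the sample does not depend on the
        // encoding of the source file.
        string sample = "id\u00ACna\u00EFve\u00ACcaf\u00E9";

        // The two accented letters are removed; both U+00AC delimiters stay.
        Console.WriteLine(Strip(sample) == "id\u00ACnave\u00ACcaf"); // True
    }
}
```

Plain ASCII input passes through unchanged, since nothing in it matches the negated group.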




Answer 2:


Jon's answer is absolutely correct. You may be using the wrong code point for the character. Try the following, one for each of the similar-looking characters:

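// Alternatives: use only one of these declarations.
Regex regex = new Regex(@"([^\u00ac\u0000-\u007F])"); // U+00AC NOT SIGN (¬)
Regex regex = new Regex(@"([^\u02fa\u0000-\u007F])"); // U+02FA MODIFIER LETTER END HIGH TONE
Regex regex = new Regex(@"([^\u031a\u0000-\u007F])"); // U+031A COMBINING LEFT ANGLE ABOVE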

The first one should work, I think, since the ¬ in your question is U+00AC.
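If you are unsure which code point your files actually contain, one way to check is to list the non-ASCII characters in a sample of the text. The following is only a sketch: the class name `CodePointInspector` and the method name `NonAsciiCodePoints` are made up for this example, and it works on UTF-16 code units, so a character outside the Basic Multilingual Plane would show up as two surrogate values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CodePointInspector
{
    // Hypothetical helper: returns the distinct non-ASCII UTF-16 code
    // units found in the text, formatted as "U+XXXX" and sorted.
    public static List<string> NonAsciiCodePoints(string text)
    {
        return text.Where(c => c > '\u007F')   // keep only non-ASCII chars
                   .Distinct()
                   .OrderBy(c => c)
                   .Select(c => "U+" + ((int)c).ToString("X4"))
                   .ToList();
    }

    public static void Main()
    {
        // Sample line with two U+00AC delimiters and one accented letter.
        string sample = "a\u00ACb\u00E9c\u00ACd";

        Console.WriteLine(string.Join(", ", NonAsciiCodePoints(sample)));
        // prints "U+00AC, U+00E9"
    }
}
```

Whichever value is reported for the delimiter is the one to put in the negative character group. If the delimiter comes out as U+FFFD instead, the file was probably read with the wrong encoding, and no pattern will recover the original character.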



Source: https://stackoverflow.com/questions/4183766/replacing-all-non-ascii-characters-except-right-angle-character-in-c-sharp
