keyboardP’s solution is decent – do consider it. But as I’ve argued in the comments, regular expressions are actually the correct tool for the job, you’re just making it unnecessarily complicated. The actual solution is a one-liner:
var result = Regex.Replace(input, "\\P{L}", "");
\P{…} specifies a Unicode character class we do not want to match (the opposite of \p{…}
). L
is the Unicode character class for letters.
Of course it makes sense to encapsulate this into a method, as keyboardP did. To avoid recompiling the regular expression over again, you should also consider pulling the regex creation out of the actual code (although this probably won’t give a big impact on performance):
static readonly Regex nonUnicodeRx = new Regex("\\P{L}");
public static string RemoveNonUnicodeLetters(string input) {
return nonUnicodeRx.Replace(input, "");
}