How can you strip non-ASCII characters from a string? (in C#)
I believe MonsCamus meant:
parsememo = Regex.Replace(parsememo, @"[^\u0020-\u007E]", string.Empty);
If you want not to strip, but to actually convert latin accented to non-accented characters, take a look at this question: How do I translate 8bit characters into 7bit characters? (i.e. Ü to U)
I came here looking for a solution for extended ascii characters, but couldnt find it. The closest I found is bzlm's solution. But that works only for ASCII Code upto 127(obviously you can replace the encoding type in his code, but i think it was a bit complex to understand. Hence, sharing this version). Here's a solution that works for extended ASCII codes i.e. upto 255 which is the ISO 8859-1
It finds and strips out non-ascii characters(greater than 255)
Dim str1 as String= "â, ??î or ôu
This is not optimal performance-wise, but a pretty straight-forward Linq approach:
string strippedString = new string(
yourString.Where(c => c <= sbyte.MaxValue).ToArray()
);
The downside is that all the "surviving" characters are first put into an array of type char[]
which is then thrown away after the string
constructor no longer uses it.
I used this regex expression:
string s = "søme string";
Regex regex = new Regex(@"[^a-zA-Z0-9\s]", (RegexOptions)0);
return regex.Replace(s, "");
string s = "søme string";
s = Regex.Replace(s, @"[^\u0000-\u007F]+", string.Empty);