How can you strip non-ASCII characters from a string? (in C#)

迷失自我 2020-11-22 17:24

How can you strip non-ASCII characters from a string? (in C#)

11 Answers
  • 2020-11-22 17:29

    I believe MonsCamus meant:

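    // Keeps printable ASCII only (space through tilde); control characters such as tabs and newlines are removed too.
    parsememo = Regex.Replace(parsememo, @"[^\u0020-\u007E]", string.Empty);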
    
  • 2020-11-22 17:33

    If you want to convert accented Latin characters to their unaccented equivalents rather than strip them, take a look at this question: How do I translate 8bit characters into 7bit characters? (i.e. Ü to U)
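
    For illustration only, here is a minimal sketch of one common way to do that conversion in C#: decompose the string with Unicode normalization and drop the combining marks. The class and method names (DiacriticsSketch, RemoveDiacritics) are hypothetical, and this is not necessarily the approach the linked question recommends.

    ```csharp
    using System;
    using System.Globalization;
    using System.Text;

    public static class DiacriticsSketch
    {
        // Hypothetical helper: decomposes each character (Ü becomes U + combining diaeresis)
        // and drops the combining marks, leaving the base letters.
        public static string RemoveDiacritics(string input)
        {
            string decomposed = input.Normalize(NormalizationForm.FormD);
            var sb = new StringBuilder(decomposed.Length);
            foreach (char c in decomposed)
            {
                // NonSpacingMark is the Unicode category of combining accents.
                if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                {
                    sb.Append(c);
                }
            }
            // Recompose whatever is left into the usual composed form.
            return sb.ToString().Normalize(NormalizationForm.FormC);
        }
    }
    ```

    With this sketch, DiacriticsSketch.RemoveDiacritics("Über") gives "Uber". Letters that have no canonical decomposition, such as ø or ß, pass through unchanged, so you may still want to strip non-ASCII characters afterwards.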

  • 2020-11-22 17:33

    I came here looking for a solution for extended ASCII characters but couldn't find one. The closest I found is bzlm's solution, but it only works for ASCII codes up to 127. (You can obviously replace the encoding type in his code, but I found it a bit complex to understand, hence this version.) Here's a solution that works for extended ASCII codes, i.e. up to 255, which is the ISO 8859-1 range.

    It finds and strips out characters outside that range (code points greater than 255).

    Dim str1 as String= "â, ??î or ôu
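
    Since the question asks about C#, here is a rough C# sketch of the idea described above: keep every character whose code is at most 255 and drop the rest. The names Latin1Sketch and StripAbove255 are hypothetical, and this is an illustration of the idea rather than a translation of the snippet above.

    ```csharp
    using System;
    using System.Text;

    public static class Latin1Sketch
    {
        // Hypothetical helper: keeps characters in the range U+0000 to U+00FF
        // (the ISO 8859-1 range) and drops everything above it.
        public static string StripAbove255(string input)
        {
            var sb = new StringBuilder(input.Length);
            foreach (char c in input)
            {
                if (c <= 255)
                {
                    sb.Append(c);
                }
            }
            return sb.ToString();
        }
    }
    ```

    For example, StripAbove255("â€") returns "â", because â is U+00E2 and € is U+20AC. Note that the range 128 to 159 is kept as well, even though ISO 8859-1 assigns those codes to control characters.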
  • 2020-11-22 17:34

    This is not optimal performance-wise, but it is a pretty straightforward LINQ approach:

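    // Requires: using System.Linq;
    // Keeps only characters in the ASCII range (sbyte.MaxValue is 127).
    string strippedString = new string(
        yourString.Where(c => c <= sbyte.MaxValue).ToArray()
    );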
    

    The downside is that all the "surviving" characters are first copied into a temporary char[], which is thrown away once the string constructor has copied its contents.
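
    If that temporary array bothers you, one possible alternative is to append the surviving characters directly to a StringBuilder. This is only a sketch with hypothetical names (AsciiSketch, StripNonAscii); the builder still allocates its own buffer, so measure before assuming it is faster.

    ```csharp
    using System;
    using System.Text;

    public static class AsciiSketch
    {
        // Hypothetical helper: same filter as the LINQ version (c <= 127),
        // but without the intermediate array produced by ToArray().
        public static string StripNonAscii(string input)
        {
            var sb = new StringBuilder(input.Length);
            foreach (char c in input)
            {
                if (c <= 127)
                {
                    sb.Append(c);
                }
            }
            return sb.ToString();
        }
    }
    ```

    For example, AsciiSketch.StripNonAscii("søme string") returns "sme string".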

  • 2020-11-22 17:38

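    I used this regular expression:

    // Requires: using System.Text.RegularExpressions;
    // Keeps ASCII letters and digits plus whitespace; punctuation and symbols are removed too.
    string s = "søme string";
    Regex regex = new Regex(@"[^a-zA-Z0-9\s]", RegexOptions.None);
    return regex.Replace(s, "");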
    
  • 2020-11-22 17:40
    string s = "søme string";
    // Removes every run of characters outside U+0000 to U+007F (the ASCII range).
    s = Regex.Replace(s, @"[^\u0000-\u007F]+", string.Empty);
    