C# regex: remove non-letter characters from a string

失恋的感觉 2021-01-19 02:20

My terminology may be a little off here, but I am trying to strip non-letter characters from a string in C#: remove dashes, ampersands, etc., but retain things like accented characters.

2 answers
  • 2021-01-19 02:30
    // Needs "using System.Linq;" for Where().
    string result = string.Concat(input.Where(c => Char.IsLetterOrDigit(c)));
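
    If digits should go too, as the question's "non letters" wording suggests, a variation with `char.IsLetter` may be closer. This is a minimal sketch; the helper names `LettersAndDigits` and `LettersOnly` and the sample string are made up for illustration:

    ```csharp
    using System;
    using System.Linq;

    // Same filter as the one-liner above: keeps letters and digits.
    string LettersAndDigits(string s) => string.Concat(s.Where(c => char.IsLetterOrDigit(c)));

    // Hypothetical variant: keeps letters only, so digits are dropped as well.
    string LettersOnly(string s) => string.Concat(s.Where(c => char.IsLetter(c)));

    string sample = "AZURÉE& /30%";
    Console.WriteLine(LettersAndDigits(sample)); // AZURÉE30
    Console.WriteLine(LettersOnly(sample));      // AZURÉE
    ```

    Note that both filters also remove spaces, which may or may not be what you want.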
    
  • 2021-01-19 02:43

    A good starting point would be to remove characters according to their Unicode general category. For example, this code removes everything categorized as punctuation (\p{P}), a symbol (\p{S}), or an "other" character such as a control character (\p{C}):

    string input = "I- +AM. 相关 AZURÉE& /30%";
    var output = Regex.Replace(input, "[\\p{S}\\p{C}\\p{P}]", "");
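
    To try this out, a self-contained version might look like the following. It is only a sketch: the pattern is the one above, written as a verbatim string to avoid the doubled backslashes, and the helper name `StripSymbolsAndPunctuation` is invented here.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    // Remove symbols (\p{S}), "other" characters such as controls (\p{C}), and punctuation (\p{P}).
    string StripSymbolsAndPunctuation(string s) => Regex.Replace(s, @"[\p{S}\p{C}\p{P}]", "");

    string input = "I- +AM. 相关 AZURÉE& /30%";
    Console.WriteLine(StripSymbolsAndPunctuation(input)); // I AM 相关 AZURÉE 30
    ```

    Spaces survive because they are separators (\p{Z}), which this pattern does not touch; a tab or newline counts as a control character and is removed.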
    

    You could also try the whitelisting approach and only allow certain categories. For example, this keeps only letters (\p{L}), combining marks such as diacritics (\p{M}), numbers (\p{N}), and separators such as spaces (\p{Z}):

    var output = Regex.Replace(input, "[^\\p{L}\\p{M}\\p{N}\\p{Z}]", "");
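
    For the question as asked (letters only, accents kept), the same whitelist could be narrowed to letters and combining marks. This is a sketch under that assumption; `KeepLetters` is a hypothetical helper, not part of any library:

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    // Delete everything that is neither a letter (\p{L}) nor a combining mark (\p{M}).
    string KeepLetters(string s) => Regex.Replace(s, @"[^\p{L}\p{M}]", "");

    // "e" followed by U+0301 (combining acute accent) is a decomposed "é".
    string decomposed = "Cafe\u0301 & Co. 30%";
    string cleaned = KeepLetters(decomposed);

    Console.WriteLine(cleaned.Length);            // 7
    Console.WriteLine(cleaned == "Cafe\u0301Co"); // True
    ```

    Keeping \p{M} matters for decomposed text, where the accent is stored as a separate character: a pattern that allowed only \p{L} would silently turn the "é" into a plain "e".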
    

