How to replace special characters with their equivalent (such as "a" for "á") in C#?

悲&欢浪女 2020-12-16 00:45

I need to get the Portuguese text content out of an Excel file and create an XML file that will be used by an application that doesn't support characters such as "ç".

4 Answers
  • 2020-12-16 01:28
    // Requires: using System.Collections.Generic; using System.Text;
    string text = "coração"; // sample input; substitute the text to convert

    // Key: character to replace; value: replacement character.
    var replacements = new Dictionary<char, char>
    {
        { 'ç', 'c' },
        // ... add the remaining characters here
    };

    var replaced = new StringBuilder(text.Length);
    foreach (char character in text)
    {
        // Use the mapped character if there is one; otherwise keep the original.
        if (replacements.TryGetValue(character, out char replacement))
        {
            replaced.Append(replacement);
        }
        else
        {
            replaced.Append(character);
        }
    }

    // replaced.ToString() is now your converted text
    
  • 2020-12-16 01:40

    For future reference, this is what I ended up with:

    // Requires: using System.Collections.Generic; using System.Linq; using System.Text;
    // temp and final are string variables declared elsewhere.
    temp = stringToConvert.Normalize(NormalizationForm.FormD);
    IEnumerable<char> filtered = temp;
    filtered = filtered.Where(c => char.GetUnicodeCategory(c) != System.Globalization.UnicodeCategory.NonSpacingMark);
    final = new string(filtered.ToArray());
    
  • 2020-12-16 01:44

    Performance is better with this solution:

    string test = "áéíóúç";
    
    // Decompose, then drop everything that is not an ASCII letter or a space.
    string result = Regex.Replace(test.Normalize(NormalizationForm.FormD), "[^A-Za-z ]", string.Empty);
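Note that a whitelist pattern of this kind removes digits and punctuation along with the accents, which may not be wanted for text exported from a spreadsheet. A minimal sketch of a narrower variant follows, assuming that only the combining marks should be dropped. The class name `RegexDiacriticsExample` and the method name `StripMarks` are hypothetical; `\p{Mn}` is the .NET regex syntax for the Unicode non-spacing mark category.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class RegexDiacriticsExample
{
    // Matches runs of combining marks (Unicode category Mn, non-spacing mark).
    private static readonly Regex CombiningMarks = new Regex(@"\p{Mn}+", RegexOptions.Compiled);

    // Hypothetical helper: decomposes the input, then deletes only the marks.
    public static string StripMarks(string input)
    {
        string decomposed = input.Normalize(NormalizationForm.FormD);
        return CombiningMarks.Replace(decomposed, string.Empty);
    }

    public static void Main()
    {
        // Digits and punctuation survive; only the accents are removed.
        Console.WriteLine(StripMarks("Seção 12: preço, ação!")); // prints "Secao 12: preco, acao!"
    }
}
```

Whether this is faster than the LINQ-based answers depends on the input, so it is worth measuring before relying on it.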
    
  • 2020-12-16 01:48

    You could try something like this:

    var decomposed = "áéö".Normalize(NormalizationForm.FormD);
    var filtered = decomposed.Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);
    var newString = new String(filtered.ToArray());
    

    This decomposes each accented character into a base letter followed by its combining marks, filters out the marks, and creates a new string from what is left. Combining diacritics belong to the Unicode category NonSpacingMark (Mn).
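The decompose-and-filter steps described above can be wrapped in a reusable method. The following is a minimal sketch, not a definitive implementation: the class name `DiacriticsExample` and the method name `RemoveDiacritics` are hypothetical, and the final recomposition to Form C is an added assumption about what the consuming application expects.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DiacriticsExample
{
    // Hypothetical helper: strips combining marks from the input string.
    public static string RemoveDiacritics(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        // Form D splits "ç" into "c" followed by U+0327 (combining cedilla).
        string decomposed = input.Normalize(NormalizationForm.FormD);

        var builder = new StringBuilder(decomposed.Length);
        foreach (char c in decomposed)
        {
            // Keep everything except non-spacing marks (the detached accents).
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                builder.Append(c);
            }
        }

        // Assumption: recompose whatever remains into Form C.
        return builder.ToString().Normalize(NormalizationForm.FormC);
    }

    public static void Main()
    {
        Console.WriteLine(RemoveDiacritics("ação e coração")); // prints "acao e coracao"
    }
}
```

Characters that have no canonical decomposition, such as "ø" or "ß", pass through unchanged; those still need an explicit mapping like the dictionary approach from the first answer.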
