Replace a list of invalid characters with their valid versions (like tr)

你的背包 2021-01-05 18:26

I need something like this imaginary .trReplace method:

  str = str.trReplace("áéíüñ", "aeiu&");

It should change this str

4 answers
  • 2021-01-05 18:54

    It would be better to use an array of char instead of a StringBuilder. Writing through the array indexer is faster than calling the Append method, because each method call has to:

    • push all local variables to the stack
    • move to the Append address
    • return to the calling address
    • pop all local variables from the stack

    The example below is about 20 percent faster (depending on your hardware and input string).

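    // Requires: using System.Collections.Generic;
    // Fill with your own replacements, e.g. { 'á', 'a' }.
    static readonly Dictionary<char, char> mappings = new Dictionary<char, char>();

    public static string TranslateV2(string s)
    {
        var len = s.Length;
        var array = new char[len];

        for (var index = 0; index < len; index++)
        {
            var c = s[index];
            char to;
            // TryGetValue does one lookup instead of ContainsKey plus the indexer.
            array[index] = mappings.TryGetValue(c, out to) ? to : c;
        }

        return new string(array);
    }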
    
  • 2021-01-05 19:11

    I did something similar for ICAO Passports. The names had to be 'transliterated'. Basically I had a Dictionary of char to char mappings.

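    // Requires: using System.Collections.Generic; using System.Text;
    // Must be static to be usable from the static method; fill with your own mappings.
    static readonly Dictionary<char, char> mappings = new Dictionary<char, char>();

    public static string Translate(string s)
    {
        var t = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            char to;
            // Append the mapped character if there is one, otherwise the original.
            if (mappings.TryGetValue(c, out to))
                t.Append(to);
            else
                t.Append(c);
        }
        return t.ToString();
    }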
    
  • 2021-01-05 19:14

    Richard has a good answer, but performance may suffer slightly on longer strings (about 25% slower than the straight string replace shown in the question). I felt compelled to look into this a little further. There are actually several good related answers already on StackOverflow, listed below:

    Fastest way to remove chars from string

    C# Stripping / converting one or more characters

    There is also a good article on CodeProject covering the different options.

    http://www.codeproject.com/KB/string/fastestcscaseinsstringrep.aspx

    The function in Richard's answer gets slower with longer strings because the replacements happen one character at a time; if you have long runs of unmapped characters, you waste cycles re-appending the string character by character. Taking a few points from the CodeProject article, you end up with a slightly modified version of Richard's answer that looks like:

    // Requires: using System.Collections.Generic; using System.Text;
    private static readonly char[] ReplacementChars = { 'á', 'é', 'í', 'ü', 'ñ' };
    private static readonly Dictionary<char, char> ReplacementMappings = new Dictionary<char, char>
    {
        { 'á', 'a' },
        { 'é', 'e' },
        { 'í', 'i' },
        { 'ü', 'u' },
        { 'ñ', '&' }
    };

    private static string Translate(string source)
    {
        var startIndex = 0;
        int currentIndex;
        var result = new StringBuilder(source.Length);

        // Jump from one replaceable character to the next instead of testing every character.
        while ((currentIndex = source.IndexOfAny(ReplacementChars, startIndex)) != -1)
        {
            // Copy the unmapped run in one call (no temporary substring), then the mapped character.
            result.Append(source, startIndex, currentIndex - startIndex);
            result.Append(ReplacementMappings[source[currentIndex]]);

            startIndex = currentIndex + 1;
        }

        // Nothing was replaced, so return the original string unchanged.
        if (startIndex == 0)
            return source;

        // Copy whatever follows the last replaced character.
        result.Append(source, startIndex, source.Length - startIndex);

        return result.ToString();
    }
    

    NOTE Not all edge cases have been tested.

    NOTE Could replace ReplacementChars with ReplacementMappings.Keys.ToArray() for a slight cost.
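
    The second note might look roughly like the sketch below. The class name CharTranslator is made up for illustration, and the sketch relies on static field initializers running in textual order, so the dictionary has to be declared before the array derived from it.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class CharTranslator // hypothetical name, not from the answer
{
    // Declared first: static field initializers run in textual order.
    private static readonly Dictionary<char, char> ReplacementMappings =
        new Dictionary<char, char>
        {
            { 'á', 'a' }, { 'é', 'e' }, { 'í', 'i' }, { 'ü', 'u' }, { 'ñ', '&' }
        };

    // Derived once when the type is initialized, so the key list cannot
    // drift out of sync with the dictionary.
    private static readonly char[] ReplacementChars = ReplacementMappings.Keys.ToArray();

    public static string Translate(string source)
    {
        var startIndex = 0;
        int currentIndex;
        var result = new StringBuilder(source.Length);

        while ((currentIndex = source.IndexOfAny(ReplacementChars, startIndex)) != -1)
        {
            // Copy the unmapped run, then the replacement for the matched character.
            result.Append(source, startIndex, currentIndex - startIndex);
            result.Append(ReplacementMappings[source[currentIndex]]);
            startIndex = currentIndex + 1;
        }

        // No match at all: hand back the original string.
        if (startIndex == 0)
            return source;

        result.Append(source, startIndex, source.Length - startIndex);
        return result.ToString();
    }
}
```

    The "slight cost" is a one-time array allocation at type initialization; in exchange, a new mapping only has to be added in one place.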

    Assuming that NOT every character is a replacement char, this will actually run slightly faster than straight string replacements (again, about 20%).

    That being said, when weighing the performance cost, remember what we are actually talking about: in this case, the difference between the optimized solution and the original solution is about 1 second over 100,000 iterations on a 1,000-character string.
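
    A rough way to run this kind of measurement yourself is sketched below with Stopwatch. The TranslateTiming class, its Measure helper, and the test string are assumptions made for illustration. The baseline of chained Replace calls is also a guess at what "straight string replace" means, since the question's code is not shown in full. Timings from a loop like this vary with hardware and runtime, so treat them as indicative only.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

public static class TranslateTiming // hypothetical helper, not part of the answer
{
    // Runs `translate` on `input` `iterations` times and returns the elapsed time.
    public static TimeSpan Measure(Func<string, string> translate, string input, int iterations)
    {
        translate(input); // warm-up call so JIT compilation is not timed

        var watch = Stopwatch.StartNew();
        for (var i = 0; i < iterations; i++)
        {
            translate(input);
        }
        watch.Stop();
        return watch.Elapsed;
    }

    public static void Main()
    {
        // 1,000 characters, with one replaceable character in every ten.
        var input = string.Concat(Enumerable.Repeat("abcdefghñj", 100));

        // Assumed baseline: one string.Replace call per mapped character.
        Func<string, string> chained = s => s
            .Replace('á', 'a').Replace('é', 'e').Replace('í', 'i')
            .Replace('ü', 'u').Replace('ñ', '&');

        var elapsed = Measure(chained, input, 100000);
        Console.WriteLine("Chained Replace: " + elapsed.TotalMilliseconds + " ms");
    }
}
```

    Any of the Translate variants in this thread can be passed to Measure in place of the chained delegate to compare them on the same input.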

    Either way, just wanted to add some information to the answers for this question.

  • 2021-01-05 19:19

    What you want is a way to go through the string once and do all the replacements. I am not sure that regex is the best way to do it if you want efficiency. It could very well be that a switch statement (with a case for each character you want to replace) inside a for loop that tests every character is faster. I would profile the two approaches.
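
    A minimal sketch of the switch idea, using the five mappings from the question. The class name SwitchTranslator is hypothetical, and no claim is made here about how it compares in speed to the other answers; that is what profiling would have to show.

```csharp
public static class SwitchTranslator // hypothetical name, for illustration only
{
    public static string Translate(string s)
    {
        var result = new char[s.Length];

        // Single pass: every character is tested once and written once.
        for (var i = 0; i < s.Length; i++)
        {
            switch (s[i])
            {
                case 'á': result[i] = 'a'; break;
                case 'é': result[i] = 'e'; break;
                case 'í': result[i] = 'i'; break;
                case 'ü': result[i] = 'u'; break;
                case 'ñ': result[i] = '&'; break;
                default: result[i] = s[i]; break; // unmapped characters pass through
            }
        }

        return new string(result);
    }
}
```

    The trade-off is that the mappings are fixed at compile time, whereas the dictionary-based answers can load them at run time.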
