String Benchmarks in C# - Refactoring for Speed/Maintainability

无人及你 2021-02-09 21:29

I've been tinkering with small functions on my own time, trying to find ways to refactor them (I recently read Martin Fowler's book Refactoring: Improving the Design of Ex

10 Answers
  • 2021-02-09 21:44

    The Regex versions of your solution are not equivalent in results to the original code. Perhaps the larger context of the code avoids the areas where they differ. The original code will add a space for anything that is not a lower case character. For example "This\tIsATest" would become "This \t Is A Test" in the original but "This\t Is A Test" with the Regex versions.

    (?<!^)(?=[^a-z])
    

    is the pattern you want for a closer match, but even then it still ignores i18n, because it only knows the ASCII range a-z. The following pattern, which tests for the Unicode lowercase-letter category instead, should take care of that:

    (?<!^)(?=\P{Ll})
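
    To make the difference concrete, here is a minimal sketch that runs all three patterns over two inputs. The helper name SplitWith and the second input are made up for illustration; the first pattern is the [A-Z] one the Regex versions use.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    static class PatternComparison
    {
        // Hypothetical helper (not from the question): split on the pattern, rejoin with spaces.
        public static string SplitWith(string pattern, string input)
        {
            return string.Join(" ", Regex.Split(input, pattern));
        }

        static void Main()
        {
            string[] patterns =
            {
                @"(?<!^)(?=[A-Z])",   // splits before ASCII upper case only
                @"(?<!^)(?=[^a-z])",  // splits before anything outside ASCII a-z
                @"(?<!^)(?=\P{Ll})"   // splits before anything that is not a Unicode lowercase letter
            };

            // A tab and an accented lowercase letter expose the differences.
            string[] inputs = { "This\tIsATest", "Caf\u00E9\u00C9cole" };

            foreach (string input in inputs)
            {
                foreach (string pattern in patterns)
                {
                    Console.WriteLine("{0,-20} -> \"{1}\"", pattern, SplitWith(pattern, input));
                }
            }
        }
    }
    ```

    For the tab input, only the last two patterns put a space before the tab. For the accented input, the [^a-z] pattern also splits before the lowercase é, while \P{Ll} splits only before the upper case É, which is what char.IsLower in the original loop would do.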
    
  • 2021-02-09 21:45

    Use a StringBuilder instead of concatenation. Each concatenation creates a new string instance and throws away the old one.
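
    A minimal sketch of that difference, using made-up variable names: += leaves the old string untouched and binds the variable to a new instance, while Append keeps working on the same builder.

    ```csharp
    using System;
    using System.Text;

    static class ConcatVersusBuilder
    {
        static void Main()
        {
            // Strings are immutable: += builds a new instance and rebinds the variable.
            string original = "This";
            string concatenated = original;
            concatenated += " Is";
            Console.WriteLine(ReferenceEquals(original, concatenated)); // False
            Console.WriteLine(original);                                // This

            // A StringBuilder appends into its own buffer; Append returns the same builder.
            StringBuilder sb = new StringBuilder("This");
            StringBuilder same = sb.Append(" Is");
            Console.WriteLine(ReferenceEquals(sb, same));               // True
            Console.WriteLine(sb.ToString());                           // This Is
        }
    }
    ```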

  • 2021-02-09 21:48

    Here is a slightly faster version. I have taken suggestions from previous posters, but I also append to the StringBuilder block-wise, one word at a time. This may allow StringBuilder to copy 4 bytes at a time, depending on the size of the word. I have also removed the string allocation and just replaced it with str.Length.

        static string RefactoredMakeNiceString2(string str)
        {
            char[] ca = str.ToCharArray();
            StringBuilder sb = new StringBuilder(str.Length);
            int start = 0;
            for (int i = 0; i < ca.Length; i++)
            {
                if (char.IsUpper(ca[i]) && i != 0)
                {
                    sb.Append(ca, start, i - start);
                    sb.Append(' ');
                    start = i;
                }
            }
            sb.Append(ca, start, ca.Length - start);
            return sb.ToString();
        }
    
  • 2021-02-09 21:54

    1) Use a StringBuilder, preferably set with a reasonable initial capacity (e.g. string length * 5/4, to allow one extra space per four characters).

    2) Try using a foreach loop instead of a for loop - it may well be simpler

    3) You don't need to convert the string into a char array first - foreach will work over a string already, or use the indexer.

    4) Don't do extra string conversions everywhere - calling Convert.ToString(char) and then appending that string is pointless; there's no need for the single character string

    5) For the second option, just build the regex once, outside the method. Try it with RegexOptions.Compiled as well.

    EDIT: Okay, full benchmark results. I've tried a few more things, and also executed the code with rather more iterations to get a more accurate result. This is only running on an Eee PC, so no doubt it'll run faster on "real" PCs, but I suspect the broad results are appropriate. First the code:

    using System;
    using System.Collections;
    using System.Collections.Generic;
    using System.Diagnostics;
    using System.Linq;
    using System.Reflection;
    using System.Text;
    using System.Text.RegularExpressions;
    
    class Benchmark
    {
        const string TestData = "ThisIsAUpperCaseString";
        const string ValidResult = "This Is A Upper Case String";
        const int Iterations = 1000000;
    
        static void Main(string[] args)
        {
            Test(BenchmarkOverhead);
            Test(MakeNiceString);
            Test(ImprovedMakeNiceString);
            Test(RefactoredMakeNiceString);
            Test(MakeNiceStringWithStringIndexer);
            Test(MakeNiceStringWithForeach);
            Test(MakeNiceStringWithForeachAndLinqSkip);
            Test(MakeNiceStringWithForeachAndCustomSkip);
            Test(SplitCamelCase);
            Test(SplitCamelCaseCachedRegex);
            Test(SplitCamelCaseCompiledRegex);        
        }
    
        static void Test(Func<string,string> function)
        {
            Console.Write("{0}... ", function.Method.Name);
            Stopwatch sw = Stopwatch.StartNew();
            for (int i=0; i < Iterations; i++)
            {
                string result = function(TestData);
                // Cheap sanity check: compares lengths only, not content.
                if (result.Length != ValidResult.Length)
                {
                    throw new Exception("Bad result: " + result);
                }
            }
            sw.Stop();
            Console.WriteLine(" {0}ms", sw.ElapsedMilliseconds);
            GC.Collect();
        }
    
        private static string BenchmarkOverhead(string str)
        {
            return ValidResult;
        }
    
        private static string MakeNiceString(string str)
        {
            char[] ca = str.ToCharArray();
            string result = null;
            int i = 0;
            result += System.Convert.ToString(ca[0]);
            for (i = 1; i <= ca.Length - 1; i++)
            {
                if (!(char.IsLower(ca[i])))
                {
                    result += " ";
                }
                result += System.Convert.ToString(ca[i]);
            }
            return result;
        }
    
        private static string ImprovedMakeNiceString(string str)
        { //Removed Convert.ToString()
            char[] ca = str.ToCharArray();
            string result = null;
            int i = 0;
            result += ca[0];
            for (i = 1; i <= ca.Length - 1; i++)
            {
                if (!(char.IsLower(ca[i])))
                {
                    result += " ";
                }
                result += ca[i];
            }
            return result;
        }
    
        private static string RefactoredMakeNiceString(string str)
        {
            char[] ca = str.ToCharArray();
            StringBuilder sb = new StringBuilder((str.Length * 5 / 4));
            int i = 0;
            sb.Append(ca[0]);
            for (i = 1; i <= ca.Length - 1; i++)
            {
                if (!(char.IsLower(ca[i])))
                {
                    sb.Append(" ");
                }
                sb.Append(ca[i]);
            }
            return sb.ToString();
        }
    
        private static string MakeNiceStringWithStringIndexer(string str)
        {
            StringBuilder sb = new StringBuilder((str.Length * 5 / 4));
            sb.Append(str[0]);
            for (int i = 1; i < str.Length; i++)
            {
                char c = str[i];
                if (!(char.IsLower(c)))
                {
                    sb.Append(" ");
                }
                sb.Append(c);
            }
            return sb.ToString();
        }
    
        private static string MakeNiceStringWithForeach(string str)
        {
            StringBuilder sb = new StringBuilder(str.Length * 5 / 4);
            bool first = true;      
            foreach (char c in str)
            {
                if (!first && char.IsUpper(c))
                {
                    sb.Append(" ");
                }
                sb.Append(c);
                first = false;
            }
            return sb.ToString();
        }
    
        private static string MakeNiceStringWithForeachAndLinqSkip(string str)
        {
            StringBuilder sb = new StringBuilder(str.Length * 5 / 4);
            sb.Append(str[0]);
            foreach (char c in str.Skip(1))
            {
                if (char.IsUpper(c))
                {
                    sb.Append(" ");
                }
                sb.Append(c);
            }
            return sb.ToString();
        }
    
        private static string MakeNiceStringWithForeachAndCustomSkip(string str)
        {
            StringBuilder sb = new StringBuilder(str.Length * 5 / 4);
            sb.Append(str[0]);
            foreach (char c in new SkipEnumerable<char>(str, 1))
            {
                if (char.IsUpper(c))
                {
                    sb.Append(" ");
                }
                sb.Append(c);
            }
            return sb.ToString();
        }
    
        private static string SplitCamelCase(string str)
        {
            string[] temp = Regex.Split(str, @"(?<!^)(?=[A-Z])");
            string result = String.Join(" ", temp);
            return result;
        }
    
        private static readonly Regex CachedRegex = new Regex("(?<!^)(?=[A-Z])");    
        private static string SplitCamelCaseCachedRegex(string str)
        {
            string[] temp = CachedRegex.Split(str);
            string result = String.Join(" ", temp);
            return result;
        }
    
        private static readonly Regex CompiledRegex =
            new Regex("(?<!^)(?=[A-Z])", RegexOptions.Compiled);    
        private static string SplitCamelCaseCompiledRegex(string str)
        {
            string[] temp = CompiledRegex.Split(str);
            string result = String.Join(" ", temp);
            return result;
        }
    
        private class SkipEnumerable<T> : IEnumerable<T>
        {
            private readonly IEnumerable<T> original;
            private readonly int skip;
    
            public SkipEnumerable(IEnumerable<T> original, int skip)
            {
                this.original = original;
                this.skip = skip;
            }
    
            public IEnumerator<T> GetEnumerator()
            {
                IEnumerator<T> ret = original.GetEnumerator();
                for (int i=0; i < skip; i++)
                {
                    ret.MoveNext();
                }
                return ret;
            }
    
            IEnumerator IEnumerable.GetEnumerator()
            {
                return GetEnumerator();
            }
        }
    }
    

    Now the results:

    BenchmarkOverhead...  22ms
    MakeNiceString...  10062ms
    ImprovedMakeNiceString...  12367ms
    RefactoredMakeNiceString...  3489ms
    MakeNiceStringWithStringIndexer...  3115ms
    MakeNiceStringWithForeach...  3292ms
    MakeNiceStringWithForeachAndLinqSkip...  5702ms
    MakeNiceStringWithForeachAndCustomSkip...  4490ms
    SplitCamelCase...  68267ms
    SplitCamelCaseCachedRegex...  52529ms
    SplitCamelCaseCompiledRegex...  26806ms
    

    As you can see, the string indexer version is the winner - it's also pretty simple code.
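
    To put those totals into per-call terms, here is a small sketch of the arithmetic. It only reuses the figures from the table above (1,000,000 iterations, 22ms of loop overhead); the class and method names are made up for illustration.

    ```csharp
    using System;

    static class PerCallCost
    {
        const int Iterations = 1000000;   // same value as in the benchmark
        const double OverheadMs = 22;     // BenchmarkOverhead result from the table

        // Net cost of one call in microseconds, after subtracting the loop overhead.
        public static double MicrosecondsPerCall(double totalMs)
        {
            return (totalMs - OverheadMs) * 1000.0 / Iterations;
        }

        static void Main()
        {
            string[] names =
            {
                "MakeNiceString",
                "MakeNiceStringWithStringIndexer",
                "SplitCamelCaseCompiledRegex",
                "SplitCamelCase"
            };
            double[] totalsMs = { 10062, 3115, 26806, 68267 };

            // The string indexer version is the baseline for the ratio column.
            double best = MicrosecondsPerCall(3115);

            for (int i = 0; i < names.Length; i++)
            {
                double us = MicrosecondsPerCall(totalsMs[i]);
                Console.WriteLine("{0,-34} {1,7:F2} us/call {2,6:F1}x", names[i], us, us / best);
            }
        }
    }
    ```

    On these numbers the string indexer version costs roughly 3 microseconds per call, and the uncached regex roughly 68, about 22 times as much.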

    Hope this helps... and don't forget, there are bound to be other options I haven't thought of!

  • 2021-02-09 21:54

    You might want to try instantiating a Regex object as a class member and using the RegexOptions.Compiled option when you create it.

    Currently, you're using the static Split member of Regex. Static calls only reuse a parsed pattern through a small internal cache (see Regex.CacheSize), so every call still pays for a cache lookup, and the pattern is interpreted rather than compiled. Using an instance stored in a field instead of the static method should improve your performance even more (over the long run).

  • 2021-02-09 21:55

    My first refactoring would be to change the name of the method to something more descriptive. MakeNiceString imo is not a name that would indicate to me what this method does.

    How about PascalCaseToSentence? Not loving that name, but it's better than MakeNiceString.
