fast custom string splitting

后端 未结 3 978
离开以前
离开以前 2021-01-06 17:25

I am writing a custom string split. It will split on a dot(.) that is not preceded by an odd number of backslashes (\\).

«string» -         


        
相关标签:
3条回答
  • 2021-01-06 18:09

    You shouldn't try to use string.Split for that.

    If you need help to implement it, a simple way to solve this is to have loop that scans the string, keeping track of the last place where you found a qualifying dot. When you find a new qualifying dot (or reach the end of the input string), just yield return the current substring.

    Edit: about returning a list or an array vs. using yield

    If in your application, the most important thing is the time spent by the caller on iterating the substrings, then you should populate a list or an array and return that, as suggested in the accepted question. I would not use a resizable array while collecting the substrings because this would be slow.

    On the other hand, if you care about the overall performance and about memory, and if sometimes the caller doesn't have to iterate over the entire list, you should use yield return. When you use yield return, you have the advantage that no code at all is executing until the caller has called MoveNext (directly or indirectly through a foreach). This means that you save the memory for allocating the array/list, and you save the time spent on allocating/resizing/populating the list. You will be spending time almost only on the logic of finding the substrings, and this will be done lazily, that is - only when actually needed because the caller continues to iterate the substrings.

    0 讨论(0)
  • 2021-01-06 18:18

    The loop that you show is the right approach if you need performance. (Regex wouldn't be).

    Switch to an index-based for-loop. Remember the index of the start of the match. Don't append individual chars. Instead, remember the range of characters to copy out and do that with a single Substring call per item.

    Also, don't use a LinkedList. It is slower than a List for almost all cases except random-access mutations.

    You might also switch from List to a normal array that you resize with Array.Resize. This results in slightly tedious code (because you have inlined a part of the List class into your method) but it cuts out some small overheads.

    Next, don't return an IEnumerable because that forces the caller through indirection when accessing its items. Return a List or an array.

    0 讨论(0)
  • 2021-01-06 18:26

    This is the one I eventually settled on. It is not as fast a string.split, but good enough and can be modified, to do what I want.

        private IEnumerable<string> HandMadeSplit2b(string input)
        {
            //this one is margenaly better that the second best 2, but makes the resolver (its client much faster), nealy as fast as original.
            var Result = new List<string>();
            var begining = 0;
            var len = input.Length;
            for (var index=0;index<len;index++)
            {
                if (input[index] == '.')
                {
                    Result.Add(input.Substring(begining,index-begining));
                    begining = index+1;
                }
            }
            Result.Add(input.Substring(begining));
            return Result;
        }
    
    0 讨论(0)
提交回复
热议问题