What is the fastest way to count newlines in a large .NET string?

闹比i 2021-02-13 04:18

Is there a way to improve this:

private static int CountNewlines(string s)
{
    int len = s.Length;
    int c = 0;
    for (int i = 0; i < len; i++)
    {
        if (s[i] == '\n') c++;
    }
    return c+1;
}

6 Answers
  • 2021-02-13 04:54

    Well, String implements IEnumerable<char>, so I'd definitely try:

    s.Count( c => c == '\n' )
    

    As nice as this looks, the original method is 30x faster :)

    I haven't given up on the IEnumerable yet, so I've also tried:

    int n = 0;
    foreach( var c in s )
    {
        if ( c == '\n' ) n++;
    }
    return n;
    

    which seems as fast as the original method.

  • 2021-02-13 04:57

    I'm pretty sure this won't be much slower than converting the string to bytes and checking those, if not faster. The String class should be highly optimized.

    If this is a big string, maybe a parallel execution by several threads will make things faster :-)
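A rough sketch of the multi-threaded idea, assuming the string is large enough for the threading overhead to pay off (which would need measuring). The class name, method name, and `chunkSize` default are made up for illustration. It returns the raw number of '\n' characters, without the +1 that the original method adds.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelNewlineCounter
{
    // Hypothetical helper: splits the string into fixed-size chunks and
    // counts '\n' in each chunk on the thread pool.
    public static int CountNewlinesParallel(string s, int chunkSize = 1 << 16)
    {
        if (chunkSize <= 0) throw new ArgumentOutOfRangeException(nameof(chunkSize));
        if (string.IsNullOrEmpty(s)) return 0;

        int chunks = (s.Length + chunkSize - 1) / chunkSize;
        int total = 0;

        Parallel.For(0, chunks,
            () => 0,                                  // per-thread running count
            (chunk, state, local) =>
            {
                int start = chunk * chunkSize;
                int end = Math.Min(start + chunkSize, s.Length);
                for (int i = start; i < end; i++)
                {
                    if (s[i] == '\n') local++;
                }
                return local;
            },
            local => Interlocked.Add(ref total, local)); // merge once per thread

        return total;
    }
}
```

Each worker keeps its own counter and only touches the shared total once at the end, so the hot loop stays free of synchronization.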

  • 2021-02-13 05:02

    You could convert the string to a char array with ToCharArray(), but I don't think that will improve performance. You could also try unsafe code (pointers) instead of the indexed for loop, but that has its drawbacks too.
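For reference, a minimal sketch of what the pointer variant might look like. It has to be compiled with unsafe code enabled (the /unsafe compiler switch), the class and method names are invented for this example, and whether it actually beats the plain loop would have to be measured. It returns the raw number of '\n' characters, without the +1.

```csharp
public static class UnsafeNewlineCounter
{
    // Hypothetical helper: walks the string's characters through a pinned pointer.
    // Requires compiling with unsafe code enabled (/unsafe).
    public static unsafe int CountNewlinesUnsafe(string s)
    {
        if (string.IsNullOrEmpty(s)) return 0;

        int count = 0;
        fixed (char* start = s)          // pin the string so the GC cannot move it
        {
            char* p = start;
            char* end = start + s.Length;
            while (p < end)
            {
                if (*p == '\n') count++;
                p++;
            }
        }
        return count;
    }
}
```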

  • 2021-02-13 05:04

    I tested these implementations

    private static int Count1(string s)
    {
        int len = s.Length;
        int c = 0;
        for (int i=0; i < len;  i++)
        {
            if (s[i] == '\n') c++;
        }
        return c+1;
    }
    
    private static int Count2(string s)
    {
        int count = -1;
        int index = -1;
    
        do
        {
            count++;
            index = s.IndexOf('\n', index + 1);
        }
        while (index != -1);
    
        return count+1;
    }
    
    private static int Count3(string s)
    {
        return s.Count( c => c == '\n' ) + 1;
    }
    
    
    private static int Count4(string s)
    {
        int n = 0;
        foreach( var c in s )
        {
            if ( c == '\n' ) n++;
        }
        return n+1;
    }
    
    private static int Count5(string s)
    {
        var a = s.ToCharArray();
        int c = 0;
        for (int i=0; i < a.Length; i++)
        {
            if (a[i]=='\n') c++;
        }
        return c+1;
    }
    

    Here are my timing results for 100000 iterations on a string of ~25k. Lower is faster.

                  Time  Factor
    Count1   4.8581503     1.4
    Count2   4.1406059     1.2
    Count3  45.3614124    13.4
    Count4   3.3896130     1.0
    Count5   5.9304543     1.7
    

    Surprisingly to me, the enumerator implementation (Count4) was the fastest by a significant margin: about 20% faster than the next closest implementation (Count2). The results were repeatable, regardless of the order in which the methods were run. I also used a warmup phase to ensure transient effects (JIT, etc.) were factored out.

    This was for a release build (/optimize+)
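The measurement setup described above (warmup, then a fixed number of timed iterations) can be sketched roughly as follows. This is not the harness that produced the numbers in the table; `NewlineBenchmark`, `Measure`, and the warmup count are assumptions made for illustration, and `CountForeach` simply mirrors Count4.

```csharp
using System;
using System.Diagnostics;

public static class NewlineBenchmark
{
    // Hypothetical helper: runs `counter` a few times untimed so the JIT has
    // compiled it, then times `iterations` calls with a Stopwatch.
    public static TimeSpan Measure(Func<string, int> counter, string input,
                                   int iterations, int warmup = 1000)
    {
        long sink = 0; // accumulate results so the calls are not optimized away

        for (int i = 0; i < warmup; i++) sink += counter(input);

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++) sink += counter(input);
        sw.Stop();

        GC.KeepAlive(sink);
        return sw.Elapsed;
    }

    // Same logic as Count4 above.
    public static int CountForeach(string s)
    {
        int n = 0;
        foreach (var c in s)
        {
            if (c == '\n') n++;
        }
        return n + 1;
    }
}
```

A call such as `NewlineBenchmark.Measure(NewlineBenchmark.CountForeach, text, 100000)` would then give the elapsed time for one candidate; running each candidate in a different order, as described above, helps rule out ordering effects.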

  • 2021-02-13 05:05

    This is probably the most efficient option: the item accessor is internally optimized, and you can treat it as if it performs pointer arithmetic.

  • 2021-02-13 05:12

    Make it an instance method, if you'll use it in a loop.
