Why is String.Concat not optimized to StringBuilder.Append?

小鲜肉 2020-12-05 07:39

I found that concatenations of constant string expressions are optimized by the compiler into one string.

Now, for concatenation of strings known only at run time, why isn't String.Concat optimized into StringBuilder.Append calls?

  • 2020-12-05 07:54

    I believe it would be a bit too complex for the compiler writers. And when the intermediate strings are referenced inside the loop for anything besides the concatenation (for example, passing them to some other method), this optimization would not be possible.
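    A small C# sketch of that case, assuming a hypothetical Log method that receives every intermediate string. To preserve what Log observes, a StringBuilder rewrite would still have to call ToString() on every iteration, so most of the benefit disappears.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Log is a hypothetical stand-in for "some other method" that receives
// each intermediate string.
var seen = new List<string>();
void Log(string value) => seen.Add(value);

// Original loop: the intermediate value of s escapes on every iteration.
string s = "";
for (int i = 0; i < 5; i++)
{
    s = string.Concat(s, i.ToString());
    Log(s);
}

// A behavior-preserving StringBuilder rewrite still has to materialize a
// new string on every iteration so the observer sees the same values.
var seenRewritten = new List<string>();
var sb = new StringBuilder();
for (int i = 0; i < 5; i++)
{
    sb.Append(i);
    seenRewritten.Add(sb.ToString());
}

Console.WriteLine(string.Join(",", seen)); // 0,01,012,0123,01234
```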

  • 2020-12-05 07:57

    Probably because it's complicated to match such a pattern in the code, and if the compiler can't make the match for some reason, the performance of the code is suddenly terrible. Optimising code like that would encourage people to write code like that, which would further increase the negative impact in the cases where the compiler can no longer do the optimisation.

    For concatenating a known set of strings, StringBuilder is not faster than String.Concat.
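    A minimal C# sketch of the difference for a fixed set of four strings (the values are made up): String.Concat receives every piece in one call, while the StringBuilder chain hands them over one at a time and then copies its buffer into the final string.

```csharp
using System;
using System.Text;

string a = "alpha", b = "beta", c = "gamma", d = "delta";

// String.Concat sees all four strings at once, so it can size the result
// exactly before allocating it.
string viaConcat = string.Concat(a, b, c, d);

// StringBuilder sees one piece at a time: it allocates its own buffer
// (default capacity 16, growing as needed) and ToString() then copies that
// buffer into a new string.
string viaBuilder = new StringBuilder().Append(a).Append(b).Append(c).Append(d).ToString();

Console.WriteLine(viaConcat == viaBuilder); // True
```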

  • 2020-12-05 07:58

    A String is an immutable type, hence repeatedly concatenating strings is slower than using StringBuilder.Append.
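    The immutability this answer relies on can be seen directly in a short C# sketch (illustrative values only): Concat returns a new object and leaves its input alone, while Append modifies the builder and returns the same instance.

```csharp
using System;
using System.Text;

string original = "abc";
string longer = string.Concat(original, "def");
// Concat cannot modify 'original'; it returns a brand-new string object.
Console.WriteLine(ReferenceEquals(original, longer)); // False
Console.WriteLine(original);                          // abc

var sb = new StringBuilder("abc");
StringBuilder returned = sb.Append("def");
// Append writes into the builder's own buffer and returns the same instance.
Console.WriteLine(ReferenceEquals(sb, returned));     // True
Console.WriteLine(sb.ToString());                     // abcdef
```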

    Edit: To clarify my point a bit more: when you ask why String.Concat is not optimized to StringBuilder.Append, remember that the StringBuilder class has completely different semantics from the immutable String type. Why should you expect the compiler to optimize one into the other when they are clearly two different things? Furthermore, StringBuilder is a mutable type whose length can change dynamically; why should a compiler optimize an immutable type into a mutable one? That design and those semantics are ingrained in the ECMA spec for the .NET Framework, regardless of the language.

    It's a bit like asking the compiler (and perhaps expecting too much) to compile a char and optimize it into an int because an int works on 32 bits instead of a char's 16 bits and would be deemed faster!

  • 2020-12-05 08:00

    Two reasons:

    • You can't programmatically identify places where it would be strictly higher performing.
    • The "optimization" will slow things down if performed incorrectly.

    You can suggest people use the correct calls for their application, but at some point it's the developer's responsibility to get it right.

    Edit: Regarding the cutoff, we have another couple of problems:

    • The only way to know for sure that the cutoff is reached is complicated flow analysis. The number of places where such analysis could find sections that can be converted is extremely small.
    • Flow analysis is expensive. If you do it at runtime, the whole program will run slower for the rare chance that one piece of poorly written code will be faster. If you do it at compile time, the code isn't an error according to the language syntax, but you can issue a warning, and that's exactly what FxCop does (a slow but available flow analysis tool). Just imagine if FxCop always had to run with the compiler: people would spend hours just waiting to run their code. And if it happened at runtime, well, welcome to JVM startup times...
  • 2020-12-05 08:01

    For a single concatenation of multiple strings (e.g. a + b + c + d + e + f + g + h + i + j) you really want to be using String.Concat IMO. It has the overhead of building an array for each call, but it has the benefit that the method can work out the exact length of the resulting string before it needs to allocate any memory. StringBuilder.Append(a).Append(b)... only gives a single value at a time, so the builder doesn't know how much memory to allocate.
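    The exact-length strategy described above can be sketched in C# as follows. ConcatExact is a hypothetical helper, not the BCL's actual implementation, and the sketch assumes .NET Core 2.1 or later for string.Create, which allocates a string of a given length and lets a callback fill it in.

```csharp
using System;

// Hypothetical helper: add up the lengths first, allocate the result once,
// then copy each piece into place.
string ConcatExact(params string[] parts)
{
    int total = 0;
    foreach (string part in parts)
        total += part?.Length ?? 0; // treat null as empty, as String.Concat does

    // string.Create allocates a string of exactly 'total' chars and lets the
    // callback fill it through a Span<char>.
    return string.Create(total, parts, (span, state) =>
    {
        int pos = 0;
        foreach (string piece in state)
        {
            if (piece is null) continue;
            piece.AsSpan().CopyTo(span.Slice(pos));
            pos += piece.Length;
        }
    });
}

Console.WriteLine(ConcatExact("a", "b", null, "cd", "efg")); // abcdefg
```

    By contrast, StringBuilder.Append(a).Append(b)... never sees the total up front, which is exactly the point made above.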

    As for doing it in loops - at that point you've added a new local variable, and you've got to add code to write back to the string variable at exactly the right time (calling StringBuilder.ToString()). What happens when you're running in the debugger? Wouldn't it be pretty confusing not to see the value building up, only becoming visible at the end of the loop? Oh, and of course you've got to perform appropriate validation that the value isn't used at any point before the end of the loop...
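    To make the write-back problem concrete, here is a C# sketch assuming a loop that fails part-way through (the exception is simulated for illustration). A naive rewrite that only calls ToString() after the loop loses the partial value that the original code leaves in its variable.

```csharp
using System;
using System.Text;

// Original loop: s is written back on every iteration, so after a failure
// part-way through, s still holds everything concatenated so far.
string s = "";
try
{
    for (int i = 0; i < 10; i++)
    {
        if (i == 3) throw new InvalidOperationException("simulated failure");
        s = string.Concat(s, i.ToString());
    }
}
catch (InvalidOperationException) { }

// Naive rewrite: the builder's contents are copied back only after the
// loop, so the partial value never reaches t when the loop throws.
string t = "";
var sb = new StringBuilder();
try
{
    for (int i = 0; i < 10; i++)
    {
        if (i == 3) throw new InvalidOperationException("simulated failure");
        sb.Append(i);
    }
    t = sb.ToString();
}
catch (InvalidOperationException) { }

Console.WriteLine($"s = \"{s}\", t = \"{t}\""); // s = "012", t = ""
```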

  • 2020-12-05 08:09

    The definite answer will have to come from the compiler design team. But let me take a stab here...

    If your question is, why the compiler doesn't turn this:

    string s = "";
    for( int i = 0; i < 100; i ++ )
        s = string.Concat( s, i.ToString() );
    

    into this:

    var sb = new System.Text.StringBuilder(); // fully qualified, so no using directive is needed
    for( int i = 0; i < 100; i++ )
        sb.Append( i.ToString() );
    string s = sb.ToString();
    

    The most likely answer is that this is not an optimization. This is a rewrite of the code that introduces new constructs based on knowledge and intent that the developer has - not the compiler.

    This type of change would require the compiler to have more knowledge of the BCL than is appropriate. What if tomorrow, some more optimal string assembly service becomes available? Should the compiler use that?

    What if your loop conditions were more complicated? Should the compiler attempt some static analysis to decide whether the result of such a rewrite would still be functionally equivalent? In many ways, this would be like solving the halting problem.
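    A C# sketch of a loop whose condition inspects the string being built (the "99" condition is an arbitrary example). Because StringBuilder has no Contains method, a behavior-preserving rewrite would have to rebuild the string in the condition on every iteration.

```csharp
using System;
using System.Text;

// Original: the loop condition itself inspects the string being built.
string s = "";
int i = 0;
while (!s.Contains("99"))
{
    s = string.Concat(s, i.ToString());
    i++;
}

// Faithful rewrite: without StringBuilder.Contains, the condition has to
// call ToString() on every iteration, which defeats the purpose.
var sb = new StringBuilder();
int j = 0;
while (!sb.ToString().Contains("99"))
{
    sb.Append(j);
    j++;
}

Console.WriteLine(s == sb.ToString()); // True
```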

    Finally, I'm not sure that in all cases this would result in faster code. There is a cost to instantiating a StringBuilder and resizing its internal buffer as text is appended. In fact, the cost of appending is strongly tied to the size of the strings being concatenated, how many there are, and what memory pressure looks like. These are things that the compiler cannot predict in advance.
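    The buffer-resizing cost mentioned above can be illustrated through StringBuilder's capacity, in a sketch with a made-up workload (the numbers 0 to 99). A builder created with the default constructor starts small and must grow; one created with a capacity hint does not need to, but choosing that hint requires knowing the output size in advance.

```csharp
using System;
using System.Text;

// Hypothetical workload: append the numbers 0..99.
int expectedLength = 0;
for (int n = 0; n < 100; n++) expectedLength += n.ToString().Length;

// Default builder: starts with a small buffer and grows as text arrives.
var growing = new StringBuilder();
Console.WriteLine(growing.Capacity); // 16

// Pre-sized builder: one buffer big enough for everything up front. Usually
// only the developer, not the compiler, knows a good number to put here.
var presized = new StringBuilder(expectedLength);

for (int n = 0; n < 100; n++)
{
    growing.Append(n);
    presized.Append(n);
}

Console.WriteLine(expectedLength);                            // 190
Console.WriteLine(growing.ToString() == presized.ToString()); // True
```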

    It's your job as a developer to write well-performing code. The compiler can only help by making certain safe, invariant-preserving optimizations, not by rewriting your code for you.
