String.Join performance issue in C#

醉梦人生 2021-01-11 11:51

I've been researching a question that was presented to me: how to write a function that takes a string as input and returns a string with spaces between the characters. Th

6 answers
  • 2021-01-11 12:04

    Your String.Join example works on an IEnumerable<char>. Enumerating an IEnumerable<T> with foreach is often slower than executing a for loop (it depends on the collection type and other circumstances, as Dave Black pointed out in a comment). Even if Join uses a StringBuilder, the StringBuilder's internal buffer will have to grow several times, since the number of items to append is not known in advance.
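
    To illustrate the buffer-growth point, here is a minimal sketch (the class and method names are hypothetical, not taken from the question). It indexes the string with a for loop and gives the StringBuilder its final capacity up front, so the internal buffer should never need to grow:

    ```csharp
    using System.Text;

    public static class SpacedBuilder
    {
        // Hypothetical helper: inserts a space between the characters of s.
        public static string Space(string s)
        {
            if (string.IsNullOrEmpty(s)) return string.Empty;

            // n characters plus n - 1 separators: the final length is known in advance.
            var sb = new StringBuilder(s.Length * 2 - 1);
            sb.Append(s[0]);
            for (int i = 1; i < s.Length; i++)
            {
                sb.Append(' ');
                sb.Append(s[i]);
            }
            return sb.ToString();
        }
    }
    ```

    Calling SpacedBuilder.Space("abc") returns "a b c". Whether this beats String.Join in a given setup is something to measure rather than assume.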

  • 2021-01-11 12:05

    When you pass an IEnumerable to String.Join, it has no way of knowing how much memory needs to be allocated. It allocates a chunk of memory, resizes it if it is insufficient, and repeats the process until it has enough memory to accommodate all the strings.

    The array version is faster because the amount of memory required is known ahead of time.

    Also, please note that a garbage collection might have occurred while you were running the first version.

  • 2021-01-11 12:11

    The bad performance does not come from String.Join itself, but from the way you handle each character. Since the characters have to be handled individually, your first method creates many more intermediate strings, and the second method incurs two .Append calls per character. Your third method involves neither a lot of intermediate strings nor extra method calls, which is why it is the fastest.

  • 2021-01-11 12:19

    In your first method, you are using the overload of String.Join that operates on an IEnumerable<T>, which requires the method to walk the characters of the string using an enumerator. Internally, this uses a StringBuilder, as the exact number of characters is unknown.

    Have you considered using the String.Join overload that takes a string array instead? That implementation allows a fixed-length buffer to be used (similar to your third method), along with some internal unsafe string operations for speed. Note that passing the string itself, as in String.Join(" ", s), binds to the params string[] overload and treats s as a single element, so it returns s unchanged; each character would first have to be converted to a one-character string. Without actually doing the legwork to measure, I would expect the array overload to be on par with or faster than your third approach.
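
    A quick way to see which overload a call binds to is to compare the results. The sketch below (class and method names are hypothetical) calls String.Join three ways. As far as I can tell from C# overload resolution, passing the string directly treats it as a single element rather than as a sequence of characters:

    ```csharp
    using System;
    using System.Linq;

    public static class JoinOverloads
    {
        // Binds to Join(string, params string[]): s is a single element,
        // so no separator is inserted and s comes back unchanged.
        public static string PassStringDirectly(string s) => string.Join(" ", s);

        // Join(string, string[]): each character is first converted
        // to a one-character string.
        public static string ViaStringArray(string s) =>
            string.Join(" ", s.Select(c => c.ToString()).ToArray());

        // Join<T>(string, IEnumerable<T>), selected explicitly with T = char.
        public static string ViaGenericEnumerable(string s) => string.Join<char>(" ", s);
    }
    ```

    For the input "abc", PassStringDirectly returns "abc", while the other two return "a b c". The array variant pays for one small string per character, so it is worth measuring before assuming it is faster.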

  • 2021-01-11 12:20

    Why is using String.Join so much slower than doing the work by hand?

    String.Join is slower in this case because your hand-written algorithm has prior knowledge of the exact nature of your IEnumerable<T>.

    String.Join<T>(string, IEnumerable<T>) (the overload you're using), on the other hand, is intended to work with any arbitrary enumerable type, which means it cannot pre-allocate to the proper size. In this case, it's trading pure performance and speed for flexibility.

    Many of the framework methods do handle certain cases where things could be sped up by checking for conditions, but this typically is only done when that "special case" is going to be common.

    Here, you're effectively creating an edge case where a hand-written routine will be faster, but it is not a common use case of String.Join. Since you know exactly, in advance, what is required, you can avoid all of the overhead that a flexible design requires by pre-allocating an array of exactly the right size and building the result manually.
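
    As a rough sketch of what such a hand-written routine might look like (this is an assumption about the approach, not the asker's actual code; the names are hypothetical), the result needs exactly 2n - 1 characters for an input of length n, so a char array of that size can be filled in a single pass:

    ```csharp
    public static class SpacedArray
    {
        // Hypothetical helper: inserts a space between the characters of s
        // using a buffer allocated once at its final size.
        public static string Space(string s)
        {
            if (string.IsNullOrEmpty(s)) return string.Empty;

            var buffer = new char[s.Length * 2 - 1];
            buffer[0] = s[0];
            for (int i = 1; i < s.Length; i++)
            {
                buffer[2 * i - 1] = ' ';  // separator before the i-th character
                buffer[2 * i] = s[i];     // the i-th character itself
            }
            return new string(buffer);
        }
    }
    ```

    There is one allocation for the buffer and one for the final string, with no resizing in between.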

    You'll find that, in general, it's often possible to write a method that will outperform some of the framework routines for specific input data. This is common because the framework routines have to work with any dataset, which means they can't be optimized for a specific input scenario.

  • 2021-01-11 12:22

    Since you aren't using the Release build (which should have optimizations enabled by default) and/or you're debugging through Visual Studio, the JIT compiler will be prevented from making a lot of its optimizations. Because of this, you're just not getting a good picture of how long each operation really takes. Once you enable the optimizations, you can get the real picture of what's going on.

    It's also important that you not be debugging in Visual Studio. Go to the bin/Release folder and double-click the executable, entirely outside of Visual Studio.
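
    For a rough timing harness along these lines, something like the following sketch could be used (the class name, method names, and iteration count are all hypothetical). It warms up the method so JIT compilation is not counted, forces a collection before timing so leftover garbage from an earlier run is less likely to skew the result, and warns when a debugger is attached:

    ```csharp
    using System;
    using System.Diagnostics;

    public static class JoinTiming
    {
        // Times `iterations` calls of `work` and returns the elapsed milliseconds.
        public static long Measure(Func<string, string> work, string input, int iterations)
        {
            work(input); // warm-up call so JIT compilation is not part of the timing

            // Start from a clean heap before measuring.
            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            var sw = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                work(input);
            }
            sw.Stop();
            return sw.ElapsedMilliseconds;
        }

        // Prints a timing for the generic String.Join overload.
        public static void Report(string input, int iterations)
        {
            if (Debugger.IsAttached)
            {
                Console.WriteLine("Warning: debugger attached; timings will not reflect optimized code.");
            }
            long ms = Measure(s => string.Join<char>(" ", s), input, iterations);
            Console.WriteLine($"String.Join<char>: {ms} ms");
        }
    }
    ```

    A call such as JoinTiming.Report(new string('x', 1000), 100000) from a Release build started outside Visual Studio should give a more representative number than a Debug run, although a dedicated benchmarking library would be more rigorous.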
