What\'s the most efficient way to concatenate strings?
Here is the fastest method I've evolved over a decade for my large-scale NLP app. I have variations for IEnumerable
and other input types, with and without separators of different types (Char
, String
), but here I show the simple case of concatenating all strings in an array into a single string, with no separator. Latest version here is developed and unit-tested on C# 7 and .NET 4.7.
There are two keys to higher performance; the first is to pre-compute the exact total size required. This step is trivial when the input is an array as shown here. For handling IEnumerable
instead, it is worth first gathering the strings into a temporary array for computing that total (The array is required to avoid calling ToString()
more than once per element since technically, given the possibility of side-effects, doing so could change the expected semantics of a 'string join' operation).
Next, given the total allocation size of the final string, the biggest boost in performance is gained by building the result string in-place. Doing this requires the (perhaps controversial) technique of temporarily suspending the immutability of a new String
which is initially allocated full of zeros. Any such controversy aside, however...
...note that this is the only bulk-concatenation solution on this page which entirely avoids an extra round of allocation and copying by the
String
constructor.
Complete code:
///
/// Concatenate the strings in 'rg', none of which may be null, into a single String.
///
public static unsafe String StringJoin(this String[] rg)
{
int i;
if (rg == null || (i = rg.Length) == 0)
return String.Empty;
if (i == 1)
return rg[0];
String s, t;
int cch = 0;
do
cch += rg[--i].Length;
while (i > 0);
if (cch == 0)
return String.Empty;
i = rg.Length;
fixed (Char* _p = (s = new String(default(Char), cch)))
{
Char* pDst = _p + cch;
do
if ((t = rg[--i]).Length > 0)
fixed (Char* pSrc = t)
memcpy(pDst -= t.Length, pSrc, (UIntPtr)(t.Length << 1));
while (pDst > _p);
}
return s;
}
[DllImport("MSVCR120_CLR0400", CallingConvention = CallingConvention.Cdecl)]
static extern unsafe void* memcpy(void* dest, void* src, UIntPtr cb);
I should mention that this code has a slight modification from what I use myself. In the original, I call the cpblk IL instruction from C# to do the actual copying. For simplicity and portability in the code here, I replaced that with P/Invoke memcpy
instead, as you can see. For highest performance on x64 (but maybe not x86) you may want to use the cpblk method instead.