Can someone please explain why this code is taking so long to run (i.e. >24 hours): The number of rows is 5000, whilst the number of columns is 2000 (i.e. Approximately 10m loop
Yes, the +=
operator is not very efficient. Use StringBuilder
instead.
In the .NET framework a string is immutable, which means it cannot be modified in place. This means the +=
operator has to create a new string every time, which means allocating memory, copying the value of the existing string and writing it to the new location. It's ok for one or two concatenations, but as soon as you put it in a loop you need to use an alternative.
http://support.microsoft.com/kb/306822
You'll see a massive performance improvement by using the following code:
var textToWriteBuilder = new StringBuilder();
for (int i = 0; i < m.rows; i++)
{
for (int j = 0; j < m.cols; j++)
{
textToWriteBuilder.Append(m[i, j].ToString() + ",");
}
// I've modified the logic on the following line, I assume you want to
// concatenate the value instead of overwriting it as you do in your question.
textToWriteBuilder.Append(textToWriteBuilder.Substring(0, textToWriteBuilder.Length - 2));
textToWriteBuilder.Append(Environment.NewLine);
}
string textToWrite = textToWriteBuilder.ToString();
The biggest issue I see with this is the fact you're using textToWrite as a string
.
As strings are immutable so each time the string is changed new memory must be reserved copied from the previous version.
A far more efficient approach is to use the StringBuilder
class which is designed for exactly this type of scenario. For example:
StringBuilder sb = new StringBuilder();
for (int i = 0; i < m.rows; i++)
{
for (int j = 0; j < m.cols; j++)
{
sb.Append(m[i, j].ToString());
if(j < m.cols - 1) // don't add a comma on the last element
{
sb.Append(",");
}
}
sb.AppendLine();
}