Given
2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,"Corvallis, OR",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34
I see that if you paste csv delimited text in Excel and do a "Text to Columns", it asks you for a "text qualifier". It's defaulted to a double quote so that it treats text within double quotes as literal. I imagine that Excel implements this by going one character at a time, if it encounters a "text qualifier", it keeps going to the next "qualifier". You can probably implement this yourself with a for loop and a boolean to denote if you're inside literal text.
public string[] CsvParser(string csvText)
{
List<string> tokens = new List<string>();
int last = -1;
int current = 0;
bool inText = false;
while(current < csvText.Length)
{
switch(csvText[current])
{
case '"':
inText = !inText; break;
case ',':
if (!inText)
{
tokens.Add(csvText.Substring(last + 1, (current - last)).Trim(' ', ','));
last = current;
}
break;
default:
break;
}
current++;
}
if (last != csvText.Length - 1)
{
tokens.Add(csvText.Substring(last+1).Trim());
}
return tokens.ToArray();
}
I had a problem with a CSV that contains fields with a quote character in them, so using the TextFieldParser, I came up with the following:
private static string[] parseCSVLine(string csvLine)
{
using (TextFieldParser TFP = new TextFieldParser(new MemoryStream(Encoding.UTF8.GetBytes(csvLine))))
{
TFP.HasFieldsEnclosedInQuotes = true;
TFP.SetDelimiters(",");
try
{
return TFP.ReadFields();
}
catch (MalformedLineException)
{
StringBuilder m_sbLine = new StringBuilder();
for (int i = 0; i < TFP.ErrorLine.Length; i++)
{
if (i > 0 && TFP.ErrorLine[i]== '"' &&(TFP.ErrorLine[i + 1] != ',' && TFP.ErrorLine[i - 1] != ','))
m_sbLine.Append("\"\"");
else
m_sbLine.Append(TFP.ErrorLine[i]);
}
return parseCSVLine(m_sbLine.ToString());
}
}
}
A StreamReader is still used to read the CSV line by line, as follows:
using(StreamReader SR = new StreamReader(FileName))
{
while (SR.Peek() >-1)
myStringArray = parseCSVLine(SR.ReadLine());
}
Use the Microsoft.VisualBasic.FileIO.TextFieldParser
class. This will handle parsing a delimited file, TextReader
or Stream
where some fields are enclosed in quotes and some are not.
For example:
using Microsoft.VisualBasic.FileIO;
string csv = "2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,\"Corvallis, OR\",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34";
TextFieldParser parser = new TextFieldParser(new StringReader(csv));
// You can also read from a file
// TextFieldParser parser = new TextFieldParser("mycsvfile.csv");
parser.HasFieldsEnclosedInQuotes = true;
parser.SetDelimiters(",");
string[] fields;
while (!parser.EndOfData)
{
fields = parser.ReadFields();
foreach (string field in fields)
{
Console.WriteLine(field);
}
}
parser.Close();
This should result in the following output:
2 1016 7/31/2008 14:22 Geoff Dalgas 6/5/2011 22:21 http://stackoverflow.com Corvallis, OR 7679 351 81 b437f461b3fd27387c5d8ab47a293d35 34
See Microsoft.VisualBasic.FileIO.TextFieldParser for more information.
You need to add a reference to Microsoft.VisualBasic
in the Add References .NET tab.
With Cinchoo ETL - an open source library, it can automatically handles columns values containing separators.
string csv = @"2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,""Corvallis, OR"",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34";
using (var p = ChoCSVReader.LoadText(csv)
)
{
Console.WriteLine(p.Dump());
}
Output:
Key: Column1 [Type: String]
Value: 2
Key: Column2 [Type: String]
Value: 1016
Key: Column3 [Type: String]
Value: 7/31/2008 14:22
Key: Column4 [Type: String]
Value: Geoff Dalgas
Key: Column5 [Type: String]
Value: 6/5/2011 22:21
Key: Column6 [Type: String]
Value: http://stackoverflow.com
Key: Column7 [Type: String]
Value: Corvallis, OR
Key: Column8 [Type: String]
Value: 7679
Key: Column9 [Type: String]
Value: 351
Key: Column10 [Type: String]
Value: 81
Key: Column11 [Type: String]
Value: b437f461b3fd27387c5d8ab47a293d35
Key: Column12 [Type: String]
Value: 34
For more information, please visit codeproject article.
Hope it helps.
You could split on all commas that do have an even number of quotes following them.
You would also like to view at the specf for CSV format about handling comma's.
Useful Link : C# Regex Split - commas outside quotes
Use a library like LumenWorks to do your CSV reading. It'll handle fields with quotes in them and will likely overall be more robust than your custom solution by virtue of having been around for a long time.