I have a CSV file whose delimiter is a semicolon (;), and each column is enclosed in double quotes. Some values also contain a semicolon, for example as part of the entity &amp;.
I am using TextFieldParser to parse the file. This is the sample data:
"A001";"RT:This is a tweet"; "http://www.whatever.com/test/module&amp;one"
For the above example, I am getting more columns/fields than I should get.
Field[0] = "A001"
Field[1] = "RT:This is a tweet"
Field[2] = "http://www.whatever.com/test/module&amp"
Field[3] = "one"
This is my code. What changes do I need to make to handle this scenario?
using (var parser = new TextFieldParser(fileName))
{
    parser.TextFieldType = FieldType.Delimited;
    parser.SetDelimiters(";");
    parser.TrimWhiteSpace = true;
    parser.HasFieldsEnclosedInQuotes = false;
    int rowIndex = 0;
    PropertyInfo[] properties = typeof(TwitterData).GetProperties();
    while (parser.PeekChars(1) != null)
    {
        var cleanFieldRowCells = parser.ReadFields().Select(
            f => f.Trim(new[] { ' ', '"' }));
        var twitter = new TwitterData();
        int index = 0;
        foreach (string c in cleanFieldRowCells)
        {
            string str = c;
            if (properties[index].PropertyType == typeof(DateTime))
            {
                string twitterDateTemplate = "ddd MMM dd HH:mm:ss +ffff yyyy";
                DateTime createdAt = DateTime.ParseExact(str, twitterDateTemplate, new System.Globalization.CultureInfo("en-AU"));
                properties[index].SetValue(twitter, createdAt);
            }
            else
            {
                properties[index].SetValue(twitter, str);
            }
            index++;
        }
    }
}
-Alan-
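Using the two sample strings you have above and setting the HasFieldsEnclosedInQuotes
property to true works for me. With the property set to false, the parser treats the
quotes as ordinary characters and splits on every semicolon, including any that appear
inside a quoted value.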
// Needs: using System; using System.IO; using System.Linq;
//        using Microsoft.VisualBasic.FileIO;
string LINES = @"
""A001"";""RT:This is a tweet""; ""http://www.whatever.com/test/module&one""
""A001"";""RT: Test1 ; Test2"";""test.com"";
";
using (var sr = new StringReader(LINES))
{
    using (var parser = new TextFieldParser(sr))
    {
        parser.TextFieldType = FieldType.Delimited;
        parser.SetDelimiters(";");
        parser.TrimWhiteSpace = true;
        // Treat the double quotes as field enclosures, so a ';' inside
        // them is kept as part of the value.
        parser.HasFieldsEnclosedInQuotes = true;
        while (parser.PeekChars(1) != null)
        {
            var cleanFieldRowCells = parser.ReadFields().Select(
                f => f.Trim(new[] { ' ', '"' })).ToArray();
            Console.WriteLine("New Line");
            for (int i = 0; i < cleanFieldRowCells.Length; ++i)
            {
                Console.WriteLine(
                    "Field[{0}] = [{1}]", i, cleanFieldRowCells[i]
                );
            }
            Console.WriteLine("{0}", new string('=', 40));
        }
    }
}
OUTPUT:
New Line
Field[0] = [A001]
Field[1] = [RT:This is a tweet]
Field[2] = [http://www.whatever.com/test/module&one]
========================================
New Line
Field[0] = [A001]
Field[1] = [RT: Test1 ; Test2]
Field[2] = [test.com]
Field[3] = []
========================================
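As a follow-up to the loop above, here is a slightly tighter variant, offered as a minimal sketch rather than a definitive implementation. It assumes a project that can reference Microsoft.VisualBasic; the class name SemicolonCsv and the helper ParseLines are hypothetical names introduced only for illustration. Because HasFieldsEnclosedInQuotes is true, the parser removes the enclosing quotes itself, so the manual Trim is dropped, and EndOfData is used in place of the PeekChars(1) check.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class SemicolonCsv
{
    // Hypothetical helper (not from the original post): parses
    // semicolon-delimited, quote-enclosed text into one string[] per row.
    public static List<string[]> ParseLines(string text)
    {
        var rows = new List<string[]>();
        using (var reader = new StringReader(text))
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(";");
            parser.TrimWhiteSpace = true;
            // Quotes act as field enclosures: a ';' between them stays in
            // the value, and the quotes are not part of the returned field.
            parser.HasFieldsEnclosedInQuotes = true;

            // EndOfData is true once no non-blank lines remain.
            while (!parser.EndOfData)
            {
                rows.Add(parser.ReadFields());
            }
        }
        return rows;
    }

    public static void Main()
    {
        // The sample row from the question, with the entity written out.
        string sample =
            "\"A001\";\"RT:This is a tweet\"; \"http://www.whatever.com/test/module&amp;one\"";

        foreach (string[] fields in ParseLines(sample))
        {
            for (int i = 0; i < fields.Length; i++)
            {
                Console.WriteLine("Field[{0}] = [{1}]", i, fields[i]);
            }
        }
    }
}
```

With the question's sample row this should yield three fields, the last one keeping the semicolon of the entity. To read from a file instead, the same settings can be applied to a TextFieldParser constructed from the file name, as in the question.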
Source: https://stackoverflow.com/questions/35389302/parsing-semi-colon-delimeter-file