I\'ve seen lots of samples in parsing CSV File. but this one is kind of annoying file...
so how do you parse this kind of CSV
\"1\",1/2/2010,\"The sample (\"adas
There is one another open source library, Cinchoo ETL, handle quoted string fine. Here is sample code.
string csv = @"""1"",1/2/2010,""The sample(""adasdad"") asdada"",""I was pooping in the door ""Stinky"", so I'll be damn"",""AK""";
using (var r = ChoCSVReader.LoadText(csv)
.QuoteAllFields()
)
{
foreach (var rec in r)
Console.WriteLine(rec.Dump());
}
Output:
[Count: 5]
Key: Column1 [Type: Int64]
Value: 1
Key: Column2 [Type: DateTime]
Value: 1/2/2010 12:00:00 AM
Key: Column3 [Type: String]
Value: The sample(adasdad) asdada
Key: Column4 [Type: String]
Value: I was pooping in the door Stinky, so I'll be damn
Key: Column5 [Type: String]
Value: AK
You could split the string by ","
. It is recomended that the csv file could each cell value should be enclosed in quotes like "1","2","3".....
As no (decent) .csv parser can parse non-csv-data correctly, the task isn't to parse the data, but to fix the file(s) (and then to parse the correct data).
To fix the data you need a list of bad rows (to be sent to the person responsible for the garbage for manual editing). To get such a list, you can
use Access with a correct import specification to import the file. You'll get a list of import failures.
write a script/program that opens the file via the OLEDB text driver.
Sample file:
"Id","Remark","DateDue"
1,"This is good",20110413
2,"This is ""good""",20110414
3,"This is ""good"","bad",and "ugly",,20110415
4,"This is ""good""" again,20110415
Sample SQL/Result:
SELECT * FROM [badcsv01.csv]
Id Remark DateDue
1 This is good 4/13/2011
2 This is "good" 4/14/2011
3 This is "good", NULL
4 This is "good" again 4/15/2011
SELECT * FROM [badcsv01.csv] WHERE DateDue Is Null
Id Remark DateDue
3 This is "good", NULL
Split based on
",
I would use MyString.IndexOf("\","
And then substring the parts. Other then that im sure someone written a csv parser out there that can handle this :)
I very strongly recommend using TextFieldParser. Hand-coded parsers that use String.Split
or regular expressions almost invariably mishandle things like quoted fields that have embedded quotes or embedded separators.
I would be surprised, though, if it handled your particular example. As others have said, that line is, at best, ambiguous.
First you will do it for the columns names:
DataTable pbResults = new DataTable();
OracleDataAdapter oda = new OracleDataAdapter(cmd);
oda.Fill(pbResults);
StringBuilder sb1 = new StringBuilder();
StringBuilder sb2 = new StringBuilder();
IEnumerable<string> columnNames = pbResults.Columns.Cast<DataColumn>().Select(column => column.ColumnName);
sb1.Append(string.Join("\"" + "," + "\"", columnNames));
sb2.Append("\"");
sb2.Append(sb1);
sb2.AppendLine("\"");
Second you will do it for each row:
foreach (DataRow row in pbResults.Rows)
{
IEnumerable<string> fields = row.ItemArray.Select(field => field.ToString());
sb2.Append("\"");
sb2.Append(string.Join("\"" + "," + "\"", fields));
sb2.AppendLine("\"");
}