Parsing a CSV file enclosed in quotes in C#

一整个雨季 2021-01-21 05:50

I've seen lots of samples for parsing CSV files, but this one is an annoying kind of file...

So how do you parse this kind of CSV?

"1",1/2/2010,"The sample ("adas

10 Answers
  • 2021-01-21 06:04

    Another open source library, Cinchoo ETL, handles quoted strings fine. Here is sample code.

    string csv = @"""1"",1/2/2010,""The sample(""adasdad"") asdada"",""I was pooping in the door ""Stinky"", so I'll be damn"",""AK""";
    
    using (var r = ChoCSVReader.LoadText(csv)
        .QuoteAllFields()
        )
    {
        foreach (var rec in r)
            Console.WriteLine(rec.Dump());
    }
    

    Output:

    [Count: 5]
    Key: Column1 [Type: Int64]
    Value: 1
    Key: Column2 [Type: DateTime]
    Value: 1/2/2010 12:00:00 AM
    Key: Column3 [Type: String]
    Value: The sample(adasdad) asdada
    Key: Column4 [Type: String]
    Value: I was pooping in the door Stinky, so I'll be damn
    Key: Column5 [Type: String]
    Value: AK
    
  • 2021-01-21 06:05

    You could split the string on ",". It is recommended that every cell value in the CSV file be enclosed in quotes, like "1","2","3".
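
    A minimal sketch of that split, assuming every field in the row is quoted and the sequence "," never occurs inside a value. The names QuotedSplit and SplitFullyQuotedRow are made up for illustration.

```csharp
using System;

public static class QuotedSplit
{
    // Splits a row in which EVERY field is quoted, e.g. "1","2","3".
    // Assumption: the sequence "," never occurs inside a field value.
    public static string[] SplitFullyQuotedRow(string row)
    {
        if (row.Length < 2 || row[0] != '"' || row[row.Length - 1] != '"')
            throw new FormatException("Row is not fully quoted: " + row);

        // Drop the outer quotes, then split on the quote-comma-quote separator.
        string inner = row.Substring(1, row.Length - 2);
        return inner.Split(new[] { "\",\"" }, StringSplitOptions.None);
    }
}
```

    Calling QuotedSplit.SplitFullyQuotedRow("\"1\",\"2\",\"3\"") yields the three values 1, 2 and 3. A row that mixes quoted and unquoted fields, like the one in the question, is rejected or split incorrectly, so this only fits files that follow the all-quoted convention.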

  • 2021-01-21 06:09

    As no (decent) .csv parser can parse non-csv-data correctly, the task isn't to parse the data, but to fix the file(s) (and then to parse the correct data).

    To fix the data you need a list of bad rows (to be sent to the person responsible for the garbage for manual editing). To get such a list, you can

    1. use Access with a correct import specification to import the file. You'll get a list of import failures.

    2. write a script/program that opens the file via the OLEDB text driver.

    Sample file:

    "Id","Remark","DateDue"
    1,"This is good",20110413
    2,"This is ""good""",20110414
    3,"This is ""good"","bad",and "ugly",,20110415
    4,"This is ""good""" again,20110415
    

    Sample SQL/Result:

     SELECT * FROM [badcsv01.csv]
     Id Remark               DateDue   
      1 This is good         4/13/2011 
      2 This is "good"       4/14/2011 
      3 This is "good",        NULL    
      4 This is "good" again 4/15/2011 
    
    SELECT * FROM [badcsv01.csv] WHERE DateDue Is Null
     Id Remark          DateDue 
      3 This is "good",  NULL   
    
  • 2021-01-21 06:14

    Split based on

    ",

    I would use MyString.IndexOf("\",")

    and then substring the parts. Other than that, I'm sure someone has written a CSV parser that can handle this :)
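
    A rough sketch of the IndexOf/Substring idea, assuming every field is quoted and no value contains a quote followed by a comma. The names IndexOfSplit and SplitOnQuoteComma are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class IndexOfSplit
{
    // Cuts the row at every occurrence of the two characters ",
    // and strips the surrounding quotes from each piece.
    public static List<string> SplitOnQuoteComma(string row)
    {
        var parts = new List<string>();
        int start = 0;
        while (true)
        {
            int idx = row.IndexOf("\",", start, StringComparison.Ordinal);
            if (idx < 0)
            {
                // No further separator: the rest of the row is the last field.
                parts.Add(row.Substring(start).Trim('"'));
                break;
            }
            parts.Add(row.Substring(start, idx - start).Trim('"'));
            start = idx + 2; // skip past the quote and the comma
        }
        return parts;
    }
}
```

    Note that Trim('"') removes every leading and trailing quote, so a value that itself begins or ends with a quote loses it, and unquoted fields are not separated at all.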

  • 2021-01-21 06:15

    I very strongly recommend using TextFieldParser. Hand-coded parsers that use String.Split or regular expressions almost invariably mishandle things like quoted fields that have embedded quotes or embedded separators.

    I would be surprised, though, if it handled your particular example. As others have said, that line is, at best, ambiguous.
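
    For reference, a minimal sketch of reading comma-delimited text with TextFieldParser (namespace Microsoft.VisualBasic.FileIO; on .NET Framework this needs a reference to the Microsoft.VisualBasic assembly). The wrapper names CsvReading and ParseCsv are made up for illustration. Lines the parser rejects are collected by line number rather than stopping the run.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

public static class CsvReading
{
    // Parses CSV text with TextFieldParser. Rows the parser rejects are
    // reported by line number in badLines instead of aborting the run.
    public static List<string[]> ParseCsv(string csvText, List<long> badLines)
    {
        var rows = new List<string[]>();
        using (var parser = new TextFieldParser(new StringReader(csvText)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
            {
                try
                {
                    rows.Add(parser.ReadFields());
                }
                catch (MalformedLineException)
                {
                    // ErrorLineNumber identifies the line that could not be parsed.
                    badLines.Add(parser.ErrorLineNumber);
                }
            }
        }
        return rows;
    }
}
```

    To read from disk, the same class also has a constructor that takes a file path. The collected line numbers give a starting point for the list of bad rows that has to be fixed by hand.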

  • 2021-01-21 06:20

    First, write the quoted header row from the column names:

                DataTable pbResults = new DataTable();
                OracleDataAdapter oda = new OracleDataAdapter(cmd);
                oda.Fill(pbResults);
    
                StringBuilder sb1 = new StringBuilder();
                StringBuilder sb2 = new StringBuilder();
                IEnumerable<string> columnNames = pbResults.Columns.Cast<DataColumn>().Select(column => column.ColumnName);
    
                sb1.Append(string.Join("\"" + "," + "\"", columnNames));                
                sb2.Append("\"");
                sb2.Append(sb1);
                sb2.AppendLine("\"");
    

    Second, do the same for each row:

                foreach (DataRow row in pbResults.Rows)
                {
                    IEnumerable<string> fields = row.ItemArray.Select(field => field.ToString());
                    sb2.Append("\"");
                    sb2.Append(string.Join("\"" + "," + "\"", fields));
                    sb2.AppendLine("\"");
                }
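
    The code above wraps each value in quotes but leaves any quote inside a value untouched, which can produce rows that a parser cannot read back. A minimal sketch of one way to avoid that, assuming the common convention of doubling embedded quotes (as in the "This is ""good""" sample earlier in this thread). The names CsvWriting, QuoteField and ToCsvLine are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CsvWriting
{
    // Wraps a value in quotes and doubles any quote it already contains.
    // A null value is written as an empty quoted field.
    public static string QuoteField(string value)
    {
        return "\"" + (value ?? string.Empty).Replace("\"", "\"\"") + "\"";
    }

    // Joins the quoted fields with commas to form one CSV line.
    public static string ToCsvLine(IEnumerable<string> fields)
    {
        return string.Join(",", fields.Select(f => QuoteField(f)));
    }
}
```

    In the loops above, sb2.AppendLine(CsvWriting.ToCsvLine(fields)) would then replace the three Append calls per row, and the same call works for columnNames.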
    