How to split a CSV line whose columns may contain commas (,)

我寻月下人不归 2020-11-22 06:35

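Given the following line, how can I split it into columns when a quoted column such as "Corvallis, OR" itself contains a comma?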

2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,"Corvallis, OR",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34

8 answers
  • 2020-11-22 07:07

    I see that if you paste CSV-delimited text into Excel and use "Text to Columns", it asks you for a "text qualifier". This defaults to a double quote, so text within double quotes is treated as literal. I imagine Excel implements this by reading one character at a time: when it encounters a text qualifier, it keeps going until the next qualifier. You can probably implement this yourself with a loop and a boolean that tracks whether you're inside quoted text.

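    using System.Collections.Generic;
    
    public static string[] CsvParser(string csvText)
    {
        List<string> tokens = new List<string>();
    
        int last = -1;        // index of the previous field delimiter
        int current = 0;
        bool inText = false;  // true while inside a quoted section
    
        while (current < csvText.Length)
        {
            switch (csvText[current])
            {
                case '"':
                    inText = !inText;
                    break;
                case ',':
                    if (!inText)
                    {
                        // the field is everything between the previous comma and this one
                        tokens.Add(csvText.Substring(last + 1, current - last - 1).Trim().Trim('"'));
                        last = current;
                    }
                    break;
                default:
                    break;
            }
            current++;
        }
    
        // the final field follows the last delimiter (empty if the line ends with a comma)
        tokens.Add(csvText.Substring(last + 1).Trim().Trim('"'));
    
        return tokens.ToArray();
    }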
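    A minimal sketch of the same one-pass, boolean-flag idea, written as a hypothetical `QuoteAwareSplitter.SplitLine` helper (not part of the answer above). It assumes RFC 4180-style quoting, where a doubled quote ("") inside a quoted field stands for one literal quote, and it drops the enclosing quotes from each field. It handles a single line only; quoted fields that span line breaks are not covered.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuoteAwareSplitter
{
    // Splits one CSV line, treating commas inside double quotes as literal text.
    // A doubled quote ("") inside a quoted field is read as one literal quote.
    public static List<string> SplitLine(string line)
    {
        var fields = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == '"')
                {
                    if (i + 1 < line.Length && line[i + 1] == '"')
                    {
                        field.Append('"'); // escaped quote
                        i++;               // skip the second quote of the pair
                    }
                    else
                    {
                        inQuotes = false;  // closing quote
                    }
                }
                else
                {
                    field.Append(c);       // commas land here and are kept
                }
            }
            else if (c == '"')
            {
                inQuotes = true;           // opening quote
            }
            else if (c == ',')
            {
                fields.Add(field.ToString()); // delimiter outside quotes ends the field
                field.Clear();
            }
            else
            {
                field.Append(c);
            }
        }

        fields.Add(field.ToString()); // last field (empty if the line ends with a comma)
        return fields;
    }
}
```

    Building each field in a `StringBuilder` instead of taking substrings makes it easy to drop the quote characters and unescape "" as the line is scanned.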
    
  • 2020-11-22 07:13

    I had a problem with a CSV file containing fields with embedded quote characters, so, using TextFieldParser, I came up with the following:

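    using System.IO;
    using System.Text;
    using Microsoft.VisualBasic.FileIO;
    
    private static string[] parseCSVLine(string csvLine)
    {
      using (TextFieldParser TFP = new TextFieldParser(new MemoryStream(Encoding.UTF8.GetBytes(csvLine))))
      {
        TFP.HasFieldsEnclosedInQuotes = true;
        TFP.SetDelimiters(",");
    
        try
        {
          return TFP.ReadFields();
        }
        catch (MalformedLineException)
        {
          // Escape stray quotes: double any quote that is not at either end
          // of the line and has no comma next to it, then parse again.
          StringBuilder m_sbLine = new StringBuilder();
          string errorLine = TFP.ErrorLine;
    
          for (int i = 0; i < errorLine.Length; i++)
          {
            if (i > 0 && i < errorLine.Length - 1 && errorLine[i] == '"' &&
                errorLine[i + 1] != ',' && errorLine[i - 1] != ',')
              m_sbLine.Append("\"\"");
            else
              m_sbLine.Append(errorLine[i]);
          }
    
          string repaired = m_sbLine.ToString();
          if (repaired == errorLine)
            throw; // nothing left to repair; avoid infinite recursion
    
          return parseCSVLine(repaired);
        }
      }
    }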
    

    A StreamReader is still used to read the CSV line by line, as follows:

    using (StreamReader SR = new StreamReader(FileName))
    {
      while (SR.Peek() > -1)
      {
        string[] myStringArray = parseCSVLine(SR.ReadLine());
        // process the fields of the current line here
      }
    }
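    The repair step inside the `catch` block can be pulled out and checked without `TextFieldParser`. The sketch below is a hypothetical `QuoteRepair.RepairStrayQuotes` helper applying the same rule: double any quote that is not the first or last character of the line and has no comma on either side. It is a heuristic that assumes the line contains no already-escaped quote pairs, since those would be doubled again.

```csharp
using System;
using System.Text;

public static class QuoteRepair
{
    // Doubles every quote that is not the first or last character of the line
    // and has no comma on either side, so a parser reads it as literal text.
    public static string RepairStrayQuotes(string line)
    {
        var sb = new StringBuilder(line.Length);
        for (int i = 0; i < line.Length; i++)
        {
            // bounds check first, so line[i - 1] and line[i + 1] are always valid
            bool interior = i > 0 && i < line.Length - 1;
            if (interior && line[i] == '"' && line[i - 1] != ',' && line[i + 1] != ',')
                sb.Append("\"\"");
            else
                sb.Append(line[i]);
        }
        return sb.ToString();
    }
}
```

    For example, `1,"12" pipe",3` becomes `1,"12"" pipe",3`, which a quote-aware parser reads as the three fields `1`, `12" pipe` and `3`.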
    
  • 2020-11-22 07:15

    Use the Microsoft.VisualBasic.FileIO.TextFieldParser class. This will handle parsing a delimited file, TextReader or Stream where some fields are enclosed in quotes and some are not.

    For example:

    using System;
    using System.IO;
    using Microsoft.VisualBasic.FileIO;
    
    string csv = "2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,\"Corvallis, OR\",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34";
    
    TextFieldParser parser = new TextFieldParser(new StringReader(csv));
    
    // You can also read from a file
    // TextFieldParser parser = new TextFieldParser("mycsvfile.csv");
    
    parser.HasFieldsEnclosedInQuotes = true;
    parser.SetDelimiters(",");
    
    string[] fields;
    
    while (!parser.EndOfData)
    {
        fields = parser.ReadFields();
        foreach (string field in fields)
        {
            Console.WriteLine(field);
        }
    } 
    
    parser.Close();
    

    This should result in the following output:

    2
    1016
    7/31/2008 14:22
    Geoff Dalgas
    6/5/2011 22:21
    http://stackoverflow.com
    Corvallis, OR
    7679
    351
    81
    b437f461b3fd27387c5d8ab47a293d35
    34
    

    See Microsoft.VisualBasic.FileIO.TextFieldParser for more information.

    You need to add a reference to Microsoft.VisualBasic (in Visual Studio, on the .NET tab of the Add Reference dialog).

  • 2020-11-22 07:20

    Cinchoo ETL, an open-source library, automatically handles column values that contain separators.

    using ChoETL;
    
    string csv = @"2,1016,7/31/2008 14:22,Geoff Dalgas,6/5/2011 22:21,http://stackoverflow.com,""Corvallis, OR"",7679,351,81,b437f461b3fd27387c5d8ab47a293d35,34";
    
    using (var p = ChoCSVReader.LoadText(csv))
    {
        Console.WriteLine(p.Dump());
    }
    

    Output:

    Key: Column1 [Type: String]
    Value: 2
    Key: Column2 [Type: String]
    Value: 1016
    Key: Column3 [Type: String]
    Value: 7/31/2008 14:22
    Key: Column4 [Type: String]
    Value: Geoff Dalgas
    Key: Column5 [Type: String]
    Value: 6/5/2011 22:21
    Key: Column6 [Type: String]
    Value: http://stackoverflow.com
    Key: Column7 [Type: String]
    Value: Corvallis, OR
    Key: Column8 [Type: String]
    Value: 7679
    Key: Column9 [Type: String]
    Value: 351
    Key: Column10 [Type: String]
    Value: 81
    Key: Column11 [Type: String]
    Value: b437f461b3fd27387c5d8ab47a293d35
    Key: Column12 [Type: String]
    Value: 34
    

    For more information, please see the CodeProject article.

    Hope it helps.

  • 2020-11-22 07:22

    You could split on every comma that is followed by an even number of quotes.
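    A minimal sketch of that idea with `System.Text.RegularExpressions`, wrapped in a hypothetical `RegexCsvSplit` class. The lookahead accepts a comma only when the rest of the line holds complete pairs of quotes, so a comma inside a quoted field is skipped. It assumes every quoted field is closed on the same line, and the enclosing quotes are kept in the result.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexCsvSplit
{
    // Matches a comma only if the remainder of the line contains an even
    // number of double quotes, i.e. the comma is not inside a quoted field.
    private static readonly Regex CommaOutsideQuotes =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)");

    public static string[] Split(string line)
    {
        return CommaOutsideQuotes.Split(line);
    }
}
```

    On the line from the question this yields 12 fields, with the seventh still quoted as `"Corvallis, OR"`. Because the lookahead rescans the rest of the line at every comma, the cost grows quadratically with line length.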

    You may also want to look at the CSV format specification for how it handles commas.
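    For reference, RFC 4180 says a field containing a comma, a double quote, or a line break should be enclosed in double quotes, and a double quote inside such a field is escaped by doubling it. The hypothetical `CsvQuoting` helper below is a minimal sketch of the writing side of that rule; fields produced this way are what a quote-aware reader can split back correctly.

```csharp
using System;
using System.Linq;

public static class CsvQuoting
{
    // Quotes a field if it contains a comma, a double quote, or a line break;
    // embedded double quotes are escaped by doubling them.
    public static string EscapeField(string field)
    {
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes ? "\"" + field.Replace("\"", "\"\"") + "\"" : field;
    }

    // Joins already-unescaped field values into one CSV line.
    public static string JoinLine(params string[] fields)
    {
        return string.Join(",", fields.Select(f => EscapeField(f)));
    }
}
```

    For example, `EscapeField("Corvallis, OR")` returns `"Corvallis, OR"` with the quotes added, while `7679` is returned unchanged.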

    Useful link: C# Regex Split - commas outside quotes

  • 2020-11-22 07:22

    Use a library such as LumenWorks for your CSV reading. It handles fields with quotes in them and is likely to be more robust overall than a custom solution, since it has been around for a long time.
