How do I handle line breaks in a CSV file using C#?

情书的邮戳 2020-12-15 08:43

I have an Excel spreadsheet being converted into a CSV file in C#, but I am having a problem dealing with line breaks. For instance:

"John","23","555-555

14 Answers
  • 2020-12-15 09:16

    I've used this piece of code recently to parse rows from a CSV file (this is a simplified version):

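    // Simplified character-by-character parser. `separator` is assumed to be
    // a char field of the enclosing class (for example ',').
    private void Parse(TextReader reader)
    {
        var row = new List<string>();
        var isStringBlock = false; // true while inside a quoted field
        var sb = new StringBuilder();

        long charIndex = 0;
        int currentLineCount = 0;

        while (reader.Peek() != -1)
        {
            charIndex++;

            char c = (char)reader.Read();

            if (c == '"')
                isStringBlock = !isStringBlock;

            if (c == separator && !isStringBlock) //end of word
            {
                row.Add(sb.ToString().Trim()); //add word
                sb.Length = 0;
            }
            else if (c == '\n' && !isStringBlock) //end of line
            {
                row.Add(sb.ToString().Trim()); //add last word in line
                sb.Length = 0;

                //DO SOMETHING WITH row HERE!

                currentLineCount++;

                row = new List<string>();
            }
            else
            {
                // Quotes and carriage returns are dropped; a line break
                // inside a quoted field is replaced by a space.
                if (c != '"' && c != '\r') sb.Append(c == '\n' ? ' ' : c);
            }
        }

        // Flush the final row only if the input did not end with a line break.
        if (row.Count > 0 || sb.Length > 0)
        {
            row.Add(sb.ToString().Trim()); //add last word

            //DO SOMETHING WITH LAST row HERE!
        }
    }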
    
  • 2020-12-15 09:16

    What I usually do is read the text in character by character, as opposed to line by line, because of this very problem.

    As you're reading each character, you should be able to figure out where each cell starts and stops, and also tell a line break that ends a row from one inside a cell. If I remember correctly, for Excel-generated files anyway, rows end with \r\n, and line breaks inside cells are only \n.
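
    The character-by-character approach can be sketched roughly as below. This is only an illustration, not this answer's actual code: the names `CsvSketch` and `ReadRecords` are made up, and it assumes fields are quoted with `"` and that a doubled `""` inside a quoted field stands for a literal quote. It does not rely on which line-break characters Excel writes: any line break outside quotes ends the row, and any line break inside quotes stays in the cell.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper, for illustration only.
public static class CsvSketch
{
    // Reads records one character at a time. A line break inside a quoted
    // field is kept as part of the field; a line break outside quotes ends
    // the current record.
    public static List<List<string>> ReadRecords(TextReader reader, char separator = ',')
    {
        var records = new List<List<string>>();
        var row = new List<string>();
        var field = new StringBuilder();
        bool inQuotes = false;
        bool rowHasData = false; // avoids emitting empty records for blank lines

        int next;
        while ((next = reader.Read()) != -1)
        {
            char c = (char)next;

            if (inQuotes)
            {
                if (c == '"')
                {
                    if (reader.Peek() == '"')
                    {
                        reader.Read();      // "" inside quotes is an escaped quote
                        field.Append('"');
                    }
                    else
                    {
                        inQuotes = false;   // closing quote
                    }
                }
                else
                {
                    field.Append(c);        // includes \r and \n inside the cell
                }
            }
            else if (c == '"')
            {
                inQuotes = true;
                rowHasData = true;
            }
            else if (c == separator)
            {
                row.Add(field.ToString());
                field.Clear();
                rowHasData = true;
            }
            else if (c == '\r' || c == '\n')
            {
                // Treat \r\n as a single line break.
                if (c == '\r' && reader.Peek() == '\n') reader.Read();

                if (rowHasData || field.Length > 0)
                {
                    row.Add(field.ToString());
                    records.Add(row);
                    row = new List<string>();
                    field.Clear();
                    rowHasData = false;
                }
            }
            else
            {
                field.Append(c);
            }
        }

        // Flush the last record if the input did not end with a line break.
        if (rowHasData || field.Length > 0)
        {
            row.Add(field.ToString());
            records.Add(row);
        }

        return records;
    }
}
```

    Calling `CsvSketch.ReadRecords(new StringReader(text))` returns one list of fields per row; replacing or stripping the embedded line breaks afterwards is then a simple `Replace` on each field.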

  • 2020-12-15 09:17

    The LINQy solution:

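    // Works only if every field is quoted and non-empty, no field contains
    // a comma, and every row has exactly 3 columns.
    string csvText = File.ReadAllText("C:\\Test.txt");

    var query = csvText
        // Drop every line break, including the ones inside cells.
        .Replace(Environment.NewLine, string.Empty)
        // Consecutive rows now meet as ""; turn that into "," and split.
        .Replace("\"\"", "\",\"").Split(',')
        // Regroup every 3 fields into one row.
        .Select((i, n) => new { i, n }).GroupBy(a => a.n / 3);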
    
  • 2020-12-15 09:19

    Heed the advice from the experts and don't roll your own CSV parser.

    Your first thought is, "How do I handle new line breaks?"

    Your next thought is, "I need to handle commas inside of quotes."

    Your next thought will be, "Oh, crap, I need to handle quotes inside of quotes. Escaped quotes. Double quotes. Single quotes..."

    It's a road to madness. Don't write your own. Find a library with extensive unit test coverage that hits all the hard parts and has gone through hell for you. For .NET, use the free FileHelpers library.

  • 2020-12-15 09:21

    There is a built-in class for reading CSV files in .NET (it requires adding a reference to the Microsoft.VisualBasic assembly):

    public static IEnumerable<string[]> ReadSV(TextReader reader, params string[] separators)
    {
        var parser = new Microsoft.VisualBasic.FileIO.TextFieldParser(reader);
        parser.SetDelimiters(separators);
        while (!parser.EndOfData)
            yield return parser.ReadFields();
    }
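
    As a rough usage sketch, the same `TextFieldParser` can be pointed at text containing a quoted field that spans two lines. The class name `TextFieldParserDemo`, the method `ParseCsv`, and the sample data are made up for illustration. Note that the `using` block here disposes the parser, which also closes the reader it wraps.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical demo class, for illustration only.
public static class TextFieldParserDemo
{
    // Parses comma-delimited text into rows of fields using TextFieldParser.
    public static List<string[]> ParseCsv(string csvText)
    {
        var rows = new List<string[]>();

        using (var parser = new TextFieldParser(new StringReader(csvText)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            // Quoted fields may contain delimiters and line breaks.
            parser.HasFieldsEnclosedInQuotes = true;

            while (!parser.EndOfData)
                rows.Add(parser.ReadFields());
        }

        return rows;
    }
}
```

    For example, `TextFieldParserDemo.ParseCsv("\"John\",\"23\",\"555-555\n1212\"\r\n\"Mary\",\"31\",\"555-0000\"\r\n")` should yield two rows of three fields each, with the line break still inside the third field of the first row, where it can be replaced or removed after reading.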
    

    If you're dealing with really large files, this CSV reader claims to be the fastest one you'll find: http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader

  • 2020-12-15 09:21

    There is an example parser in C# that seems to handle your case correctly. You can read your data in with it and then purge the line breaks from the fields afterwards. Part 2 is the parser, and there is a Part 1 that covers the writer portion.
