Parsing CSV files in C#, with header

悲哀的现实 2020-11-21 06:57

Is there a default/official/recommended way to parse CSV files in C#? I don't want to roll my own parser.

Also, I've seen instances of people using ODBC/OLE DB to

17 answers
  • 2020-11-21 07:22

    A single-source-file solution, useful for straightforward parsing needs. It deals with the nasty edge cases, such as newline normalization and newlines inside quoted string literals. You're welcome!

    If your CSV file has a header, you just read the column names (and compute the column indexes) from the first row. Simple as that.

    Note that Dump is a LINQPad method; remove those calls if you are not using LINQPad.

    void Main()
    {
        var file1 = "a,b,c\r\nx,y,z";
        CSV.ParseText(file1).Dump();
    
        var file2 = "a,\"b\",c\r\nx,\"y,z\"";
        CSV.ParseText(file2).Dump();
    
        var file3 = "a,\"b\",c\r\nx,\"y\r\nz\"";
        CSV.ParseText(file3).Dump();
    
        var file4 = "\"\"\"\"";
        CSV.ParseText(file4).Dump();
    }
    
    static class CSV
    {
        public struct Record
        {
            public readonly string[] Row;
    
            public string this[int index] => Row[index];
    
            public Record(string[] row)
            {
                Row = row;
            }
        }
    
        public static List<Record> ParseText(string text)
        {
            return Parse(new StringReader(text));
        }
    
        public static List<Record> ParseFile(string fn)
        {
            using (var reader = File.OpenText(fn))
            {
                return Parse(reader);
            }
        }
    
        public static List<Record> Parse(TextReader reader)
        {
            var data = new List<Record>();
    
            var col = new StringBuilder();
            var row = new List<string>();
            for (; ; )
            {
                var ln = reader.ReadLine();
                if (ln == null) break;
                if (Tokenize(ln, col, row))
                {
                    data.Add(new Record(row.ToArray()));
                    row.Clear();
                }
            }
    
            return data;
        }
    
        public static bool Tokenize(string s, StringBuilder col, List<string> row)
        {
            int i = 0;
    
            if (col.Length > 0)
            {
                col.AppendLine(); // continuation
    
                if (!TokenizeQuote(s, ref i, col, row))
                {
                    return false;
                }
            }
    
            while (i < s.Length)
            {
                var ch = s[i];
                if (ch == ',')
                {
                    row.Add(col.ToString().Trim());
                    col.Length = 0;
                    i++;
                }
                else if (ch == '"')
                {
                    i++;
                    if (!TokenizeQuote(s, ref i, col, row))
                    {
                        return false;
                    }
                }
                else
                {
                    col.Append(ch);
                    i++;
                }
            }
    
            if (col.Length > 0)
            {
                row.Add(col.ToString().Trim());
                col.Length = 0;
            }
    
            return true;
        }
    
        public static bool TokenizeQuote(string s, ref int i, StringBuilder col, List<string> row)
        {
            while (i < s.Length)
            {
                var ch = s[i];
                if (ch == '"')
                {
                    // escape sequence
                    if (i + 1 < s.Length && s[i + 1] == '"')
                    {
                        col.Append('"');
                        i++;
                        i++;
                        continue;
                    }
                    i++;
                    return true;
                }
                else
                {
                    col.Append(ch);
                    i++;
                }
            }
            return false;
        }
    }
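
The header handling described above (read the column names from the first row and compute their indexes) could be sketched roughly as follows. This is only an illustration: `HeaderLookup` and `BuildIndex` are hypothetical names, not part of the answer's `CSV` class, and the rows are hard-coded stand-ins for whatever a parser returns.

```csharp
using System;
using System.Collections.Generic;

static class HeaderLookup
{
    // Hypothetical helper: builds a column-name -> column-index map from the header row.
    public static Dictionary<string, int> BuildIndex(string[] header)
    {
        // Assumption: column names are matched case-insensitively.
        var index = new Dictionary<string, int>(StringComparer.OrdinalIgnoreCase);
        for (int i = 0; i < header.Length; i++)
        {
            index[header[i]] = i; // if a name repeats, the last column wins
        }
        return index;
    }

    public static void Main()
    {
        // Rows as a parser might return them; the first row is the header.
        var rows = new List<string[]>
        {
            new[] { "Id", "Name" },
            new[] { "1", "Carl" },
            new[] { "2", "Tom" },
        };

        var columns = BuildIndex(rows[0]);
        for (int r = 1; r < rows.Count; r++)
        {
            Console.WriteLine($"{rows[r][columns["Id"]]}: {rows[r][columns["Name"]]}");
        }
    }
}
```

Looking columns up by name keeps the calling code working if the column order in the file changes.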
    
  • 2020-11-21 07:23

    Another one for this list: Cinchoo ETL, an open source library to read and write multiple file formats (CSV, flat file, XML, JSON, etc.).

    The sample below shows how to read a CSV file quickly (no POCO object required):

    string csv = @"Id, Name
    1, Carl
    2, Tom
    3, Mark";
    
    using (var p = ChoCSVReader.LoadText(csv)
        .WithFirstLineHeader()
        )
    {
        foreach (var rec in p)
        {
            Console.WriteLine($"Id: {rec.Id}");
            Console.WriteLine($"Name: {rec.Name}");
        }
    }
    

    The sample below shows how to read a CSV file using a POCO object:

    public partial class EmployeeRec
    {
        public int Id { get; set; }
        public string Name { get; set; }
    }
    
    static void CSVTest()
    {
        string csv = @"Id, Name
    1, Carl
    2, Tom
    3, Mark";
    
        using (var p = ChoCSVReader<EmployeeRec>.LoadText(csv)
            .WithFirstLineHeader()
            )
        {
            foreach (var rec in p)
            {
                Console.WriteLine($"Id: {rec.Id}");
                Console.WriteLine($"Name: {rec.Name}");
            }
        }
    }
    

    Please check out articles at CodeProject on how to use it.

  • 2020-11-21 07:23

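    This code reads a CSV file into a DataTable, using the first row as column names (it needs the System.Data and Microsoft.VisualBasic.FileIO namespaces):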

    public static DataTable ReadCsv(string path)
    {
        DataTable result = new DataTable("SomeData");
        using (TextFieldParser parser = new TextFieldParser(path))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            bool isFirstRow = true;
            //IList<string> headers = new List<string>();
    
            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                if (isFirstRow)
                {
                    foreach (string field in fields)
                    {
                        result.Columns.Add(new DataColumn(field, typeof(string)));
                    }
                    isFirstRow = false;
                }
                else
                {
                    int i = 0;
                    DataRow row = result.NewRow();
                    foreach (string field in fields)
                    {
                        row[i++] = field;
                    }
                    result.Rows.Add(row);
                }
            }
        }
        return result;
    }
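
Once the data is in a `DataTable`, fields can be read by header name. The sketch below is only an illustration of that access pattern: `DataTableUsage` and `BuildSample` are hypothetical, and the sample table stands in for the result of `ReadCsv` so that the block runs without a file on disk.

```csharp
using System;
using System.Data;

static class DataTableUsage
{
    // Hypothetical stand-in for the result of ReadCsv: every column is typed as string.
    public static DataTable BuildSample()
    {
        var table = new DataTable("SomeData");
        table.Columns.Add("Id", typeof(string));
        table.Columns.Add("Name", typeof(string));
        table.Rows.Add("1", "Carl");
        table.Rows.Add("2", "Tom");
        return table;
    }

    public static void Main()
    {
        DataTable table = BuildSample();
        foreach (DataRow row in table.Rows)
        {
            // The indexer returns object; the cast is safe because all columns hold strings.
            string id = (string)row["Id"];
            string name = (string)row["Name"];
            Console.WriteLine($"{id}: {name}");
        }
    }
}
```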
    
  • 2020-11-21 07:25

    A CSV parser is now a part of .NET Framework.

    Add a reference to Microsoft.VisualBasic.dll (works fine in C#, don't mind the name) and import the Microsoft.VisualBasic.FileIO namespace:

    using (TextFieldParser parser = new TextFieldParser(@"c:\temp\test.csv"))
    {
        parser.TextFieldType = FieldType.Delimited;
        parser.SetDelimiters(",");
        while (!parser.EndOfData)
        {
            //Process row
            string[] fields = parser.ReadFields();
            foreach (string field in fields)
            {
                //TODO: Process field
            }
        }
    }
    

    See the documentation for the TextFieldParser class.

    P.S. If you need a CSV exporter, try CsvExport (disclosure: I'm one of the contributors).
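
Since the question asks about files with a header, here is a minimal sketch of one way to combine `TextFieldParser` with a header row: read the first row as column names, then key each later row by those names. `HeaderCsv` and `Read` are hypothetical names, and returning dictionaries is an assumption made for illustration, not something the answer prescribes.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.VisualBasic.FileIO;

static class HeaderCsv
{
    // Hypothetical helper: returns one dictionary per data row, keyed by column name.
    public static List<Dictionary<string, string>> Read(TextReader reader)
    {
        var records = new List<Dictionary<string, string>>();
        using (var parser = new TextFieldParser(reader))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true;

            if (parser.EndOfData) return records;   // empty input
            string[] header = parser.ReadFields();  // first row holds the column names

            while (!parser.EndOfData)
            {
                string[] fields = parser.ReadFields();
                var record = new Dictionary<string, string>();
                // Assumption: rows shorter than the header just omit the trailing columns.
                for (int i = 0; i < header.Length && i < fields.Length; i++)
                {
                    record[header[i]] = fields[i];
                }
                records.Add(record);
            }
        }
        return records;
    }

    public static void Main()
    {
        var csv = "Id,Name\r\n1,\"Carl, Jr.\"\r\n2,Tom";
        foreach (var rec in Read(new StringReader(csv)))
        {
            Console.WriteLine($"{rec["Id"]} -> {rec["Name"]}");
        }
    }
}
```

For a file on disk, `File.OpenText(path)` can be passed in place of the `StringReader`.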

  • 2020-11-21 07:25

    If anyone wants a snippet they can drop into their code without having to bind a library or download a package, here is a version I wrote:

        public static string FormatCSV(List<string> parts)
        {
            string result = "";
            bool first = true;

            foreach (string s in parts)
            {
                // Write a separator before every field except the first
                if (!first)
                {
                    result += ",";
                }
                first = false;

                // Non-empty fields are quoted, with embedded quotes doubled;
                // empty fields are written as nothing between separators
                if (s.Length > 0)
                {
                    result += "\"" + s.Replace("\"", "\"\"") + "\"";
                }
            }

            return result;
        }
    
        enum CSVMode
        {
            CLOSED = 0,
            OPENED_RAW = 1,
            OPENED_QUOTE = 2
        }
    
        public static List<string> ParseCSV(string input)
        {
            List<string> results;
    
            CSVMode mode;
    
            char[] letters;
    
            string content;
    
    
            mode = CSVMode.CLOSED;
    
            content = "";
            results = new List<string>();
            letters = input.ToCharArray();
    
            for (int i = 0; i < letters.Length; i++)
            {
                char letter = letters[i];
                char nextLetter = '\0';
    
                if (i < letters.Length - 1)
                    nextLetter = letters[i + 1];
    
                // If it's a quote character
                if (letter == '"')
                {
                    // If that next letter is a quote
                    if (nextLetter == '"' && mode == CSVMode.OPENED_QUOTE)
                    {
                        // Then this quote is escaped and should be added to the content
    
                        content += letter;
    
                        // Skip the escape character
                        i++;
                        continue;
                    }
                    else
                    {
                        // Otherwise it's not an escaped quote and is an opening or closing one
                        // Character is skipped
    
                        // If it was open, then close it
                        if (mode == CSVMode.OPENED_QUOTE)
                        {
                            results.Add(content);
    
                            // reset the content
                            content = "";
    
                            mode = CSVMode.CLOSED;
    
                            // If there is a next letter available
                            if (nextLetter != '\0')
                            {
                                // If it is a comma
                                if (nextLetter == ',')
                                {
                                    i++;
                                    continue;
                                }
                                else
                                {
                                    throw new Exception("Expected comma. Found: " + nextLetter);
                                }
                            }
                        }
                        else if (mode == CSVMode.OPENED_RAW)
                        {
                            // If it was opened raw, then just add the quote 
                            content += letter;
                        }
                        else if (mode == CSVMode.CLOSED)
                        {
                            // Otherwise open it as a quote 
    
                            mode = CSVMode.OPENED_QUOTE;
                        }
                    }
                }
                // If it's a comma separator
                else if (letter == ',')
                {
                    // If in quote mode
                    if (mode == CSVMode.OPENED_QUOTE)
                    {
                        // Just read it
                        content += letter;
                    }
                    // If raw, then close the content
                    else if (mode == CSVMode.OPENED_RAW)
                    {
                        results.Add(content);
    
                        content = "";
    
                        mode = CSVMode.CLOSED;
                    }
                    // If it was closed, then open it raw
                    else if (mode == CSVMode.CLOSED)
                    {
                        mode = CSVMode.OPENED_RAW;
    
                        results.Add(content);
    
                        content = "";
                    }
                }
                else
                {
                    // If opened quote, just read it
                    if (mode == CSVMode.OPENED_QUOTE)
                    {
                        content += letter;
                    }
                    // If opened raw, then read it
                    else if (mode == CSVMode.OPENED_RAW)
                    {
                        content += letter;
                    }
                    // If closed, then open raw
                    else if (mode == CSVMode.CLOSED)
                    {
                        mode = CSVMode.OPENED_RAW;
    
                        content += letter;
                    }
                }
            }
    
            // If it was still reading when the buffer finished
            if (mode != CSVMode.CLOSED)
            {
                results.Add(content);
            }
    
            return results;
        }
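
As a point of comparison for the writer side, a common alternative is to quote a field only when it contains a comma, a quote, or a line break, and to double any embedded quotes. The sketch below illustrates that rule; `CsvWriterSketch`, `QuoteIfNeeded`, and `FormatLine` are hypothetical names and this is not the answer's `FormatCSV`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CsvWriterSketch
{
    // Quotes a field only if it contains a character that would otherwise break parsing.
    public static string QuoteIfNeeded(string field)
    {
        bool needsQuotes = field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        return needsQuotes
            ? "\"" + field.Replace("\"", "\"\"") + "\""  // double embedded quotes
            : field;
    }

    // Joins the (possibly quoted) fields into one CSV line.
    public static string FormatLine(IEnumerable<string> fields)
    {
        return string.Join(",", fields.Select(f => QuoteIfNeeded(f)));
    }

    public static void Main()
    {
        // prints: a,,"say ""hi""","x,y"
        Console.WriteLine(FormatLine(new[] { "a", "", "say \"hi\"", "x,y" }));
    }
}
```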
    