Dealing with commas in a CSV file

傲寒 2020-11-21 06:53

I am looking for suggestions on how to handle a csv file that is being created, then uploaded by our customers, and that may have a comma in a value, like a company name.

27 Answers
  • 2020-11-21 07:24

    You can use an alternative delimiter such as ";" or "|", but the simplest option is probably quoting, which is supported by most (decent) CSV libraries and most decent spreadsheets.

    For more on CSV delimiters and a spec for a standard format for describing delimiters and quoting see this webpage
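
As a rough illustration of both options, here is a minimal Python sketch using the standard `csv` module. The sample row, the `|` delimiter, and the helper names `to_csv_line` and `from_csv_line` are assumptions made up for this example, not something from the question.

```python
import csv
import io

# Hypothetical sample row: the company name contains a comma.
row = ["1001", "Acme, Inc.", "Berlin"]

def to_csv_line(fields, delimiter=","):
    """Serialize one row; the csv module quotes a field only when it has to."""
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerow(fields)
    return buf.getvalue()

def from_csv_line(line, delimiter=","):
    """Parse one serialized row back into a list of fields."""
    return next(csv.reader(io.StringIO(line), delimiter=delimiter))

quoted = to_csv_line(row)                # comma-delimited, quoted where needed
piped = to_csv_line(row, delimiter="|")  # alternative delimiter, no quoting needed

print(quoted, end="")
print(piped, end="")
```

Either form round-trips through `from_csv_line`, but the alternative delimiter only moves the problem: a value containing `|` would then need quoting instead.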

  • 2020-11-21 07:25

    If you feel like reinventing the wheel, the following may work for you:

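    // Requires: using System.Collections.Generic; using System.Text;
    // Splits a single CSV line into fields. Commas inside double quotes are
    // kept, and a backslash makes the next character literal.
    // Note: a doubled quote ("") is not treated as a literal quote here;
    // write \" instead. Records that span several lines are not handled.
    public static IEnumerable<string> SplitCSV(string line)
    {
        var s = new StringBuilder();
        bool escaped = false, inQuotes = false;
        foreach (char c in line)
        {
            if (escaped)
            {
                // The character after a backslash is always taken literally,
                // including a comma outside quotes.
                s.Append(c);
                escaped = false;
            }
            else if (c == '\\')
            {
                escaped = true;
            }
            else if (c == '"')
            {
                inQuotes = !inQuotes;
            }
            else if (c == ',' && !inQuotes)
            {
                // An unquoted, unescaped comma ends the current field.
                yield return s.ToString();
                s.Clear();
            }
            else
            {
                s.Append(c);
            }
        }
        yield return s.ToString();
    }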
    
  • 2020-11-21 07:26

    You can put double quotes around the fields. I don't like this approach, as it adds another special character (the double quote). Just define an escape character (usually backslash) and use it wherever you need to escape something:

    data,more data,more data\, even,yet more

    You don't have to try to match quotes, and you have fewer exceptions to parse. This simplifies your code, too.
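
For what it is worth, the backslash-escape convention can be tried out with Python's standard `csv` module by turning quoting off and naming an escape character. This is only a sketch of the idea under that assumption; the sample fields mirror the line shown above, and it deliberately avoids fields that themselves contain backslashes.

```python
import csv
import io

# The four fields from the example line; the third one contains a comma.
fields = ["data", "more data", "more data, even", "yet more"]

# Write without any quoting; the delimiter is escaped with a backslash instead.
buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_NONE, escapechar="\\", lineterminator="\n")
writer.writerow(fields)
line = buf.getvalue()

# Read it back with the same settings so "\," is seen as a literal comma.
reader = csv.reader(io.StringIO(line), quoting=csv.QUOTE_NONE, escapechar="\\")
parsed = next(reader)

print(line, end="")
print(parsed)
```

Both sides have to agree on the escape character; a consumer that expects quoted fields will not understand this file.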

  • 2020-11-21 07:28

    The CSV format uses commas to separate values. Values that contain carriage returns, linefeeds, commas, or double quotes are surrounded by double quotes. Inside such a quoted value, each literal double quote is escaped by an immediately preceding double quote. For example, the three values:

    test
    list, of, items
    "go" he said
    

    would be encoded as:

    test
    "list, of, items"
    """go"" he said"
    

    Any field can be quoted, but only fields that contain commas, CR/LF, or double quotes must be quoted.
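
The encoding of the three values above can be reproduced with a short Python sketch. It assumes the standard `csv` module with its default minimal quoting; the helper name `encode_field` is invented for this example.

```python
import csv
import io

# The three example values from the answer.
values = ["test", "list, of, items", '"go" he said']

def encode_field(value):
    """Encode one value as a single CSV field (default minimal quoting)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow([value])
    # Drop the line terminator that writerow appends.
    return buf.getvalue()[:-1]

encoded = [encode_field(v) for v in values]
for text in encoded:
    print(text)
```

Only the second and third values come out quoted, and the literal quotes in the third are doubled.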

    There is no real standard for the CSV format, but almost all applications follow the conventions documented here. The RFC that was mentioned elsewhere is not a standard for CSV; it is an RFC for using CSV within MIME, and it contains some unconventional and unnecessary limitations that make it useless outside of MIME.

    A gotcha that many CSV modules I have seen don't accommodate is that multiple lines can be encoded in a single field, which means you can't assume that each line is a separate record. You either need to disallow newlines in your data or be prepared to handle them.
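
To make the multi-line gotcha concrete, the following Python sketch compares naive line splitting with a parser that understands quoting. The sample data is made up for illustration, and the standard `csv` module stands in for whatever parser you actually use.

```python
import csv
import io

# Hypothetical file content: the second record has a newline inside a quoted field.
raw = 'id,note\r\n1,"first line\nsecond line"\r\n2,plain\r\n'

# Naive approach: one physical line is assumed to be one record.
naive_records = raw.splitlines()

# Quote-aware approach: the reader keeps consuming lines until the quote closes.
records = list(csv.reader(io.StringIO(raw, newline="")))

print(len(naive_records))  # counts physical lines
print(len(records))        # counts logical records
print(records[1])
```

The naive split sees four lines, while the reader returns three records and keeps the embedded newline inside the second one.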

  • 2020-11-21 07:28

    In Europe we ran into this problem much earlier than this question, because we commonly use a comma as the decimal separator. See the numbers below:

    | American      | European      |
    | ------------- | ------------- |
    | 0.5           | 0,5           |
    | 3.14159265359 | 3,14159265359 |
    | 17.54         | 17,54         |
    | 175,186.15    | 175.186,15    |
    

    So the comma can't serve as the field separator for such numbers without extra quoting. For that reason, CSV files in Europe are typically separated by a semicolon (;).

    Programs like Microsoft Excel can read semicolon-separated files, and it is possible to change the separator. You could even use a tab (\t) as the separator. See this answer from Super User.
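
Reading such a semicolon-separated file might look like the Python sketch below. The column names, sample rows, and the helper `parse_european_number` are hypothetical, and the number conversion assumes that "." is only ever used as a thousands separator.

```python
import csv
import io

# Hypothetical semicolon-separated data with decimal commas.
raw = "item;price\nwidget;17,54\ngadget;175.186,15\n"

def parse_european_number(text):
    """Convert e.g. '175.186,15' to 175186.15 (assumes '.' groups thousands)."""
    return float(text.replace(".", "").replace(",", "."))

# Tell the reader that fields are separated by ';' instead of ','.
reader = csv.DictReader(io.StringIO(raw), delimiter=";")
prices = {row["item"]: parse_european_number(row["price"]) for row in reader}

print(prices)
```

Because the delimiter is a semicolon, the decimal commas need no quoting at all.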

  • 2020-11-21 07:30

    Put double quotes around strings. That is generally what Excel does.

    À la Eli, you escape a double quote as two double quotes, e.g. "test1","foo""bar","test2"
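
A quick way to check this convention is to parse the example line and write it back with every field quoted. The sketch below assumes Python's standard `csv` module, whose `QUOTE_ALL` mode roughly corresponds to the "put quotes around everything" style described here.

```python
import csv
import io

# The example line from the answer: the middle field contains one literal quote.
line = '"test1","foo""bar","test2"'

# Parsing collapses the doubled quote back into a single one.
parsed = next(csv.reader(io.StringIO(line)))

# Writing with QUOTE_ALL quotes every field and doubles embedded quotes again.
buf = io.StringIO()
csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n").writerow(parsed)
rewritten = buf.getvalue().rstrip("\n")

print(parsed)
print(rewritten)
```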
