Quotes in tab-delimited file

我寻月下人不归 2021-01-27 16:58

I've got a simple application that opens a tab-delimited text file and inserts that data into a database.

I'm using this CSV reader to read the data: http://www.codep

7 answers
  • 2021-01-27 17:28

    Use the FileHelpers library instead. It is widely used and will cope with quoted fields, or fields that contain quotes.

  • 2021-01-27 17:33

    I did some searching, and there is an RFC for CSV files (RFC 4180), and that does explicitly prohibit what they are doing:

    Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields.

    Basically, if they want to do that, they need to enclose the whole field in quotes and double each quote inside it, like so:

    ,"""SUMISEI MARU NO 2"" - sea of Japan",
    

    So if you want you can throw this problem back at them and insist they send you a "proper" RFC 4180 CSV file.

    Since you have access to the source files for that CSV reader, another option would be to modify it to handle the kind of quoted strings they are feeding you.

    This kind of situation is exactly why it is vital to have source code access to your toolset.

    If instead you'd like to preprocess (hack) their files before feeding them to your tool, the correct method would be to look for fields with a quote not immediately in front of or behind a separator, and enclose the whole field in another set of quotes, doubling the quotes inside it.
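
    The preprocessing idea above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: `TsvQuoteFixer`, `EscapeField` and `EscapeLine` are hypothetical names, and it assumes the incoming file uses no quoting at all (every double quote is literal data) and that no field contains a tab or a line break. Doubling the embedded quotes is what RFC 4180 requires for quotes inside a quoted field.

    ```csharp
    using System;
    using System.Linq;

    // Hypothetical helper, not part of any CSV library.
    public static class TsvQuoteFixer
    {
        // Wraps a field in double quotes and doubles the quotes inside it,
        // but only when the field actually contains a double quote.
        public static string EscapeField(string field)
        {
            if (!field.Contains("\""))
                return field;
            return "\"" + field.Replace("\"", "\"\"") + "\"";
        }

        // Applies EscapeField to every delimiter-separated field of one line.
        // Assumption: the delimiter never occurs inside a field.
        public static string EscapeLine(string line, char delimiter = '\t')
        {
            return string.Join(delimiter.ToString(),
                               line.Split(delimiter).Select(EscapeField));
        }

        public static void Main()
        {
            // prints """SUMISEI MARU NO 2"" - sea of Japan"
            Console.WriteLine(EscapeField("\"SUMISEI MARU NO 2\" - sea of Japan"));
        }
    }
    ```

    Each line of the file would be run through `EscapeLine` before it is handed to the reader. A file that already quotes some fields correctly would be double-escaped by this sketch, so it only fits input where the quotes are raw.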

  • 2021-01-27 17:35

    Check the comment on the codeproject article about quotes:

    http://www.codeproject.com/Messages/3382857/Re-Quotes-inside-of-the-Field.aspx

    You need to specify in the constructor that you want another character besides " to be used as quotes.

  • 2021-01-27 17:38

    Maybe you can open the file with your application and replace each quote with another character and then process it.
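
    A minimal sketch of that idea, assuming a placeholder character that never occurs in the data: `QuoteMasker` and its `Placeholder` are hypothetical, and `Split('\t')` merely stands in for whatever reader does the real parsing. The quotes are swapped out before parsing and swapped back in each parsed field.

    ```csharp
    using System;

    // Hypothetical helper, not part of any CSV library.
    public static class QuoteMasker
    {
        // Assumption: this control character never appears in the file.
        public const char Placeholder = '\u0001';

        // Replaces every double quote so the parser sees none.
        public static string Mask(string line)
        {
            return line.Replace('"', Placeholder);
        }

        // Restores the original double quotes in a parsed field.
        public static string Unmask(string field)
        {
            return field.Replace(Placeholder, '"');
        }

        public static void Main()
        {
            string line = "42\t\"SUMISEI MARU NO 2\" - sea of Japan";
            string masked = Mask(line);

            // Split stands in for the real reader here.
            foreach (string field in masked.Split('\t'))
            {
                Console.WriteLine(Unmask(field));
            }
        }
    }
    ```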

  • 2021-01-27 17:43

    I recently solved a similar issue. CsvReader was working properly on all but a few lines of my TSV file; what solved my problem in the end was passing a custom delimiter to the CsvReader constructor:

    using System;
    using System.IO;
    using LumenWorks.Framework.IO.Csv; // assumption: the CsvReader from the CodeProject article

    public static class TsvParser
    {
        public static void ParseTSV(string filepath)
        {
            // Pass '\t' as the delimiter so that tabs separate the fields.
            using (CsvReader csvReader = new CsvReader(new StreamReader(filepath), true, '\t'))
            // If that didn't work, passing unlikely characters into the other parameters might help:
            // using (CsvReader csvReader = new CsvReader(new StreamReader(filepath), true, '\t', '~', '`', '~', ValueTrimmingOptions.None))
            {
                int fieldCount = csvReader.FieldCount;

                // Does not work, since Delimiter is a read-only property:
                // csvReader.Delimiter = '\t';

                string[] headers = csvReader.GetFieldHeaders();

                while (csvReader.ReadNextRecord())
                {
                    for (int i = 0; i < fieldCount; i++)
                    {
                        // Print each value next to its column header.
                        Console.Write("{0}: {1}; ", headers[i], csvReader[i]);
                    }
                    Console.WriteLine();
                }
            }
        }
    }
    
  • 2021-01-27 17:43

    Right, after a late night of Red Bull and head-scratching, I eventually found the problem: it was commas in the "Claim_Description" field. I didn't even think about that because I was using a tab-delimited file, but as soon as I did a find-and-replace on all commas in the file, it worked absolutely fine!

    The next step is to find out how to replace those commas before processing.

    Again, thanks for all the suggestions.

    Cheers, Sean
