Quotes in tab-delimited file

Posted by 情到浓时终转凉″ on 2019-12-02 09:46:25

Check the comment on the CodeProject article about quotes:

http://www.codeproject.com/Messages/3382857/Re-Quotes-inside-of-the-Field.aspx

You need to specify in the constructor that a character other than " should be used as the quote character.

Use the FileHelpers library instead. It is widely used and will cope with quoted fields, or fields that contain quotes.

I recently solved a similar issue. CsvReader was working properly on all but a few lines of my TSV file, and what solved my problem in the end was passing a custom delimiter to the CsvReader constructor:

using System;
using System.IO;
using LumenWorks.Framework.IO.Csv; // assumed: the CsvReader from the CodeProject article

public static void ParseTSV(string filepath)
{
    // The delimiter has to be passed to the constructor: the Delimiter
    // property is read-only, so csvReader.Delimiter = ... does not work.
    using (CsvReader csvReader = new CsvReader(new StreamReader(filepath), true, '\t'))
    // If that didn't work, passing unlikely characters into the other params might help:
    // using (CsvReader csvReader = new CsvReader(new StreamReader(filepath), true, '\t', '~', '`', '~', ValueTrimmingOptions.None))
    {
        int fieldCount = csvReader.FieldCount;
        string[] headers = csvReader.GetFieldHeaders();

        while (csvReader.ReadNextRecord())
        {
            for (int i = 0; i < fieldCount; i++)
            {
                // Print each value next to its column header.
                Console.Write(String.Format("{0} = {1};", headers[i], csvReader[i]));
            }
            Console.WriteLine();
        }
    }
}

Maybe you can open the file with your application and replace each quote with another character and then process it.

I did some searching, and there is an RFC for CSV files (RFC 4180), and that does explicitly prohibit what they are doing:

Each field may or may not be enclosed in double quotes (however some programs, such as Microsoft Excel, do not use double quotes at all). If fields are not enclosed with double quotes, then double quotes may not appear inside the fields.

Basically, if they want to do that, they need to enclose the whole field in quotes and, as the same RFC requires, double each quote that appears inside it, like so:

,"""SUMISEI MARU NO 2"" - sea of Japan",

So if you want you can throw this problem back at them and insist they send you a "proper" RFC 4180 CSV file.

Since you have access to the source files for that CSV reader, another option would be to modify it to handle the kind of quoted strings they are feeding you.

This kind of situation is exactly why it is vital to have source code access to your toolset.

If instead you'd like to preprocess (hack) their files before feeding them to your tool, the correct method would be to look for fields with a quote not immediately in front of or behind a separator, and enclose the whole field in another set of quotes, doubling the quotes already inside it.
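
The preprocessing described above could be sketched roughly like this. It is only an illustration under a few assumptions: the file is tab-delimited, no field spans several lines, and the reader accepts RFC 4180-style doubled quotes inside a quoted field. QuoteFixer, FixField and FixLine are made-up names for this example, not part of any library.

```csharp
using System;

// Hypothetical helper for illustration; not part of CsvReader or any CSV library.
public static class QuoteFixer
{
    // Returns the field unchanged when it has no quotes or is already a
    // well-formed quoted field; otherwise treats the whole field as raw text,
    // doubles its quotes and wraps it in one more pair of quotes.
    public static string FixField(string field)
    {
        if (field.IndexOf('"') < 0)
            return field; // no quotes at all

        bool wrapped = field.Length >= 2
                       && field[0] == '"'
                       && field[field.Length - 1] == '"';
        if (wrapped)
        {
            string inner = field.Substring(1, field.Length - 2);
            // Well-formed means every quote inside is part of a doubled pair.
            if (inner.Replace("\"\"", "").IndexOf('"') < 0)
                return field;
        }

        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Applies FixField to every tab-separated field of one line.
    public static string FixLine(string line)
    {
        string[] fields = line.Split('\t');
        for (int i = 0; i < fields.Length; i++)
            fields[i] = FixField(fields[i]);
        return string.Join("\t", fields);
    }
}
```

With this sketch, the field "SUMISEI MARU NO 2" - sea of Japan comes out as """SUMISEI MARU NO 2"" - sea of Japan", while fields without quotes pass through untouched. Running FixLine over each line of the file and writing the result to a new file would then give the reader input it can parse.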

Right, after a late night of Red Bull and scratching my head, I eventually found the problem: it was commas in the "Claim_Description" field. I didn't even think about that because I was using a tab-delimited file, but as soon as I did a find and replace on all commas in the file it worked absolutely fine!

The next step is to find out how to replace those commas before processing.
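
One way to do that replacement is to rewrite the file before handing it to the reader. A minimal sketch, assuming commas only ever occur inside field values and can safely become another character (a semicolon here, picked arbitrarily); CommaCleaner, CleanLine and CleanFile are hypothetical names:

```csharp
using System.IO;
using System.Linq;

// Hypothetical helper for illustration only.
public static class CommaCleaner
{
    // Replaces every comma in a line with the given character; tabs are left alone.
    public static string CleanLine(string line, char replacement = ';')
    {
        return line.Replace(',', replacement);
    }

    // Streams the input file line by line and writes a cleaned copy.
    public static void CleanFile(string inputPath, string outputPath)
    {
        File.WriteAllLines(outputPath,
            File.ReadLines(inputPath).Select(l => CleanLine(l)));
    }
}
```

Writing to a separate output file keeps the original data intact in case the commas are needed later. If the reader is constructed with '\t' as its delimiter, as in the ParseTSV example earlier in this thread, this step may turn out to be unnecessary.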

Again, thanks for all the suggestions.

Cheers, Sean
