I have found a CSV parsing issue with FasterCSV (1.5.0) which seems like a genuine bug, but which I'm hoping there's a workaround for.
Basically, adding a space after the separator (in my case a comma) when the fields are enclosed in quotes generates a MalformedCSVError.
Here's a simple example:
# No quotes on fields -- works fine
FasterCSV.parse_line("one,two,three")
=> ["one", "two", "three"]
# Quotes around fields with no spaces after separators -- works fine
FasterCSV.parse_line("\"one\",\"two\",\"three\"")
=> ["one", "two", "three"]
# Quotes around fields but with a space after the first separator -- fails!
FasterCSV.parse_line("\"one\", \"two\",\"three\"")
=> FasterCSV::MalformedCSVError: Illegal quoting on line 1.
Am I going mad, or is this a bug in FasterCSV?
The MalformedCSVError is correct here.
Leading and trailing spaces in CSV are not ignored; they are considered part of the field. In your failing example, the second field starts with a space and then contains unescaped double quotes, which is what causes the illegal quoting error.
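To make that concrete, here is a small sketch. It assumes Ruby's standard csv library (the successor to FasterCSV, with the same parse_line call) rather than the FasterCSV 1.5.0 gem from the question, so exact error messages may differ.

```ruby
require 'csv'

# Unquoted fields keep the space after the comma as part of the value.
unquoted = CSV.parse_line('one, two,three')

# A quote that appears after that leading space is a stray quote inside an
# unquoted field, so the parser rejects the line.
error_class = begin
  CSV.parse_line('"one", "two","three"')
  nil
rescue CSV::MalformedCSVError => e
  e.class
end
```

The first call returns the second field as " two", with the leading space preserved. The second call raises the malformed-CSV error rather than guessing what was meant.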
Maybe this library is just more strict than others you have used.
Maybe you could set the :col_sep option to ', ' to make it parse files like that.
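A minimal sketch of that suggestion, again assuming Ruby's standard csv library in place of the FasterCSV gem (with FasterCSV the call would be written FasterCSV.parse_line(line, :col_sep => ', ')). Note that the sample line uses comma-space between every field, which is not quite the input from the question.

```ruby
require 'csv'

# Every separator here is exactly a comma followed by one space.
line = '"one", "two", "three"'

# Treat the two-character sequence ", " as the column separator.
fields = CSV.parse_line(line, col_sep: ', ')
```

This only helps when the spacing is consistent. A line that mixes "," and ", ", like the one in the question, would still be rejected because a bare comma no longer counts as a separator.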
I had hoped that the :col_sep option might allow a regular expression, but it seems to be used for both reading and writing, which is a shame. The documentation doesn't hold out much hope and your need is probably more immediate than could be satisfied by requesting a change or submitting a patch ;-)
If you're calling #parse_line explicitly, then you could always call gsub(/,\s*/, ',') on your input line. That regular expression might need to change significantly if you anticipate the possibility of comma-space within quoted strings. (I'd suggest reposting such a question here with a suitable tag and let the RegEx mavens loose on it should that be the case).
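As a rough sketch of that workaround: the helper name parse_loose_line is made up for illustration, and Ruby's standard csv library stands in for FasterCSV.

```ruby
require 'csv'

# Hypothetical helper: strip any whitespace that follows a comma, then parse.
def parse_loose_line(line)
  CSV.parse_line(line.gsub(/,\s*/, ','))
end

# Mixed spacing, as in the question, now parses.
mixed = parse_loose_line('"one", "two","three"')

# Caveat: the substitution is not quote-aware, so a comma-space inside a
# quoted field loses its space as well.
altered = parse_loose_line('"a, b", "c"')
```

Here mixed comes back as three clean fields, while altered has its first field changed from "a, b" to "a,b". That is the case where the regular expression would need to become quote-aware.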
Source: https://stackoverflow.com/questions/1807942/overcoming-a-basic-problem-with-csv-parsing-using-the-fastercsv-gem