In a malformed .csv file, there is a row of data with extra double quotes, e.g. the last line:
Name,Comment
"Peter","Nice singer"
"Paul","Love "folk" so
If you're not on Ruby 1.9, or just get tired of regexes sometimes, split the string on commas, strip the first and last quote from each field, replace any remaining " characters with _, re-quote each field, and join the fields with commas again.
(We don't always have to worry about efficiency!)
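The split-and-rejoin steps might look roughly like this in Ruby. This is only a sketch: repair_line is a made-up helper name, the sample row is hypothetical, and the approach assumes that no field contains a comma of its own.

```ruby
# Naive repair; assumes no field contains a comma.
def repair_line(line)
  fields = line.chomp.split(',', -1).map do |field|
    # Strip one leading and one trailing quote, if present.
    inner = field.sub(/\A"/, '').sub(/"\z/, '')
    # Replace the remaining (stray) quotes and re-quote the field.
    '"' + inner.tr('"', '_') + '"'
  end
  fields.join(',')
end

puts repair_line('"Paul","Love "folk" songs"')
# prints "Paul","Love _folk_ songs"
```

Note that this quotes every field, including ones that were unquoted in the input, so you may want to skip the header row.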
$str = '"folk"';
$new = str_replace('"', '', $str);
/* $new is now folk: every " has been removed */
In Ruby 1.9, the following works:
result = subject.gsub(/(?<!^|,)"(?!,|$)/, '_')
Earlier versions of Ruby don't support lookbehind assertions.
Explanation:
(?<!^|,) # Assert that we're not at the start of the line or right after a comma
" # Match a quote
(?!,|$) # Assert that we're not at the end of the line or right before a comma
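As a quick check of the regex, here is a small sketch that applies it to two rows. The helper name replace_stray_quotes and the sample rows are assumptions made for illustration, not part of the original data.

```ruby
# Needs Ruby 1.9 or later because of the lookbehind.
def replace_stray_quotes(line)
  # Replace every quote that is neither at a field boundary
  # (start/end of line) nor next to a comma.
  line.gsub(/(?<!^|,)"(?!,|$)/, '_')
end

puts replace_stray_quotes('"Paul","Love "folk" songs"')
# prints "Paul","Love _folk_ songs"

puts replace_stray_quotes('"Peter","Nice singer"')
# prints "Peter","Nice singer"
```

Quotes that delimit a field are left alone; only the ones in the middle of a field are replaced.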
Of course this assumes that we won't run into pathological cases like
"Mary",""Oh," she said"
Unless you have no other choice, get the file regenerated with correct escaping. Any other approach is asking for trouble, because the insertion of unescaped quotes is lossy, and thus cannot be reliably reversed.
If you can't get the file fixed from the source, then Tim Pietzcker's regex is better than nothing, but I strongly recommend that you have your script print all "fixed" lines and check them for errors manually.
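One way to get that list of "fixed" lines for manual review is sketched below. The helper name audit_fixes and the sample rows are hypothetical; the regex is the one quoted earlier in this thread.

```ruby
# Returns [line_number, original, fixed] for every line the regex changed.
def audit_fixes(lines)
  pattern = /(?<!^|,)"(?!,|$)/
  report = []
  lines.each_with_index do |line, index|
    original = line.chomp
    fixed = original.gsub(pattern, '_')
    report << [index + 1, original, fixed] unless fixed == original
  end
  report
end

sample = [
  'Name,Comment',
  '"Peter","Nice singer"',
  '"Paul","Love "folk" songs"',
  '"Mary",""Oh," she said"'
]

audit_fixes(sample).each do |number, original, fixed|
  puts "line #{number}: #{original}  ->  #{fixed}"
end
```

With this sample, lines 3 and 4 are reported. Line 4 comes out as "Mary","_Oh," she said", which is still malformed, and that is exactly the kind of case a human needs to look at.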
Meta-strategy:
The data was most likely entered by hand and inconsistently. CSVs get messy when people type either the field delimiter (double quote) or the separator (comma) into the field itself. If you can have the file regenerated, ask for an extremely unlikely field begin/end marker, such as five tildes (~~~~~). You can then split on "~~~~~,~~~~~" and get the correct number of fields, as long as the marker never appears in the data.
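A minimal sketch of parsing such a regenerated file, assuming every field is wrapped in the marker and the marker never occurs inside a field. The helper name split_marked and the sample row are made up for illustration.

```ruby
# Splits a row whose fields are wrapped in an unlikely marker.
def split_marked(line, marker = '~~~~~')
  # Drop the marker that opens the first field and closes the last one.
  body = line.chomp.delete_prefix(marker).delete_suffix(marker)
  # What separates two fields is marker + comma + marker.
  body.split("#{marker},#{marker}", -1)
end

p split_marked('~~~~~Mary~~~~~,~~~~~"Oh," she said~~~~~')
# prints ["Mary", "\"Oh,\" she said"]
```

Quotes and commas inside a field now pass through untouched, because neither is treated as a delimiter any more. delete_prefix and delete_suffix require Ruby 2.5 or later.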