I am using PHP to import a CSV file, which originates from an excel spreadsheet. Some of the fields contain line breaks in them, so when I reopen the csv in excel / open office
It's an old thread but i encountered this problem and i solved it with a regex so you can avoid a library just for that. Here the code is in PHP but it can be adapted to other language.
$parsedCSV = preg_replace('/(,|\n|^)"(?:([^\n"]*)\n([^\n"]*))*"/', '$1"$2 $3"', $parsedCSV);
This solutions supposes the fields containing a linebreak are enclosed by double quotes, which seems to be a valid assumption, at least for what i have seen so far. Also, the double quotes should follow a ,
or be placed at the start of a new line (or first line).
Example:
field1,"field2-part1\nfield2-part2",field3
Here the \n is replaced by a whitespace so the result would be:
field1,"field2-part1 field2-part2",field3
The regex should handle multiple linebreaks as well.
This might not be efficient if the content is too large, but it can help for many cases and the idea can be reused, maybe optimized by doing this for smaller chunks (but you'd need to handle the cuts with fix-sized buffered).