I\'m working on a feature to export search results to a CSV file to be opened in Excel. One of the fields is a free-text field, which may contain line breaks, commas, quota
If you have access to Mac OS, I have found that the Apple spreadsheet Numbers does a good job of unpicking a complex multi-line CSV file that Excel could not handle. Just open the .csv with Numbers and then export to Excel.
My experience with Excel 2010 on WinXP with French regional settings:
If the field contains a leading space, Excel ignores the double quote as a text qualifier. The solution is to eliminate leading spaces between the comma (field separator) and double-quote. For example:
Broken:
Name,Title,Description
"John", "Mr.", "My detailed description"
Working:
Name,Title,Description
"John","Mr.","My detailed description"
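If the file is generated by a program, one way to avoid the stray space is to let a CSV library do the quoting instead of joining strings by hand. A minimal sketch, assuming Python's standard csv module; the to_csv helper and the sample rows are illustrative only:

```python
import csv
import io

def to_csv(rows):
    """Serialize rows to CSV text with every field quoted and no space after the comma."""
    buf = io.StringIO(newline="")
    # QUOTE_ALL wraps every field in double quotes; the delimiter is a bare ","
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
    writer.writerows(rows)
    return buf.getvalue()

rows = [
    ["Name", "Title", "Description"],
    ["John", "Mr.", "My detailed description"],
]
print(to_csv(rows))
```

The writer never emits a space between the comma and the opening quote, so the output matches the "Working" form above.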
Almost 10 years after the original post, Excel hasn't improved at importing CSV files. However, I found that it is much better at importing HTML tables. So, one can use Python to convert CSV to HTML and then import the resulting HTML into Excel.
The advantages of this approach are: (a) it works reliably, (b) you don't need to send your data to a third party service (e.g. Google sheets), (c) no extra "fat" installations required (LibreOffice, Numbers etc.) for most users, (d) higher level than meddling with CR/LF characters and BOM markers, (e) no need to fiddle with locale settings.
The following steps can be run on any bash-like shell as long as Python 3 is installed. Although Python can be used to directly read CSV, csvkit is used to do an intermediate conversion to JSON. This allows us to avoid having to deal with CSV intricacies in our Python code.
First, save the following script as json2html.py. The script reads a JSON file from stdin and dumps it as an HTML table:
#!/usr/bin/env python3
import sys, json, html

if __name__ == '__main__':
    header_emitted = False
    # Escape cell text so that <, > and & in the data cannot break the markup.
    make_th = lambda s: "<th>%s</th>" % (html.escape(s if s else ""))
    make_td = lambda s: "<td>%s</td>" % (html.escape(s if s else ""))
    make_tr = lambda l, make_cell: "<tr>%s</tr>" % ( "".join([make_cell(v) for v in l]) )
    print("<html><body>\n<table>")
    # The input is a JSON array of objects, one object per CSV row.
    for line in json.load(sys.stdin):
        lk, lv = zip(*line.items())
        if not header_emitted:
            # Use the keys of the first row as the table header.
            print(make_tr(lk, make_th))
            header_emitted = True
        print(make_tr(lv, make_td))
    print("</table>\n</body></html>")
Then, install csvkit in a virtual environment and use csvjson to feed the input file to our script. It is a good idea to disable cell type guessing with the -I argument:
$ virtualenv -p python3 pyenv
$ . ./pyenv/bin/activate
$ pip install csvkit
$ csvjson -I input.csv | python3 json2html.py > output.html
Now output.html can be imported in Excel. Line breaks in cells will have been preserved.
Optionally, you may want to clean up your Python virtual environment:
$ deactivate
$ rm -rf pyenv
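For readers who would rather not install csvkit, the same conversion can be sketched with the standard library alone. This is a swap of technique, not the approach described above: it reads the CSV directly with Python's csv module instead of going through csvjson. The function name csv_to_html and the sample data are made up for illustration.

```python
import csv
import html
import io

def csv_to_html(csv_text):
    """Convert CSV text to an HTML table, treating the first row as the header."""
    # newline="" passes line breaks inside quoted fields to the csv module unchanged
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    out = ["<html><body>", "<table>"]
    for i, row in enumerate(reader):
        tag = "th" if i == 0 else "td"
        cells = "".join("<%s>%s</%s>" % (tag, html.escape(v), tag) for v in row)
        out.append("<tr>%s</tr>" % cells)
    out.append("</table>")
    out.append("</body></html>")
    return "\n".join(out)

# Hypothetical input: one data row whose second field spans two lines
sample = 'Name,Notes\r\n"John","first line\nsecond line, with a comma"\r\n'
print(csv_to_html(sample))
```

Unlike the csvjson route, this sketch does no type guessing at all, so every cell is emitted as the text that was in the file.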
I have finally found the problem!
It turns out that we were writing the file using Unicode encoding, rather than ASCII or UTF-8. Changing the encoding on the FileStream seems to solve the problem.
Thanks everyone for all your suggestions!
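The original code is not shown here, so the following is only a Python sketch of the difference between the encodings. It assumes that "Unicode" meant UTF-16, as it does in some APIs; the encode_csv helper and the sample rows are hypothetical.

```python
import csv
import io

def encode_csv(rows, encoding="utf-8"):
    """Serialize rows to CSV and return the bytes for the given encoding."""
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue().encode(encoding)

rows = [["id", "comment"], ["1", "first line\nsecond line"]]

as_utf8 = encode_csv(rows, "utf-8")
as_utf16 = encode_csv(rows, "utf-16")  # assumption: what "Unicode" referred to

# UTF-16 stores each ASCII character in two bytes, so the data contains NUL bytes
print(b"\x00" in as_utf8)   # False
print(b"\x00" in as_utf16)  # True
```

If Excel then misreads non-ASCII characters in a UTF-8 file, passing "utf-8-sig" (UTF-8 with a byte order mark) to the same helper is a common thing to try.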
Line breaks inside double quotes are perfectly fine according to the CSV standard (RFC 4180). How Excel parses such line breaks depends on the OS setting for the list separator:
Windows: you need to set the list separator to comma (Region and language » Formats » Advanced). Source: https://superuser.com/questions/238944/how-to-force-excel-to-open-csv-files-with-data-arranged-in-columns#answer-633302
Mac: you need to change the region to US (then manually change the other settings back to your preference). Source: https://answers.microsoft.com/en-us/mac/forum/macoffice2016-macexcel/line-separator-comma-semicolon-in-excel-2016-for/7db1b1a0-0300-44ba-ab9b-35d1c40159c6 (see NewmanLee's answer)
Don't forget to close Excel completely before trying again.
I've successfully replicated the issue and was able to fix it using the above on both Mac and Windows.
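The point about quoted line breaks can be checked outside Excel. A minimal sketch, assuming Python's standard csv module: a field containing a line break is written inside double quotes and reads back as a single field. The roundtrip helper and the sample rows are illustrative; the delimiter parameter is there only because some locales use a semicolon as the list separator.

```python
import csv
import io

def roundtrip(rows, delimiter=","):
    """Write rows to CSV text, parse the text again, and return both."""
    buf = io.StringIO(newline="")
    csv.writer(buf, delimiter=delimiter).writerows(rows)
    text = buf.getvalue()
    parsed = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    return text, parsed

rows = [["id", "notes"], ["1", "line one\nline two"]]

text, parsed = roundtrip(rows)
print(repr(text))      # the multi-line field is wrapped in double quotes
print(parsed == rows)  # True: the line break survives the round trip
```

If a file produced this way still splits across rows in Excel, the file itself is valid and the separator setting described above is the more likely culprit.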