I am getting this error:
Illegal quoting in line 1. (CSV::MalformedCSVError)
Line 1 in my file is as follows:
\"Status\" \"Intern
The raw bytes at the start of my file are below:
"\xFF\xFES\x00t\x00a\x00t\x00u\x00s\x00...
0xFF 0xFE is the byte order mark (BOM) for UTF-16LE.
You have to specify the encoding when processing this file with CSV.foreach. From its documentation:
This method also understands an additional :encoding parameter that you can use to specify the Encoding of the data in the file to be read. You must provide this unless your data is in Encoding::default_external(). CSV will use this to determine how to parse the data. You may provide a second Encoding to have the data transcoded as it is read. For example, encoding: "UTF-32BE:UTF-8" would read UTF-32BE data from the file but transcode it to UTF-8 before CSV parses it.
Furthermore, you have to specify that a BOM is present. According to the IO.new docs:
If “BOM|UTF-8”, “BOM|UTF-16LE” or “BOM|UTF16-BE” are (...) present, the BOM is stripped
Applied to your file and example:
CSV.foreach(file, col_sep: "\t", encoding: "BOM|UTF-16LE:UTF-8", headers: true) do |row|
  # ...
end
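To make it clearer what that encoding option does, here is a minimal sketch that performs the same steps by hand on an in-memory string: check for the BOM, strip it, transcode UTF-16LE to UTF-8, then parse. The sample header and row are made up for illustration, and this is not how CSV implements it internally.

```ruby
require "csv"

# Bytes as they might appear on disk: the BOM (0xFF 0xFE) followed by
# UTF-16LE text. The header and row below are hypothetical sample data.
text  = %Q["Status"\t"Internal ID"\n"Complete"\t"42"\n]
bytes = "\xFF\xFE".b + text.encode("UTF-16LE").b

# Verify the UTF-16LE BOM is present, then drop it.
bom = "\xFF\xFE".b
raise "no UTF-16LE BOM found" unless bytes.start_with?(bom)
payload = bytes.byteslice(bom.bytesize..-1)

# Tell Ruby the remaining bytes are UTF-16LE and transcode them to UTF-8.
utf8 = payload.force_encoding("UTF-16LE").encode("UTF-8")

# Parse the tab-separated UTF-8 text, using the first line as headers.
rows = CSV.parse(utf8, col_sep: "\t", headers: true).map(&:to_h)
p rows
```

With a real file you would normally let CSV.foreach and the encoding option do this for you. The manual version is mainly useful for seeing where the "Illegal quoting" error comes from: without the transcoding step, the parser sees NUL bytes between the characters.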
On *nix systems the file command is used to give a reasonable hint as to what the file contents are:
file /usr/share/dict/words
/usr/share/dict/words: ASCII text
file /usr/bin/ruby
/usr/bin/ruby: Mach-O universal binary with 2 architectures
/usr/bin/ruby (for architecture i386): Mach-O executable i386
/usr/bin/ruby (for architecture x86_64): Mach-O 64-bit executable x86_64
If you're on *nix, try running that against your CSV file and see what it says. It's not fool-proof, but it's reasonably accurate.
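If the file command isn't available, a rough BOM check can be sketched in Ruby. The sniff_bom helper and the BOMS table are hypothetical names for this example, and the check only recognizes a leading byte order mark. It says nothing about files that lack one.

```ruby
# Known byte order marks, longest first so that UTF-32LE (FF FE 00 00)
# is not mistaken for UTF-16LE (FF FE).
BOMS = {
  "UTF-32LE" => "\xFF\xFE\x00\x00".b,
  "UTF-32BE" => "\x00\x00\xFE\xFF".b,
  "UTF-8"    => "\xEF\xBB\xBF".b,
  "UTF-16LE" => "\xFF\xFE".b,
  "UTF-16BE" => "\xFE\xFF".b,
}.freeze

# sniff_bom is a hypothetical helper: it returns the encoding name implied
# by a leading BOM, or nil when no known BOM is found.
def sniff_bom(bytes)
  head = bytes.b
  BOMS.each { |name, bom| return name if head.start_with?(bom) }
  nil
end

# The first bytes shown in the question.
first_bytes = "\xFF\xFES\x00t\x00a\x00t\x00u\x00s\x00".b
detected = sniff_bom(first_bytes)
puts detected.inspect
```

For a file on disk, something like sniff_bom(File.binread(path, 4)) would read just the first four bytes. One caveat: UTF-16LE text whose first character after the BOM is NUL would be misreported as UTF-32LE, so treat the result as a hint, much like the output of file.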
As something to get you started, here's how to convert space-delimited fields to tab-delimited:
row = '"Status" "Internal ID" "Language" "Created At" "Updated At" "IP Address" "Location" "Username" "GET Variables" "Referrer" "Number of Saves" "Weighted Score" "Completion Time" "Invite Code" "Invite Email" "Invite Name" "Invite: branchid" "Invite: lastname" "Invite: clientname" "Invite: membershipid" "Invite: clientid" "Invite: dateofbirth" "Invite: membershiptype" "Invite: branch" "Invite: unitid" "Invite: shortname" "Invite: changedatetime" "Invite: homephone" "Collector" '
row.gsub!(/"\s+"/, %Q["\t"]) # => "\"Status\"\t\"Internal ID\"\t\"Language\"\t\"Created At\"\t\"Updated At\"\t\"IP Address\"\t\"Location\"\t\"Username\"\t\"GET Variables\"\t\"Referrer\"\t\"Number of Saves\"\t\"Weighted Score\"\t\"Completion Time\"\t\"Invite Code\"\t\"Invite Email\"\t\"Invite Name\"\t\"Invite: branchid\"\t\"Invite: lastname\"\t\"Invite: clientname\"\t\"Invite: membershipid\"\t\"Invite: clientid\"\t\"Invite: dateofbirth\"\t\"Invite: membershiptype\"\t\"Invite: branch\"\t\"I...