I am looking for suggestions on how to handle a csv file that is being created, then uploaded by our customers, and that may have a comma in a value, like a company name.
There is a library available through NuGet for dealing with pretty much any well-formed CSV in .NET: CsvHelper.
Example to map to a class:
var csv = new CsvReader( textReader );
var records = csv.GetRecords<MyClass>();
Example to read individual fields:
var csv = new CsvReader( textReader );
while( csv.Read() )
{
    var intField = csv.GetField<int>( 0 );
    var stringField = csv.GetField<string>( 1 );
    var boolField = csv.GetField<bool>( "HeaderName" );
}
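For readers outside .NET, the same idea can be sketched with Python's standard csv module. This is an assumption of the example, not something the answer above uses, and the Customer class and sample data are hypothetical. The point it illustrates is that a conforming parser keeps a quoted comma inside its field.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Customer:
    """Hypothetical record type, standing in for MyClass above."""
    customer_id: int
    company: str
    active: bool


# Hypothetical upload: the company name contains a comma, so it is quoted.
SAMPLE = 'Id,Company,Active\n1,"Acme, Inc.",true\n2,Globex,false\n'


def read_customers(text: str) -> list[Customer]:
    """Map each data row to a Customer, converting field types by hand."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return [
        Customer(int(row[0]), row[1], row[2].lower() == "true")
        for row in reader
    ]


customers = read_customers(SAMPLE)
print(customers[0].company)
```

The quoted company name comes back as a single field with its comma intact, so no special handling is needed as long as the customer's file quotes such values.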
Letting the client drive the file format:
The comma (,) is the standard field delimiter, and the double quote (") is the standard value used to escape fields that contain a delimiter, quote, or line ending.
To use (for example) # for fields and ' for escaping:
var csv = new CsvReader( textReader );
csv.Configuration.Delimiter = "#";
csv.Configuration.Quote = '\'';
// read the file however meets your needs
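As a rough illustration of the same configuration idea in another stack, here is a minimal sketch using Python's standard csv module, with # as the delimiter and ' as the quote character. The sample data and the read_rows helper are hypothetical.

```python
import csv
import io

# Hypothetical client file: '#' separates fields and "'" quotes them.
SAMPLE = "1#'Acme # Sons'#true\n2#Globex#false\n"


def read_rows(text, delimiter="#", quotechar="'"):
    """Parse text using a client-chosen delimiter and quote character."""
    reader = csv.reader(
        io.StringIO(text), delimiter=delimiter, quotechar=quotechar
    )
    return list(reader)


rows = read_rows(SAMPLE)
print(rows[0])
```

The delimiter inside the quoted value is kept as part of the field, exactly as a quoted comma would be in a standard file.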
More Documentation
If you're on a *nix system, have access to sed, and there can be one or more unwanted commas only in a specific field of your CSV, you can use the following one-liner to enclose that field in double quotes ("), as RFC 4180 Section 2 proposes:
sed -r 's/([^,]*,[^,]*,[^,]*,)(.*)(,.*,.*)/\1"\2"\3/' inputfile
Depending on which field the unwanted comma(s) may be in you have to alter/extend the capturing groups of the regex (and the substitution).
The example above will enclose the fourth field (out of six) in quotation marks.
In combination with the --in-place option you can apply these changes directly to the file.
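If sed is not available, the one-liner above can be mirrored with Python's re module. This is only a sketch under the same assumptions (six fields, unwanted commas only in the fourth); the sample row and the quote_fourth_field helper are hypothetical.

```python
import csv
import re

# Same pattern as the sed one-liner: quote the fourth field out of six.
PATTERN = re.compile(r"([^,]*,[^,]*,[^,]*,)(.*)(,.*,.*)")


def quote_fourth_field(line: str) -> str:
    """Wrap the fourth of six fields in double quotes, like the sed command."""
    return PATTERN.sub(r'\1"\2"\3', line, count=1)


# Hypothetical row whose fourth field contains two unwanted commas.
broken = "1,2020-01-01,EU,Acme, Inc., Holdings,open,42"
fixed = quote_fourth_field(broken)
parsed = next(csv.reader([fixed]))
print(fixed)
print(parsed)
```

Like the sed version, this relies on the greedy (.*) swallowing every extra comma, and it assumes the field does not already contain double quotes, which would need to be doubled to stay valid.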
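In order to "build" the right regex, there's a simple principle to follow:
1. For every field that comes before the field with the unwanted comma(s), write one [^,]*, and put them all together in a capturing group.
2. For the field that contains the unwanted comma(s), write (.*).
3. For every field that comes after the field with the unwanted comma(s), write one ,.* and put them all together in a capturing group.
Here is a short overview of different possible regexes/substitutions depending on the specific field. If not given, the substitution is \1"\2"\3.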
([^,]*)(,.*) #first field, regex
"\1"\2 #first field, substitution
(.*,)([^,]*) #last field, regex
\1"\2" #last field, substitution
([^,]*,)(.*)(,.*,.*,.*) #second field (out of five fields)
([^,]*,[^,]*,)(.*)(,.*) #third field (out of four fields)
([^,]*,[^,]*,[^,]*,)(.*)(,.*,.*) #fourth field (out of six fields)
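The building principle can also be written as a small generator. The following Python sketch is an illustration, not part of the original answer: build_quoting_regex and quote_field are hypothetical helper names. For the first or last field it leaves the corresponding outer group empty rather than using the two-group forms listed above, so the substitution \1"\2"\3 can stay the same in every case.

```python
import re


def build_quoting_regex(field_index: int, field_count: int) -> str:
    """Build the pattern that isolates one field (field_index is 1-based).

    field_count is the number of fields each row is supposed to have.
    """
    if not 1 <= field_index <= field_count:
        raise ValueError("field_index must be between 1 and field_count")
    before = "[^,]*," * (field_index - 1)        # one per preceding field
    after = ",.*" * (field_count - field_index)  # one per following field
    return f"({before})(.*)({after})"


def quote_field(line: str, field_index: int, field_count: int) -> str:
    """Enclose the chosen field of one CSV line in double quotes."""
    pattern = build_quoting_regex(field_index, field_count)
    return re.sub(pattern, r'\1"\2"\3', line, count=1)


print(build_quoting_regex(4, 6))
print(quote_field("Acme, Inc.,x,y", 1, 3))
print(quote_field("x,y,Acme, Inc.", 3, 3))
```

For field 4 of 6 the generated pattern is identical to the one used in the sed command, which is a quick way to check that the helper follows the same rule.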
If you want to remove the unwanted comma(s) with sed instead of enclosing them with quotation marks, refer to this answer.
Here's a neat little workaround:
You can use a Greek Lower Numeral Sign instead (U+0375)
It looks like this ͵
Using this method saves you a lot of resources too...
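A minimal sketch of this substitution in Python follows; the helper names and sample values are hypothetical. Note two assumptions: the file has to be written in an encoding that can represent U+0375 (UTF-8, for example), and the stored value is no longer the original text unless the replacement is reversed on the way back in.

```python
GREEK_LOWER_NUMERAL_SIGN = "\u0375"  # looks like a comma, but is not one


def disguise_commas(value: str) -> str:
    """Swap every real comma for the look-alike character."""
    return value.replace(",", GREEK_LOWER_NUMERAL_SIGN)


def restore_commas(value: str) -> str:
    """Undo disguise_commas after the file has been split into fields."""
    return value.replace(GREEK_LOWER_NUMERAL_SIGN, ",")


# Hypothetical row: only the company name contains a comma.
fields = ["1", "Acme, Inc.", "true"]
line = ",".join(disguise_commas(field) for field in fields)

# A naive split now yields exactly three fields.
parts = line.split(",")
company = restore_commas(parts[1])
print(len(parts), company)
```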
Just use SoftCircuits.CsvParser on NuGet. It will handle all those details for you and efficiently handles very large files. And, if needed, it can even import/export objects by mapping columns to object properties. In addition, my testing showed it averages nearly 4 times faster than the popular CsvHelper.
An example might help to show how commas can be displayed in a .csv file. Create a simple text file with the contents below, save it with the suffix ".csv", and open it with Excel 2000 from Windows 10.
aa,bb,cc,d;d "In the spreadsheet presentation, the below line should look like the above line except the below shows a displayed comma instead of a semicolon between the d's."
aa,bb,cc,"d,d", This works even in Excel
aa,bb,cc,"d,d", This works even in Excel 2000
aa,bb,cc,"d ,d", This works even in Excel 2000
aa,bb,cc,"d , d", This works even in Excel 2000
aa,bb,cc, " d,d", This fails in Excel 2000 due to the space before the 1st quote
aa,bb,cc, " d ,d", This fails in Excel 2000 due to the space before the 1st quote
aa,bb,cc, " d , d", This fails in Excel 2000 due to the space before the 1st quote
aa,bb,cc,"d,d " , This works even in Excel 2000 even with spaces before and after the 2nd quote.
aa,bb,cc,"d ,d " , This works even in Excel 2000 even with spaces before and after the 2nd quote.
aa,bb,cc,"d , d " , This works even in Excel 2000 even with spaces before and after the 2nd quote.
Rule: If you want to display a comma in a cell (field) of a .csv file: "Start and end the field with a double quote, but avoid white space before the 1st quote"
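The same rule can be observed outside Excel. The sketch below uses Python's standard csv module, which is an assumption of this example and may not match Excel in every detail; write_row and parse_line are hypothetical helpers. It shows a writer adding the quotes automatically, and a reader splitting the field when a space precedes the opening quote.

```python
import csv
import io


def write_row(fields):
    """Serialize one row; fields containing a comma get quoted."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerow(fields)
    return buffer.getvalue()


def parse_line(line, **reader_options):
    """Parse a single CSV line with optional csv.reader settings."""
    return next(csv.reader([line], **reader_options))


written = write_row(["aa", "bb", "cc", "d,d"])

# Quote directly after the delimiter: the comma stays inside the field.
good = parse_line('aa,bb,cc,"d,d",ok')

# Space before the opening quote: the quote is treated as ordinary text
# and the comma splits the field in two.
bad = parse_line('aa,bb,cc, "d,d",broken')

# skipinitialspace tells the reader to ignore that leading space.
tolerant = parse_line('aa,bb,cc, "d,d",fixed', skipinitialspace=True)

print(written, good, bad, tolerant, sep="\n")
```

Generating the file through a CSV writer, rather than joining strings by hand, is usually the simplest way to make sure the quote lands immediately after the delimiter.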
I used the Papa Parse library to parse the CSV file into key-value pairs, where the keys come from the header (the first row of the CSV file) and the values come from each following row.
Here is the example that I use:
https://codesandbox.io/embed/llqmrp96pm
It includes a dummy.csv file for the CSV parsing demo.
I used it within ReactJS, though it is easy to replicate in an app written in any language.
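To illustrate the "any language" point, here is a minimal sketch of the same header-to-value mapping using Python's standard csv.DictReader. It is not the Papa Parse API, and the sample data and parse_with_header helper are hypothetical.

```python
import csv
import io

# Hypothetical file: the first row supplies the keys.
SAMPLE = 'name,company,city\nAda,"Acme, Inc.",London\nBob,Globex,Paris\n'


def parse_with_header(text: str) -> list[dict]:
    """Return one dict per data row, keyed by the header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


records = parse_with_header(SAMPLE)
print(records[0]["company"])
```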