Reading large csv files with strings containing commas as one field

半城伤御伤魂 提交于 2019-12-12 12:33:18

问题


I have a large .csv file (~26000 rows). I want to be able to read it into matlab. Another problem is that it contains a collection of strings delimited by commas in one of the fields.

I'm having trouble reading it. I tried stuff like tdfread, which won't work here. Any tricks with textscan i should be aware about?

Is there any other way?


回答1:


I'm not sure what is generating your CSV file but that is your problem.

The point of a CSV file, is that the file itself designates separation of fields. If the text of the CSV contains commas, then nothing you can do will help you. How would ANY program know when the text in a single field contains commas, or when that comma is a field delimiter?

Proper CSV would have a text qualifier. Some generators/readers gives you the option to use one. The standard text qualifier is a " (quote). Its changeable, though, because your text may contain those, too.

Again, its all about generating proper CSV content.




回答2:


There's a chance that xlsread won't give you the answer you expect -- do the strings always appear in the same columns, for example? I think (as everyone else seems to :-) that it would be more robust to just use

fid = fopen('yourfile.csv');

and then either textscan

t = textscan(fid, '%s', delimiter', sprintf('\n'));
t = t{1};

or just fgetl (the example in the help is perfect).

After that you can do some line-by-line processing -- using textscan again on the text content of each line, for example, is a nice, quick way to get a cell-array that will allow fast analysis of each line.




回答3:


You have a problem because you're reading it in as a .csv, and you have commas within your data. You can get it in Excel and manipulate the date, possibly extract the unwanted commas with Excel formulas. I work with .csv files for DB imports quite a bit. I imagine matLab has similar rules, which is - no commas in your data.

Can you tell us more about your data? Are there commas throughout, our just one column? Maybe you can read it in as tab delimited?




回答4:


Are you using a Unix system? The reason I am asking is that you could use a command-line function such as sed and regular expressions to clean those data files before you pass them into Matlab. Here is a link that explains how to do exactly what you are looking for.




回答5:


Since, as others have observed, your file is CSV with commas inside what you think of as a single field, it's going to be hard to persuade Matlab that that really is only one field. I think your best strategy is going to be to read one line at a time, into a string acting as a buffer, and to translate it, field-by-field, into the variables or other data structures that you want. Since Matlab has in-built regular expression capabilities this shouldn't be too hard.

And, as others have already suggested, posting a sample of your data would help us to help you.




回答6:


One easy solution is:

path='C:\folder1\folder2\';
data = 'data.csv';
data = dataset('xlsfile',sprintf('%s\%s', path,data));

Of course you could also do the following:

[data,path] = uigetfile('C:\folder1\folder2\*.csv');
data = dataset('xlsfile',sprintf('%s\%s', path,data));

now you will have loaded the data as dataset. An easy way to get a column 1 for example is

double(data(1))



来源:https://stackoverflow.com/questions/2170184/reading-large-csv-files-with-strings-containing-commas-as-one-field

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!