How to use a regular expression in an iPhone app to split a string by , (comma)

Submitted by 自闭症网瘾萝莉.ら on 2019-12-08 18:35:19

Question


I have to read a .csv file that has three columns. While parsing it, I get each line as a string in this format: Christopher Bass,\"Cry the Beloved Country Final Essay\",cbass@cgs.k12.va.us. I want to store the three column values in an array, so I used the componentsSeparatedByString:@"," method. It correctly returns an array with three components:

  1. Christopher Bass
  2. Cry the Beloved Country Final Essay
  3. cbass@cgs.k12.va.us

But when a column value already contains a comma, as in Christopher Bass,\"Cry, the Beloved Country Final Essay\",cbass@cgs.k12.va.us, it splits the string into four components because there is a , (comma) after Cry:

  1. Christopher Bass
  2. Cry
  3. the Beloved Country Final Essay
  4. cbass@cgs.k12.va.us

So how can I handle this with a regular expression? I have the RegexKitLite classes, but which regular expression should I use? Please help!

Thanks-


Answer 1:


Any regular expression will probably run into the same problem. What you need is to sanitize your entries, either by escaping the commas inside them or by delimiting each string with quotation marks, like this: "My string". Otherwise the ambiguity remains. Good luck.

For your example you would probably need to do something like:

\"Christopher Bass\",\"Cry\, the Beloved Country Final Essay\",\"cbass@cgs.k12.va.us\"

That way you could use a regexp or even the same method from the NSString class.

Not directly related, but on the importance of sanitizing strings: http://xkcd.com/327/ hehehe.
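
As a rough illustration of the quoting approach described above, here is a minimal Objective-C sketch that uses only Foundation. CSVQuoteField is a hypothetical helper name, not an existing API, and doubling embedded quotation marks is an assumed convention (the example above uses a backslash instead). Whatever reads the file has to agree with the convention the writer chose.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not part of Foundation): wraps a field in double quotes
// and doubles any embedded double quote, so commas inside the field are unambiguous.
static NSString *CSVQuoteField(NSString *field)
{
    NSString *escaped = [field stringByReplacingOccurrencesOfString:@"\""
                                                         withString:@"\"\""];
    return [NSString stringWithFormat:@"\"%@\"", escaped];
}

int main(void)
{
    @autoreleasepool {
        NSArray *fields = [NSArray arrayWithObjects:
                           @"Christopher Bass",
                           @"Cry, the Beloved Country Final Essay",
                           @"cbass@cgs.k12.va.us", nil];
        NSMutableArray *quoted = [NSMutableArray array];
        for (NSString *field in fields) {
            [quoted addObject:CSVQuoteField(field)];
        }
        // Every field is quoted, so a reader can tell separators from content.
        NSLog(@"%@", [quoted componentsJoinedByString:@","]);
    }
    return 0;
}
```

Quoting every field, not only the ones that contain a comma, keeps the writer simple and the output regular.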




Answer 2:


How about this:

componentsSeparatedByRegex:@",\\\"|\\\","

This should split your string wherever " and , appear together in either order, resulting in a three-member array. This of course assumes that the second element in the string is always enclosed in quotation marks, and that the characters " and , never appear consecutively within the three components.

If either of these assumptions is incorrect, other methods of identifying the string components may be used, but it should be made clear that no generic solution exists. If the three component strings can contain unescaped " and , anywhere, not even a limited solution is possible in cases such as:

Doe, John,\"\"Why Unescaped Strings Suck\", And Other Development Horror Stories\",Doe, John <john.doe@dev.null>

Hopefully there is nothing like the above in your CSV data. If there is, the data is basically unusable, and you should look into a better CSV exporter.
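
If, on the other hand, the exporter does quote consistently, a small character-by-character scan may be easier to reason about than a regex. The sketch below is one possible take that uses only Foundation, not RegexKitLite. CSVFieldsFromLine is a hypothetical name, and the sketch assumes that quotation marks only ever delimit fields (with "" inside a quoted field standing for one literal quote), that a field never spans several lines, and that the line holds plain " characters rather than \".

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: splits one CSV line into fields, treating commas inside
// double-quoted sections as content. Assumes "" inside a quoted field means one
// literal quote and that a field never spans lines.
static NSArray *CSVFieldsFromLine(NSString *line)
{
    NSMutableArray *fields = [NSMutableArray array];
    NSMutableString *current = [NSMutableString string];
    BOOL inQuotes = NO;
    NSUInteger length = [line length];

    for (NSUInteger i = 0; i < length; i++) {
        unichar c = [line characterAtIndex:i];
        if (c == '"') {
            if (inQuotes && i + 1 < length && [line characterAtIndex:i + 1] == '"') {
                [current appendString:@"\""]; // doubled quote: keep one literal quote
                i++;
            } else {
                inQuotes = !inQuotes;         // opening or closing quote, not content
            }
        } else if (c == ',' && !inQuotes) {
            [fields addObject:[NSString stringWithString:current]]; // field boundary
            [current setString:@""];
        } else {
            [current appendString:[NSString stringWithCharacters:&c length:1]];
        }
    }
    [fields addObject:[NSString stringWithString:current]]; // last field
    return fields;
}

int main(void)
{
    @autoreleasepool {
        NSString *line = @"Christopher Bass,\"Cry, the Beloved Country Final Essay\",cbass@cgs.k12.va.us";
        NSLog(@"%@", CSVFieldsFromLine(line));
    }
    return 0;
}
```

Because the scan tracks whether it is inside quotes, it also keeps empty columns (val,,ue gives three fields) without needing a placeholder character.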




Answer 3:


The regex you're looking for is: \"(.*)\"[ ^,]*|([^,]*),

In words: ((a quote, then string_1, then a quote, then 0-n spaces) or (string_2, containing no comma)) followed by a comma. In Objective-C:

NSString *str = @"Christopher Bass,\"Cry, the Beloved Country ,Final Essay\",cbass@cgs.k12.va.us,som";
NSString *regEx = @"\\\"(.*)\\\"[ ^,]*|([^,]*),";
NSMutableArray *split = [[str componentsSeparatedByRegex:regEx] mutableCopy];
[split removeObject:@""]; // both capture groups are always returned, even when one of them is empty
NSLog(@"%@", split);

// OUTPUT:
2012-02-07 17:42:18.778 tmpapp[92170:c03] (
    "Christopher Bass",
    "Cry, the Beloved Country ,Final Essay",
    "cbass@cgs.k12.va.us",
    som
)

RegexKitLite adds both capture groups to the array, so you end up with empty objects in it. removeObject:@"" deletes those, but if you need to preserve genuinely empty values (e.g. your source has val,,ue), you have to change the code to the following:

str = [str stringByReplacingOccurrencesOfRegex:regEx withString:@"$1$2∏"];
NSArray *split = [str componentsSeparatedByString:@"∏"];

$1 and $2 are the two capture groups mentioned above; ∏ is a character that will most likely never appear in normal text (and is easy to remember: option-shift-p).




Answer 4:


The last part looks like it will never contain a comma. Neither will the first one as far as I can see...

What about splitting the string like this:

NSArray *splitArr = [str componentsSeparatedByString:@","];
NSString *nameStr = [splitArr objectAtIndex:0];
NSString *emailStr = [splitArr lastObject];

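// Rejoin the middle pieces with the commas that the split removed.
// Assumes splitArr has at least two elements.
NSRange middleRange = NSMakeRange(1, [splitArr count] - 2);
NSString *contentStr = [[splitArr subarrayWithRange:middleRange] componentsJoinedByString:@","];

This will use the first and last string as is, and combine the rest into the content, which still carries its surrounding quotation marks.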

Kind of a hack, but a name and an email address will never contain a comma, right?
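
The same idea can also be sketched without building the intermediate array, by locating the first and the last comma directly. SplitByOuterCommas is a hypothetical helper, and the sketch relies on the assumption stated above that the name and the email address contain no commas.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns name, content and email, or nil when the line
// does not contain two distinct commas.
static NSArray *SplitByOuterCommas(NSString *str)
{
    NSRange first = [str rangeOfString:@","];
    NSRange last = [str rangeOfString:@"," options:NSBackwardsSearch];
    if (first.location == NSNotFound || last.location <= first.location) {
        return nil;
    }

    NSString *name = [str substringToIndex:first.location];
    NSString *email = [str substringFromIndex:last.location + 1];
    NSRange middle = NSMakeRange(first.location + 1,
                                 last.location - first.location - 1);
    NSString *content = [str substringWithRange:middle];

    // Drop the quotation marks at both ends of the content, if any.
    NSCharacterSet *quotes = [NSCharacterSet characterSetWithCharactersInString:@"\""];
    content = [content stringByTrimmingCharactersInSet:quotes];

    return [NSArray arrayWithObjects:name, content, email, nil];
}

int main(void)
{
    @autoreleasepool {
        NSString *str = @"Christopher Bass,\"Cry, the Beloved Country Final Essay\",cbass@cgs.k12.va.us";
        NSLog(@"%@", SplitByOuterCommas(str));
    }
    return 0;
}
```

Note that the trimming step would also strip a quotation mark that legitimately starts or ends the title itself, so it is only a reasonable shortcut for data like the example.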




Answer 5:


Is the title guaranteed to have the quotation marks? And is it the only component that can have them? If so, componentsSeparatedByString:@"\"" should give you this:

  1. Christopher Bass,
  2. Cry, the Beloved Country Final Essay
  3. ,cbass@cgs.k12.va.us

Then use componentsSeparatedByString:@"," or substringToIndex:/substringFromIndex: to get rid of the two commas in the first and last components.

Here's a solution using substring:

NSString* input = @"Christopher Bass,\"Cry, the Beloved Country Final Essay\",cbass@cgs.k12.va.us";
NSArray* split = [input componentsSeparatedByString:@"\""];
NSString* part1 = [split objectAtIndex:0];
NSString* part2 = [split objectAtIndex:1];
NSString* part3 = [split objectAtIndex:2];
part1 = [part1 substringToIndex:[part1 length] - 1];
part3 = [part3 substringFromIndex:1];

// Pass the strings as arguments, not as the format string itself.
NSLog(@"%@", part1);
NSLog(@"%@", part2);
NSLog(@"%@", part3);


Source: https://stackoverflow.com/questions/9083616/how-to-use-regular-expression-in-iphone-app-to-separate-string-by-comma
