So I basically have an array of words and phrases. Some of them contain curses. I want to create a method that automatically scans each of the items in the array for curses.
If it is a rather small list, just iterate through it, checking each word.
If it is rather large, put the "bad words" in an NSOrderedSet and then use the method containsObject:.
If the number of words to be checked is not small, you could also put the words to be checked in an NSSet and the "bad words" in another NSSet and use the method intersectsSet:.
Example:
NSArray *stringsToCheck = @[@"hey how are you", @"what is going on?", @"whats up dude?", @"do you want to get chipotle?"];
NSSet *badWords = [NSSet setWithArray:@[@"how", @"dude", @"yes"]];
for (NSString *line in stringsToCheck) {
    NSSet *checkWords = [NSSet setWithArray:[line componentsSeparatedByString:@" "]];
    NSLog(@"checkWords: %@", checkWords);
    if ([checkWords intersectsSet:badWords]) {
        NSLog(@"checkWords contains a bad word in: '%@'", [[checkWords allObjects] componentsJoinedByString:@" "]);
        // Now search for the specific bad word if necessary.
    }
}
NSLog output (omitting the checkWords: lines; the order of words within a set is not guaranteed):
checkWords contains a bad word in: 'you how are hey'
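If you do need to know which bad words were hit, one option is to intersect a mutable copy of the words with the bad-word set. This is only a sketch: BadWordsInLine is a made-up helper name, and it splits on single spaces exactly as the example above does, so punctuation stuck to a word (like "dude?") still prevents a match.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper (not a Foundation API): returns the members of
// badWords that occur in line, splitting on single spaces only.
static NSSet<NSString *> *BadWordsInLine(NSString *line, NSSet<NSString *> *badWords) {
    NSMutableSet<NSString *> *found =
        [NSMutableSet setWithArray:[line componentsSeparatedByString:@" "]];
    // intersectSet: removes every member that is not also in badWords.
    [found intersectSet:badWords];
    return [found copy];
}
```

Calling BadWordsInLine(line, badWords) inside the if block gives you the set of offending words for that line.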
As you state, you are "appalled that I have not been able to find a method of NSString that will search for a bunch of words at the same time". This seems a strange reaction, since programming is about building solutions after all. Here is a solution which searches for all the words at the same time using a single method, though one belonging to NSRegularExpression rather than NSString.
Our sample data:
NSArray *sampleLines = @[@"Hey how are you",
                         @"What is going on?",
                         @"What’s up dude?",
                         @"Do you want to get chipotle?",
                         @"They are the youth"
                        ];
NSArray *stopWords = @[@"you", @"hey"];
The last sample line is there to check that we don't match partial words. Capitalisation has been added to test case-insensitive matching.
We construct a RE to match any of the stop words:
\b - word boundary; the options are set to use Unicode word boundaries in this example
(?: ... ) - a non-capturing group, used only because it is slightly faster than a capturing one, and it would be the same as the whole match anyway
| - or
Pattern for the example stop words: \b(?:you|hey)\b
// don't forget to use \\ in a string literal to insert a backslash into the pattern
NSString *pattern = [NSString stringWithFormat:@"\\b(?:%@)\\b", [stopWords componentsJoinedByString:@"|"]];
NSError *error = nil;
NSRegularExpression *stopRE = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                        options:(NSRegularExpressionCaseInsensitive | NSRegularExpressionUseUnicodeWordBoundaries)
                                                                          error:&error];
// always check for failure: test the returned object, then inspect the error
if (stopRE == nil)
{
    NSLog(@"RE construction failed: %@", error);
    return;
}
Iterate through sample lines checking if they contain a stop word or not and display result on console:
for (NSString *aLine in sampleLines)
{
    // check for all words anywhere in line in one go
    NSRange match = [stopRE rangeOfFirstMatchInString:aLine
                                              options:0
                                                range:NSMakeRange(0, aLine.length)];
    BOOL containsStopWord = match.location != NSNotFound;
    NSLog(@"%@: %@", aLine, containsStopWord ? @"Bad" : @"OK");
}
Regular expression matching should be efficient. As the example never copies individual words or matches into NSString objects, it should not create a lot of temporary objects, unlike methods which enumerate the individual words.
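One thing the pattern construction above assumes is that no stop word contains a regex metacharacter such as "." or "*". If your list might, a possible variation is to escape each word with +[NSRegularExpression escapedPatternForString:] before joining. StopWordPattern is a hypothetical helper name used only for this sketch.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds the same \b(?:...)\b pattern as above, but
// escapes each stop word so metacharacters in it are matched literally.
static NSString *StopWordPattern(NSArray<NSString *> *stopWords) {
    NSMutableArray<NSString *> *escaped =
        [NSMutableArray arrayWithCapacity:stopWords.count];
    for (NSString *word in stopWords) {
        [escaped addObject:[NSRegularExpression escapedPatternForString:word]];
    }
    return [NSString stringWithFormat:@"\\b(?:%@)\\b",
            [escaped componentsJoinedByString:@"|"]];
}
```

The result can be passed to regularExpressionWithPattern:options:error: in place of pattern. Note that \b only behaves as expected when the stop word itself starts and ends with a word character.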
HTH
I'd do two nested for-loops. The first loop to scan over the phrase array and the second over the word array. In semi-pseudocode, something like:
NSMutableArray *filtered = [NSMutableArray array];
// Loop over each phrase.
for (NSString *phrase in phrases) {
    // Let's assume it's acceptable
    BOOL good = YES;
    for (NSString *word in words) {
        // If we find a single unwanted word, we'll no longer take it
        if ([phrase rangeOfString:word].location != NSNotFound) {
            good = NO;
            // We already know it's not acceptable,
            // so we don't need to keep iterating.
            break;
        }
    }
    if (good) [filtered addObject:phrase];
}
Honestly, I think your problem is more that you assume that, because parts of the problem can be glossed over in casual speech, it must be an easy problem. Breaking a sentence into words is hard. Examples:
Words often contain other complete words within them. For example "they" contains "hey". You can't just search for substrings.
American typographical conventions dictate that you don't put spaces around an emdash. So the correctly written sentence is "hey—how are you?". You can't just split on whitespace and/or just remove punctuation.
Diacritics are often optional. Even in American English, a minority of publishers (most notably those of the New Yorker) use a diaeresis; it looks like an umlaut but marks the second vowel if two run together in a word, as in coöperate. However, in some languages they change the word: in German the umlaut is a pronunciation mark and, for example, differentiates Apfel, the singular, from Äpfel, the plural.
So what exactly would you have Apple add as a simple API-level approach? What should everyone who picked a different option do? It's much smarter to just give you the tools to compose the approach that best suits you.
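As an illustration of composing those tools, here is a rough sketch that lets Foundation do the word segmentation via enumerateSubstringsInRange:options:usingBlock: with NSStringEnumerationByWords, then folds case and diacritics. NormalizedWords is a name invented for this example, and folding diacritics is an assumption that suits English; per the German example above, it may be the wrong choice for other languages.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collects the words of text, lower-cased and with
// diacritics removed, using Foundation's own word segmentation.
static NSSet<NSString *> *NormalizedWords(NSString *text) {
    NSMutableSet<NSString *> *words = [NSMutableSet set];
    [text enumerateSubstringsInRange:NSMakeRange(0, text.length)
                             options:NSStringEnumerationByWords
                          usingBlock:^(NSString *substring, NSRange substringRange,
                                       NSRange enclosingRange, BOOL *stop) {
        // Assumption: case and diacritics are not significant for matching.
        NSString *folded =
            [substring stringByFoldingWithOptions:(NSCaseInsensitiveSearch |
                                                   NSDiacriticInsensitiveSearch)
                                           locale:nil];
        [words addObject:folded];
    }];
    return [words copy];
}
```

The resulting set can be tested against a set of forbidden words with intersectsSet:. With this approach "they" stays a single word, so it should not be confused with "hey", and a sentence written with an unspaced dash between two words should still be split at the dash.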
That all being said, I think the neatest and most compact form of what I think you're describing is:
NSArray *inputSentences =
    @[
        @"hey how are you",
        @"what is going on?",
        @"whats up dude?",
        @"do you want to get chipotle?"
    ];
NSArray *forbiddenWords =
    @[@"you", @"hey"];
NSSet *forbiddenWordsSet = [NSSet setWithArray:forbiddenWords];
NSCharacterSet *nonLetterSet =
    [[NSCharacterSet letterCharacterSet] invertedSet];
NSPredicate *predicate =
    [NSPredicate
        predicateWithBlock:
            ^BOOL(NSString *evaluatedObject, NSDictionary *bindings)
            {
                return ![forbiddenWordsSet intersectsSet:
                    [NSSet setWithArray:
                        [evaluatedObject
                            componentsSeparatedByCharactersInSet:nonLetterSet]]];
            }];
NSLog(@"%@", [inputSentences filteredArrayUsingPredicate:predicate]);
Though you might want nonLetterSet to be whitespaceCharacterSet instead. Judge for yourself.
A predicate is used to automatically filter a set without an explicit loop and manual accumulation. Set intersections are used to avoid a manual internal loop. The only slightly untidy bit is having to use a block predicate as you have to apply preparatory logic.
On the plus side, most of the code is setup. You can create the predicate once, store it somewhere, then apply it to any array or set of strings anywhere in your code with a single one-line call.
As noted by other commenters, this will produce a lot of temporary objects.
I would use a different approach.
I would use the method indexesOfObjectsPassingTest: to scan the array, returning the indexes of the string objects that do not contain your swears. You could then take the resulting NSIndexSet and use it to create a new array with the objects listed (using the method objectsAtIndexes:).
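A minimal sketch of that idea follows. PhrasesWithoutSwears is a hypothetical helper name, and the test inside the block is a plain, case-sensitive substring check, so it has the same partial-word limitation as any rangeOfString: approach.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the phrases that contain none of swears,
// using a case-sensitive substring test.
static NSArray<NSString *> *PhrasesWithoutSwears(NSArray<NSString *> *phrases,
                                                 NSArray<NSString *> *swears) {
    NSIndexSet *cleanIndexes =
        [phrases indexesOfObjectsPassingTest:^BOOL(NSString *phrase,
                                                   NSUInteger idx, BOOL *stop) {
            for (NSString *swear in swears) {
                if ([phrase rangeOfString:swear].location != NSNotFound) {
                    return NO; // contains a swear, so leave its index out
                }
            }
            return YES; // no swear found, keep this index
        }];
    // Build the filtered array from the indexes that passed the test.
    return [phrases objectsAtIndexes:cleanIndexes];
}
```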
You could also use 2 nested loops, as @kevin9794 says, although his code needs some fixes:
NSMutableArray *filtered = [NSMutableArray array];
// Loop over each phrase.
for (NSString *phrase in phrases) {
    BOOL hasSwears = NO;
    // Loop over each word
    for (NSString *swear in swears)
    {
        // Do the check. This line will be executed once for each
        // combination of items in the two arrays.
        if ([phrase rangeOfString:swear].location != NSNotFound)
        {
            hasSwears = YES;
            break;
        }
    }
    if (!hasSwears)
        [filtered addObject:phrase];
}
That code should really use the longer form, rangeOfString:options:, which lets you specify options such as NSCaseInsensitiveSearch to do a case-insensitive comparison.
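For example, the check could be wrapped up roughly like this. PhraseContainsSwear is a hypothetical helper name, and it is still a substring test, so "They" would match "hey".

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if phrase contains swear as a substring,
// ignoring case.
static BOOL PhraseContainsSwear(NSString *phrase, NSString *swear) {
    NSRange found = [phrase rangeOfString:swear
                                  options:NSCaseInsensitiveSearch];
    return found.location != NSNotFound;
}
```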