How would you scan an array of strings for a set of substrings in Objective-C?

情歌与酒 2021-01-27 03:21

So I basically have an array of words and phrases. Some of them contain curses. I want to create a method that automatically scans each of the units in the array for curses. If

5 answers
  • 2021-01-27 03:22

    If it is a rather small list, just iterate through it checking each word.

    If it is rather large, put the "bad words" in an NSOrderedSet and then use the method containsObject:.
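    A minimal sketch of that per-word membership check, assuming words are separated by single spaces. It uses NSSet rather than NSOrderedSet (both provide containsObject:, and the ordering is not needed here); the function name LineContainsBadWord is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns YES if any space-separated word of `line`
// is a member of `badWords`. Each containsObject: call is a hash lookup,
// so this stays cheap even when the bad-word list is large.
static BOOL LineContainsBadWord(NSString *line, NSSet<NSString *> *badWords) {
    for (NSString *word in [line componentsSeparatedByString:@" "]) {
        if ([badWords containsObject:word]) {
            return YES; // stop at the first bad word found
        }
    }
    return NO;
}
```

    Usage: LineContainsBadWord(@"hey how are you", [NSSet setWithArray:@[@"how"]]) returns YES. Because it compares whole words, "they" is not flagged for "hey".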

    If the number of words to be checked is not small, you could also put the words to be checked in one NSSet and the "bad words" in another NSSet, and use the method intersectsSet:.

    Example:

    NSArray *stringsToCheck  = @[@"hey how are you", @"what is going on?", @"whats up dude?", @"do you want to get chipotle?"];
    NSSet *badWords = [NSSet setWithArray:@[@"how", @"dude", @"yes"]];
    for (NSString *line in stringsToCheck) {
        NSSet *checkWords = [NSSet setWithArray:[line componentsSeparatedByString:@" "]];
        NSLog(@"checkWords: %@", checkWords);
    
        if ([checkWords intersectsSet:badWords]) {
            NSLog(@"checkWords contains a bad word in: '%@'", [[checkWords allObjects] componentsJoinedByString:@" "]);
            // Now search for the specific bad word if necessary.
        }
    }
    

    NSLog output:
    checkWords contains a bad word in: 'you how are hey'
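    For the "Now search for the specific bad word if necessary" step, one possible sketch (not the answerer's code) intersects a mutable copy of the line's words with the bad-word set. It also splits on every non-letter character instead of " ", because splitting on spaces leaves punctuation attached, which is why "whats up dude?" is not flagged in the output above. The name BadWordsInLine is hypothetical; note that apostrophes also split words ("what's" becomes "what" and "s").

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the bad words that occur in `line`.
// Words are split on anything that is not a letter, so "dude?" yields "dude".
static NSSet<NSString *> *BadWordsInLine(NSString *line, NSSet<NSString *> *badWords) {
    NSCharacterSet *separators = [[NSCharacterSet letterCharacterSet] invertedSet];
    NSMutableSet<NSString *> *found =
        [NSMutableSet setWithArray:[line componentsSeparatedByCharactersInSet:separators]];
    // Keep only the words that are also bad words (this also drops the
    // empty strings produced by adjacent separators).
    [found intersectSet:badWords];
    return found;
}
```

    With the answer's badWords set, BadWordsInLine(@"whats up dude?", badWords) returns a set containing just "dude".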

  • 2021-01-27 03:27

    As you state you are:

    appalled that I have not been able to find a method of NSString that will search for a bunch of words at the same time

    Though this seems a strange reaction (programming is about building solutions, after all), here is a solution that searches for all the words at the same time using a single method, one that belongs to NSRegularExpression rather than NSString.

    Our sample data:

    NSArray *sampleLines = @[@"Hey how are you",
                             @"What is going on?",
                             @"What’s up dude?",
                             @"Do you want to get chipotle?",
                             @"They are the youth"
                             ];
    NSArray *stopWords = @[@"you", @"hey"];
    

    The last sample line checks that we don't match partial words. Capitalisation is added to test case-insensitive matching.

    We construct a RE to match any of the stop words:

    • \b - word boundary, options set to use Unicode word boundaries in this example
    • (?: ... ) - a non-capturing group, used because it is slightly faster than a capturing one, and a capture would just duplicate the whole match anyway
    • | - or

    Pattern for the example stop words: \b(?:you|hey)\b

    // don't forget to use \\ in a string literal to insert a backslash into the pattern
    NSString *pattern = [NSString stringWithFormat:@"\\b(?:%@)\\b", [stopWords componentsJoinedByString:@"|"]];
    NSError *error = nil;
    NSRegularExpression *stopRE = [NSRegularExpression regularExpressionWithPattern:pattern
                                                                            options:(NSRegularExpressionCaseInsensitive | NSRegularExpressionUseUnicodeWordBoundaries)
                                                                              error:&error];
    // always check the returned object: error is only meaningful when the result is nil
    if (stopRE == nil)
    {
        NSLog(@"RE construction failed: %@", error);
        return;
    }
    

    Iterate through sample lines checking if they contain a stop word or not and display result on console:

    for (NSString *aLine in sampleLines)
    {
        // check for all words anywhere in line in one go
        NSRange match = [stopRE rangeOfFirstMatchInString:aLine
                                                  options:0
                                                    range:NSMakeRange(0, aLine.length)];
        BOOL containsStopWord = match.location != NSNotFound;
        NSLog(@"%@: %@", aLine, containsStopWord ? @"Bad" : @"OK");
    }
    

    Regular expression matching should be efficient, and because the example never copies individual words or matches into NSString objects, it avoids the many temporary objects created by methods that enumerate the individual words.
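    Two caveats with building the pattern by joining the raw words: a stop word containing regex metacharacters such as . or * would change the pattern's meaning, and the example only reports whether a line matched, not which word. A sketch addressing both, assuming the same options as above, escapes each word with +[NSRegularExpression escapedPatternForString:] and collects every match with matchesInString:options:range:. The helper names StopWordRegex and StopWordsInLine are hypothetical. Because of the \b anchors, each stop word should still begin and end with a word character.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: builds a whole-word, case-insensitive regex from
// arbitrary stop words, escaping any regex metacharacters they contain.
static NSRegularExpression *StopWordRegex(NSArray<NSString *> *stopWords, NSError **error) {
    NSMutableArray<NSString *> *escaped = [NSMutableArray arrayWithCapacity:stopWords.count];
    for (NSString *word in stopWords) {
        [escaped addObject:[NSRegularExpression escapedPatternForString:word]];
    }
    NSString *pattern = [NSString stringWithFormat:@"\\b(?:%@)\\b",
                         [escaped componentsJoinedByString:@"|"]];
    return [NSRegularExpression regularExpressionWithPattern:pattern
                                                     options:(NSRegularExpressionCaseInsensitive |
                                                              NSRegularExpressionUseUnicodeWordBoundaries)
                                                       error:error];
}

// Hypothetical helper: returns each stop-word occurrence in `line`,
// in the order it appears, with the line's original capitalisation.
static NSArray<NSString *> *StopWordsInLine(NSRegularExpression *re, NSString *line) {
    NSMutableArray<NSString *> *hits = [NSMutableArray array];
    NSArray<NSTextCheckingResult *> *matches =
        [re matchesInString:line options:0 range:NSMakeRange(0, line.length)];
    for (NSTextCheckingResult *match in matches) {
        [hits addObject:[line substringWithRange:match.range]];
    }
    return hits;
}
```

    For the sample data, StopWordsInLine on @"Hey how are you" gives @[@"Hey", @"you"], while @"They are the youth" gives an empty array.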

    HTH

  • 2021-01-27 03:35

    I'd use two nested for-loops: the outer loop scans over the phrase array and the inner loop over the word array. Something like:

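    // Example inputs (assumed): your phrases and the unwanted words.
    NSArray<NSString *> *phrases = @[@"hey how are you", @"what is going on?", @"whats up dude?"];
    NSArray<NSString *> *words = @[@"dude", @"hey"];
    NSMutableArray<NSString *> *filtered = [NSMutableArray array];

    // Loop over each phrase.
    for (NSString *phrase in phrases) {

        // Let's assume it's acceptable.
        BOOL good = YES;

        for (NSString *word in words) {

            // If we find a single unwanted word, we'll no longer take it.
            if ([phrase rangeOfString:word].location != NSNotFound) {
                good = NO;

                break; // We don't need to keep iterating.
                       // We already know it's not acceptable.
            }
        }

        if (good) [filtered addObject:phrase];
    }
    // filtered now holds only the acceptable phrases.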
    
  • 2021-01-27 03:41

    Honestly, I think your problem is more that you assume that, because parts of the problem can be glossed over in casual speech, it must be an easy problem. Breaking a sentence into words is hard. Examples:

    Words often contain other complete words within them. For example "they" contains "hey". You can't just search for substrings.

    American typographical conventions dictate that you don't put spaces around an emdash. So the correctly written sentence is "hey—how are you?". You can't just split on whitespace and/or just remove punctuation.
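    One way to avoid hand-rolled splitting, offered as a sketch rather than the answerer's approach, is to let Foundation segment the text with enumerateSubstringsInRange:options:usingBlock: and NSStringEnumerationByWords, which treats an em dash and other punctuation as a boundary even with no surrounding spaces. The helper name WordsInSentence is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: collects the words of `sentence` using Foundation's
// word segmentation instead of splitting on whitespace or punctuation.
static NSArray<NSString *> *WordsInSentence(NSString *sentence) {
    NSMutableArray<NSString *> *words = [NSMutableArray array];
    [sentence enumerateSubstringsInRange:NSMakeRange(0, sentence.length)
                                 options:NSStringEnumerationByWords
                              usingBlock:^(NSString *word, NSRange wordRange,
                                           NSRange enclosingRange, BOOL *stop) {
        [words addObject:word]; // punctuation and spaces are not reported as words
    }];
    return words;
}
```

    The resulting array can then be turned into an NSSet and checked with intersectsSet:, as in the other answers.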

    Diacritics are often optional. Even in American English, a minority of publishers (most notably The New Yorker) use a diaeresis; it looks like an umlaut but marks the second vowel when two vowels run together in a word, as in coöperate. In some languages, however, diacritics change the word: in German the umlaut is a pronunciation mark and, e.g., differentiates the singular Apfel from the plural Äpfel.
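    If you decide diacritics and case should be ignored, a sketch of one option is to fold every word with -[NSString stringByFoldingWithOptions:locale:] before comparing. The helper names and the fixed en_US_POSIX locale are assumptions for illustration. As the German example shows, this is a judgment call: folding also makes Äpfel and Apfel identical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: removes case and diacritic distinctions so that
// "Coöperate" and "cooperate" produce the same string.
// The fixed locale is an assumption so results don't vary per user.
static NSString *FoldedWord(NSString *word) {
    return [word stringByFoldingWithOptions:(NSCaseInsensitiveSearch | NSDiacriticInsensitiveSearch)
                                     locale:[NSLocale localeWithLocaleIdentifier:@"en_US_POSIX"]];
}

// Hypothetical helper: folds a list of forbidden words into a set,
// so candidate words can be folded and then checked with containsObject:.
static NSSet<NSString *> *FoldedWordSet(NSArray<NSString *> *words) {
    NSMutableSet<NSString *> *folded = [NSMutableSet setWithCapacity:words.count];
    for (NSString *word in words) {
        [folded addObject:FoldedWord(word)];
    }
    return folded;
}
```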

    So what exactly would you have Apple add as a simple API-level approach? What should everyone who picked a different option do? It's much smarter to just give you the tools to compose the approach that best suits you.

    That all being said, I think the neatest and most compact form of what I think you're describing is:

        NSArray *inputSentences =
            @[
                @"hey how are you",
                @"what is going on?",
                @"whats up dude?",
                @"do you want to get chipotle?"
            ];
        NSArray *forbiddenWords =
            @[@"you", @"hey"];
    
        NSSet *forbiddenWordsSet = [NSSet setWithArray:forbiddenWords];
        NSCharacterSet *nonLetterSet = 
                     [[NSCharacterSet letterCharacterSet] invertedSet];
    
        NSPredicate *predicate =
            [NSPredicate 
                predicateWithBlock:
                    ^BOOL(NSString *evaluatedObject, NSDictionary *bindings)
                    {
                        return ![forbiddenWordsSet intersectsSet:
                                 [NSSet setWithArray:
                                   [evaluatedObject 
                            componentsSeparatedByCharactersInSet:nonLetterSet]]];
                    }];
    
        NSLog(@"%@", [inputSentences filteredArrayUsingPredicate:predicate]);
    

    You might want nonLetterSet to be whitespaceCharacterSet instead, though. Judge for yourself.

    A predicate is used to filter the array without an explicit loop and manual accumulation. Set intersections are used to avoid a manual inner loop. The only slightly untidy bit is having to use a block predicate, because each string needs preparatory logic (splitting into words) before it can be tested.

    On the plus side, most of the code is setup. You can create the predicate once, store it somewhere, then apply it to any array or set of strings anywhere in your code with a single one-line call.
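    A sketch of that "create once, reuse anywhere" idea, wrapping the answer's predicate in a factory function (the name ForbiddenWordsPredicate is hypothetical) so the same object can filter both an NSArray and an NSSet:

```objc
#import <Foundation/Foundation.h>

// Hypothetical factory: builds the block predicate from the answer once,
// so it can be stored and reused. It evaluates to YES for strings that
// contain none of the forbidden words (case-sensitive, letters-only split).
static NSPredicate *ForbiddenWordsPredicate(NSArray<NSString *> *forbiddenWords) {
    NSSet<NSString *> *forbiddenWordsSet = [NSSet setWithArray:forbiddenWords];
    NSCharacterSet *nonLetterSet = [[NSCharacterSet letterCharacterSet] invertedSet];
    return [NSPredicate predicateWithBlock:^BOOL(NSString *evaluatedObject, NSDictionary *bindings) {
        NSArray<NSString *> *words =
            [evaluatedObject componentsSeparatedByCharactersInSet:nonLetterSet];
        return ![forbiddenWordsSet intersectsSet:[NSSet setWithArray:words]];
    }];
}
```

    Usage: keep the result in a property or static variable, then call filteredArrayUsingPredicate: on an array or filteredSetUsingPredicate: on a set.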

    As noted by other commenters, this will produce a lot of temporary objects.

  • 2021-01-27 03:49

    I would use a different approach.

    I would use the method indexesOfObjectsPassingTest: to scan the array, returning the indexes of the string objects that do not contain your swears. You could then take the resulting NSIndexSet and use it to create a new array with the objects listed (using the method objectsAtIndexes:).
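    A sketch of that approach, assuming the same plain substring test used in the loop version (so, as another answer points out, "hey" also matches inside "they"). The function name CleanPhrases is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: returns the phrases that contain none of the swears,
// using indexesOfObjectsPassingTest: and then objectsAtIndexes:.
static NSArray<NSString *> *CleanPhrases(NSArray<NSString *> *phrases, NSArray<NSString *> *swears) {
    NSIndexSet *cleanIndexes =
        [phrases indexesOfObjectsPassingTest:^BOOL(NSString *phrase, NSUInteger idx, BOOL *stop) {
            for (NSString *swear in swears) {
                if ([phrase rangeOfString:swear].location != NSNotFound) {
                    return NO; // contains a swear: leave this index out
                }
            }
            return YES; // no swear found: keep this index
        }];
    return [phrases objectsAtIndexes:cleanIndexes];
}
```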

    You could also use 2 nested loops, as @kevin9794 says, although his code needs some fixes:

    // phrases and swears are your input arrays of NSString.
    NSMutableArray<NSString *> *filtered = [NSMutableArray array];
    // Loop over each phrase.
    for (NSString *phrase in phrases) {
      BOOL hasSwears = NO;
    
      // Loop over each word
    
      for (NSString *swear in swears)
      {
    
        // Do the check. This line is executed at most once for each combination
        // of items in the arrays (the break stops early).
        if ([phrase rangeOfString: swear].location != NSNotFound)
        {
          hasSwears = YES;
          break;
        }
      }
      if (!hasSwears)
        [filtered addObject:phrase];
    }
    

    That code should really use the longer form, rangeOfString:options:, which lets you specify options such as a case-insensitive comparison.
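    A sketch of the inner check rewritten with rangeOfString:options: and NSCaseInsensitiveSearch; the function name ContainsSwear is hypothetical:

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: YES if `phrase` contains any swear, ignoring case.
static BOOL ContainsSwear(NSString *phrase, NSArray<NSString *> *swears) {
    for (NSString *swear in swears) {
        if ([phrase rangeOfString:swear options:NSCaseInsensitiveSearch].location != NSNotFound) {
            return YES;
        }
    }
    return NO;
}
```

    Inside the outer loop, the inner loop then reduces to: if (!ContainsSwear(phrase, swears)) [filtered addObject:phrase];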
