Find substring in string but only if whole words?

囚心锁ツ 2020-11-27 07:32

What is an elegant way to look for a string within another string in Python, but only if the substring is within whole words, not part of a word?

Perhaps an example

7 answers
  • 2020-11-27 08:11

    Excuse me REGEX fellows, but the simpler answer is:

    text = "this is the esquisidiest piece never ever writen"
    word = "is"
    " {0} ".format(text).lower().count(" {0} ".format(word).lower())
    

    The trick is to pad both the text and the word with one space on each side. That way only whole words are counted, and a word at the very beginning or end of the text still matches.
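
    As a rough sketch, the same idea can be wrapped in a function to see where it works and where it does not. The name `count_whole_word` is made up for this illustration; the examples assume that a single space is the only separator you care about.

```python
def count_whole_word(word, text):
    """Count space-delimited occurrences of word in text, ignoring case."""
    padded_text = " {0} ".format(text).lower()
    padded_word = " {0} ".format(word).lower()
    # str.count() counts non-overlapping occurrences only.
    return padded_text.count(padded_word)


print(count_whole_word("is", "this is the thing"))  # 1
print(count_whole_word("is", "what it is."))        # 0: "is." ends with a period, not a space
print(count_whole_word("is", "is is is"))           # 2: neighbouring matches share a space
```

    The last two calls show the limits: punctuation next to the word hides it, and directly repeated words are undercounted because two adjacent matches would have to share the space between them.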

  • 2020-11-27 08:14

    I'm building off this answer.

    The problem with the above code is that it returns False when the needle occurs more than once in the haystack and a later occurrence, but not the first one, is a whole-word match.

    Here's my version:

    import string

    def find_substring(needle, haystack):
      search_start = 0
      while search_start < len(haystack):
        index = haystack.find(needle, search_start)
        if index == -1:
          return False
        # The match must start at the beginning of haystack or right after whitespace...
        is_prefix_whitespace = (index == 0 or haystack[index-1] in string.whitespace)
        search_start = index + len(needle)
        # ...and end at the end of haystack or right before whitespace.
        is_suffix_whitespace = (search_start == len(haystack) or haystack[search_start] in string.whitespace)
        if is_prefix_whitespace and is_suffix_whitespace:
          return True
        # Otherwise keep looking after this occurrence.
      return False
    

    Hope that helps!

  • 2020-11-27 08:17

    You can use regular expressions and the word boundary special character \b. The documentation describes it as follows:

    Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that \b is defined as the boundary between \w and \W, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. Inside a character range, \b represents the backspace character, for compatibility with Python’s string literals.

    import re

    def string_found(string1, string2):
       # re.escape() keeps any regex metacharacters in string1 literal
       if re.search(r"\b" + re.escape(string1) + r"\b", string2):
          return True
       return False
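
    To make the boundary rules quoted above concrete, here is a small sketch that runs the same check on a few inputs. The sample strings are made up for illustration.

```python
import re


def string_found(string1, string2):
    # Same check as above, returning a bool directly.
    pattern = r"\b" + re.escape(string1) + r"\b"
    return re.search(pattern, string2) is not None


print(string_found("pony", "a pizza pony, please"))  # True: "," ends the word
print(string_found("pony", "my ponytail"))           # False: part of a longer word
print(string_found("pony", "my_pony"))               # False: "_" counts as a word character
print(string_found("C++", "I like C++ a lot"))       # False: no boundary between "+" and " "
```

    The last case is worth keeping in mind: \b only exists between a word character and a non-word character, so a search string that starts or ends with punctuation may never match.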
    


    If whitespace is the only word boundary you care about, you can also get away with padding both strings with a space on each side:

    def string_found(string1, string2):
       string1 = " " + string1.strip() + " "
       string2 = " " + string2.strip() + " "
       # str.find() returns -1 (which is truthy) when there is no match, so compare explicitly
       return string2.find(string1) != -1
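
    The same whitespace-only rule can also be written as a regular expression with lookarounds instead of padding. This is only a sketch under that assumption, and the name `string_found_ws` is made up for illustration.

```python
import re


def string_found_ws(string1, string2):
    # (?<!\S) : no non-whitespace character directly before the match
    # (?!\S)  : no non-whitespace character directly after the match
    pattern = r"(?<!\S)" + re.escape(string1) + r"(?!\S)"
    return re.search(pattern, string2) is not None


print(string_found_ws("pony", "a pizza pony is here"))  # True
print(string_found_ws("C++", "I like C++ a lot"))       # True
print(string_found_ws("pony", "a pizza pony, please"))  # False: "," is not whitespace
print(string_found_ws("pizza", "pizzas and\tpizza"))    # True: second occurrence, after a tab
```

    Unlike the padded version, this treats tabs and newlines as separators too, and it does not depend on the search string starting or ending with a word character.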
    
  • 2020-11-27 08:20

    The simplest and most pythonic way, I believe, is to break the strings down into individual words and scan for a match:

    
        string = "My Name Is Josh"
        substring = "Name"
    
        for word in string.split():
            if substring == word:
                print("Match Found")
    
    

    As a bonus, here's a one-liner:

    any(substring == word for word in string.split())
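
    A short sketch of what the split-based check does and does not catch. The helper name `contains_word` is made up for illustration; it relies on `in` over the list that `str.split()` returns, which amounts to the same comparison as the loop above.

```python
def contains_word(word, text):
    # str.split() with no arguments splits on any run of whitespace.
    return word in text.split()


print(contains_word("Name", "My Name Is Josh"))     # True
print(contains_word("Nam", "My Name Is Josh"))      # False
print(contains_word("Josh", "My Name Is Josh."))    # False: the token is "Josh."
print(contains_word("My Name", "My Name Is Josh"))  # False: a phrase spans several tokens
```

    So this works well for single words separated by whitespace, but punctuation and multi-word phrases need one of the other approaches.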
    
  • 2020-11-27 08:27
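    def string_found(string1, string2):
        # string1 is the word to look for, string2 is the text to search.
        if string1 not in string2:
            return False
        start = string2.index(string1)  # only the first occurrence is checked
        end = start + len(string1)
        # A whole word starts at the beginning of the text or right after a space...
        starts_ok = start == 0 or string2[start - 1] == " "
        # ...and ends at the end of the text or right before a space.
        ends_ok = end == len(string2) or string2[end] == " "
        return starts_ok and ends_ok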
    
  • 2020-11-27 08:34

    One approach using the re, or regex, module that should accomplish this task is:

    import re
    
    string1 = "pizza pony"
    string2 = "who knows what a pizza pony is?"
    
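    # re.escape() keeps any regex metacharacters in string1 literal; a closing \b
    # (rather than \W) also matches when the phrase ends the string.
    search_result = re.search(r'\b' + re.escape(string1) + r'\b', string2)
    
    if search_result:  # re.search() returns None when there is no match
        print(search_result.group())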
    