Python String Search Regardless Of Character Sequence

后端 未结 2 1266
粉色の甜心
粉色の甜心 2021-01-24 06:52

I want to create an application that checks if word that the user typed in contains word/words from a separate text file (e.g. input = \'teeth\', separate file contains the word

相关标签:
2条回答
  • 2021-01-24 07:46

    I know it's unlikely, but if performance really mattered for very large inputs, you could avoid needing to create the second Counter and just iterate directly over the characters of the substring, allowing the possibility of terminating early if you run out of a given character.

    In [26]: def contains2(string, substring):
        ...:     c = Counter(string)
        ...:     for char in substring:
        ...:         if c[char] > 0:
        ...:             c[char] -= 1
        ...:         else:
        ...:             return False
        ...:     return True
        ...: 
    
    In [27]: contains2("teeth", "eeh")
    Out[27]: True
    
    In [28]: contains2("teeth", "ehe")
    Out[28]: True
    
    In [29]: contains2("teth", "ehe")
    Out[29]: False
    
    In [30]: contains2("teth", "eeh")
    Out[30]: False
    
    In [31]: def contains(string, substring):
        ...:     c1 = collections.Counter(string)
        ...:     c2 = collections.Counter(substring)
        ...:     return not(c2-c1)
        ...: 
    
    In [32]: contains("teth", "ehe")
    Out[32]: False
    
    In [33]: contains("teeth", "ehe")
    Out[33]: True
    
    In [34]: contains("teeth", "eeh")
    Out[34]: True
    
    In [35]: %timeit contains("teeth", "eeh")
    19.6 µs ± 94.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    
    In [36]: %timeit contains2("teeth", "eeh")
    9.59 µs ± 29.3 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    
    In [37]: %timeit contains("friday is a good day", "ss a")
    22.9 µs ± 121 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
    
    In [38]: %timeit contains2("friday is a good day", "ss a")
    9.52 µs ± 10.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
    
    0 讨论(0)
  • 2021-01-24 07:56

    I would create a collections.Counter object from both strings, counting the characters, then substract the dicts, test if resulting dict is empty (which means string contains substring with cardinality respected)

    import collections

    def contains(substring, string):
        c1 = collections.Counter(string)
        c2 = collections.Counter(substring)
        return not(c2-c1)
    
    print(contains("eeh","teeth"))
    print(contains("eeh","teth"))
    

    result:

    True
    False
    

    Note that your example isn't representative as

    >>> "eet" in "teeth"
    True
    

    that's why I changed it.

    0 讨论(0)
提交回复
热议问题