Fastest way to check if a string contains specific characters in any of the items in a list

前端 未结 4 1076
隐瞒了意图╮
隐瞒了意图╮ 2021-01-02 04:33

What is the fastest way to check if a string contains some characters from any items of a list?

Currently, I\'m using this method:

lestring = \"Text1         


        
相关标签:
4条回答
  • 2021-01-02 05:01

    if the test is to see if there are any characters in common (not words or segments), create a set out of the letters in the list and then check the letters agains the string:

    char_list = set(''.join(list_of_words))
    test_set = set(string_to_teat)
    common_chars = char_list.intersection(test_set)
    

    However I'm assuming you're looking for as little as one character in common...

    0 讨论(0)
  • 2021-01-02 05:02

    The esmre library does the trick. In your case, the simpler, esm (part of esmre) is what you want.

    https://pypi.python.org/pypi/esmre/

    https://code.google.com/p/esmre/

    They have good documentation and examples: Taken from their examples:

    >>> import esm
    >>> index = esm.Index()
    >>> index.enter("he")
    >>> index.enter("she")
    >>> index.enter("his")
    >>> index.enter("hers")
    >>> index.fix()
    >>> index.query("this here is history")
    [((1, 4), 'his'), ((5, 7), 'he'), ((13, 16), 'his')]
    >>> index.query("Those are his sheep!")
    [((10, 13), 'his'), ((14, 17), 'she'), ((15, 17), 'he')]
    >>> 
    

    I ran some performance tests:

    import random, timeit, string, esm
    
    def uz(lelist, lestring):
        for x in lelist:
            if lestring.count(x):
                return 'Yep. "%s" contains characters from "%s" item.' % (lestring, x)
    
    
    
    def ab(lelist, lestring):
        return [e for e in lelist if e in lestring]
    
    
    def use_esm(index, lestring):
        return index.query(lestring)
    
    for TEXT_LEN in [5, 50, 1000]:
        for SEARCH_LEN in [5, 20]:
            for N in [5, 50, 1000, 10000]:
                if TEXT_LEN < SEARCH_LEN:
                    continue
    
                print 'TEXT_LEN:', TEXT_LEN, 'SEARCH_LEN:', SEARCH_LEN, 'N:', N
    
                lestring = ''.join((random.choice(string.ascii_uppercase + string.digits) for _ in range(TEXT_LEN)))
                lelist = [''.join((random.choice(string.ascii_uppercase + string.digits) for _ in range(SEARCH_LEN))) for _
                          in range(N)]
    
                index = esm.Index()
                for i in lelist:
                    index.enter(i)
                index.fix()
    
                t_ab = timeit.Timer("ab(lelist, lestring)", setup="from __main__ import lelist, lestring, ab")
                t_uz = timeit.Timer("uz(lelist, lestring)", setup="from __main__ import lelist, lestring, uz")
                t_esm = timeit.Timer("use_esm(index, lestring)", setup="from __main__ import index, lestring, use_esm")
    
                ab_time = t_ab.timeit(1000)
                uz_time = t_uz.timeit(1000)
                esm_time = t_esm.timeit(1000)
    
                min_time = min(ab_time, uz_time, esm_time)
                print '  ab%s: %f' % ('*' if ab_time == min_time else '', ab_time)
                print '  uz%s: %f' % ('*' if uz_time == min_time else '', uz_time)
                print '  esm%s %f:' % ('*' if esm_time == min_time else '', esm_time)
    

    And got that results depends mostly on the number of items that one is looking for (in my case, 'N'):

    TEXT_LEN: 1000 SEARCH_LEN: 20 N: 5
      ab*: 0.001733
      uz: 0.002512
      esm 0.126853:
    
    TEXT_LEN: 1000 SEARCH_LEN: 20 N: 50
      ab*: 0.017564
      uz: 0.023701
      esm 0.079925:
    
    TEXT_LEN: 1000 SEARCH_LEN: 20 N: 1000
      ab: 0.370371
      uz: 0.489523
      esm* 0.133783:
    
    TEXT_LEN: 1000 SEARCH_LEN: 20 N: 10000
      ab: 3.678790
      uz: 4.883575
      esm* 0.259605:
    
    0 讨论(0)
  • 2021-01-02 05:15

    You can try list comprehension with membership check

    >>> lestring = "Text123"
    >>> lelist = ["Text", "foo", "bar"]
    >>> [e for e in lelist if e in lestring]
    ['Text']
    

    Compared to your implementation, though LC has an implicit loop but its faster as there is no explicit function call as in your case with count

    Compared to Joe's implementation, yours is way faster, as the filter function would require to call two functions in a loop, lambda and count

    >>> def joe(lelist, lestring):
        return ''.join(random.sample(x + 'b'*len(x), len(x)))
    
    >>> def uz(lelist, lestring):
        for x in lelist:
            if lestring.count(x):
                return 'Yep. "%s" contains characters from "%s" item.' % (lestring, x)
    
    
    >>> def ab(lelist, lestring):
        return [e for e in lelist if e in lestring]
    
    >>> t_ab = timeit.Timer("ab(lelist, lestring)", setup="from __main__ import lelist, lestring, ab")
    >>> t_uz = timeit.Timer("uz(lelist, lestring)", setup="from __main__ import lelist, lestring, uz")
    >>> t_joe = timeit.Timer("joe(lelist, lestring)", setup="from __main__ import lelist, lestring, joe")
    >>> t_ab.timeit(100000)
    0.09391469893125759
    >>> t_uz.timeit(100000)
    0.1528471407273173
    >>> t_joe.timeit(100000)
    1.4272649857800843
    

    Jamie's commented solution is slower for shorter string's. Here is the test result

    >>> def jamie(lelist, lestring):
        return next(itertools.chain((e for e in lelist if e in lestring), (None,))) is not None
    
    >>> t_jamie = timeit.Timer("jamie(lelist, lestring)", setup="from __main__ import lelist, lestring, jamie")
    >>> t_jamie.timeit(100000)
    0.22237164127909637
    

    If you need Boolean values, for shorter strings, just modify the above LC expression

    [e in lestring for e in lelist if e in lestring]
    

    Or for longer strings, you can do the following

    >>> next(e in lestring for e in lelist if e in lestring)
    True
    

    or

    >>> any(e in lestring for e in lelist)
    
    0 讨论(0)
  • 2021-01-02 05:16
    filter(lambda x: lestring.count(x), lelist)
    

    That will return all the strings that you're trying to find as a list.

    0 讨论(0)
提交回复
热议问题