Efficiently checking that a string consists of one character in Python

太阳男子 2020-11-29 01:03

What is an efficient way to check that a string s in Python consists of just one character, say 'A'? Something like all_equal(s, 'A').

  • 2020-11-29 01:53
    >>> s = 'AAAAAAAAAAAAAAAAAAA'
    >>> s.count(s[0]) == len(s)
    True
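    The question asks about a specific character rather than "whatever the first character is". A minimal sketch of the same count idea for that case, borrowing the question's hypothetical all_equal(s, 'A') name; because it never indexes s[0], an empty string does not raise and is treated as all equal (an assumption you may want to change):

```python
def all_equal(s, ch):
    """Return True if every character of s equals ch (hypothetical helper).

    Assumes ch is a single character. str.count runs in C; for an empty
    string, count is 0 and len is 0, so the result is True.
    """
    return s.count(ch) == len(s)


print(all_equal("AAAAAAAAAAAAAAAAAAA", "A"))  # True
print(all_equal("AAAAfAAAA", "A"))            # False
print(all_equal("", "A"))                     # True
```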
    

    This doesn't short-circuit. A version that does short-circuit would be:

    >>> all(x == s[0] for x in s)
    True
    

    However, I suspect that because of the optimized C implementation, the non-short-circuiting version will perform better on some strings (depending on size, etc.).
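    To make the short-circuit difference concrete, here is a small instrumented sketch (the helper name count_comparisons is made up for illustration) that counts how many characters the all() version actually examines; s.count(s[0]), by contrast, always scans the whole string:

```python
def count_comparisons(s):
    """Run the short-circuiting all() check; return (result, characters examined)."""
    examined = 0

    def compare(x):
        nonlocal examined
        examined += 1
        return x == s[0]

    result = all(compare(x) for x in s)
    return result, examined


print(count_comparisons("F" + "A" * 1000))  # (False, 2): stops at the first 'A'
print(count_comparisons("A" * 1000))        # (True, 1000): must scan everything
```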


    Here's a simple timeit script to test some of the other options posted:

    import timeit
    import re
    
    def test_regex(s,regex=re.compile(r'^(.)\1*$')):
        return bool(regex.match(s))
    
    def test_all(s):
        return all(x == s[0] for x in s)
    
    def test_count(s):
        return s.count(s[0]) == len(s)
    
    def test_set(s):
        return len(set(s)) == 1
    
    def test_replace(s):
        return not s.replace(s[0],'')
    
    def test_translate(s):
        return not s.translate(None,s[0])
    
    def test_strmul(s):
        return s == s[0]*len(s)
    
    tests = ('test_all','test_count','test_set','test_replace','test_translate','test_strmul','test_regex')
    
    print "WITH ALL EQUAL"
    for test in tests:
        print test, timeit.timeit('%s(s)'%test,'from __main__ import %s; s="AAAAAAAAAAAAAAAAA"'%test)
        if globals()[test]("AAAAAAAAAAAAAAAAA") != True:
            print globals()[test]("AAAAAAAAAAAAAAAAA")
            raise AssertionError
    
    print
    print "WITH FIRST NON-EQUAL"
    for test in tests:
        print test, timeit.timeit('%s(s)'%test,'from __main__ import %s; s="FAAAAAAAAAAAAAAAA"'%test)
        if globals()[test]("FAAAAAAAAAAAAAAAA") != False:
            print globals()[test]("FAAAAAAAAAAAAAAAA")
            raise AssertionError
    

    On my machine (OS X 10.5.8, Core 2 Duo, Python 2.7.3), with these contrived (short) strings, str.count smokes set and all and beats str.replace by a little, but it is edged out by str.translate, and strmul is currently in the lead by a good margin:

    WITH ALL EQUAL
    test_all 5.83863711357
    test_count 0.947771072388
    test_set 2.01028490067
    test_replace 1.24682998657
    test_translate 0.941282987595
    test_strmul 0.629556179047
    test_regex 2.52913498878
    
    WITH FIRST NON-EQUAL
    test_all 2.41147494316
    test_count 0.942595005035
    test_set 2.00480484962
    test_replace 0.960338115692
    test_translate 0.924381017685
    test_strmul 0.622269153595
    test_regex 1.36632800102
    

    The timings could differ slightly (or even significantly) between systems and for different strings, so it is worth benchmarking with the actual strings you're planning on passing.

    Ultimately, if you hit the best case for all often enough (a mismatch near the start of the string) and your strings are long enough, you might want to consider that one: it's a better algorithm, since it stops at the first mismatch. I would avoid the set solution, though, as I don't see any case where it could beat the count solution.

    If memory could be an issue, you'll need to avoid str.translate, str.replace and strmul as those create a second string, but this isn't usually a concern these days.
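    One possible middle ground between these trade-offs (not one of the posted answers, just a sketch) is to compare the string chunk by chunk against a fixed-size block of its first character: each comparison still runs in C, the loop can stop at the first mismatching chunk, and the extra memory stays bounded by the chunk size. The name all_same_chunked and the default chunk size of 4096 are arbitrary assumptions:

```python
def all_same_chunked(s, chunk=4096):
    """Return True if every character of s is the same (hypothetical helper).

    Compares s chunk by chunk against a reference block built from s[0]:
    each comparison runs in C, the loop stops at the first differing chunk,
    and the extra memory is a couple of strings of at most `chunk` characters.
    An empty string is treated as "no differing characters" (True).
    """
    if not s:
        return True
    block = s[0] * min(chunk, len(s))
    for start in range(0, len(s), chunk):
        piece = s[start:start + chunk]
        # The last piece may be shorter than the reference block.
        if piece != block[:len(piece)]:
            return False
    return True


print(all_same_chunked("A" * 10_000))        # True
print(all_same_chunked("F" + "A" * 10_000))  # False, decided on the first chunk
```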

  • 2020-11-29 01:56
    not len("AAAAAAAAA".replace('A', ''))
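    Since an empty string is already falsy, the len() call isn't strictly needed. A minimal sketch wrapping this for an arbitrary target character (the helper name only_char is hypothetical); note that an empty string also yields True here:

```python
def only_char(s, ch):
    """True if s contains nothing but ch (hypothetical helper).

    Removing every ch must leave an empty string; "" itself also gives True.
    """
    return not s.replace(ch, '')


print(only_char("AAAAAAAAA", "A"))  # True
print(only_char("AAAAfAAAA", "A"))  # False
print(only_char("", "A"))           # True
```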
    
  • 2020-11-29 02:00

    Interesting answers so far. Here's another:

    flag = True
    for c in 'AAAAAAAfAAAA':
        if c != 'A':
            flag = False
            break
    

    The only advantage I can think of with mine is that it stops at the first inconsistent character instead of traversing the entire string.
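    Wrapped in a function, the flag variable disappears because return exits the loop early. A sketch of the same early-exit scan (the function name is hypothetical); it is the same logic that all(c == ch for c in s) expresses:

```python
def only_char_loop(s, ch):
    """Return True if every character of s equals ch (hypothetical helper)."""
    for c in s:
        if c != ch:
            return False  # first inconsistent character: stop scanning
    return True  # loop finished (or s was empty): no mismatch found


print(only_char_loop('AAAAAAAfAAAA', 'A'))  # False
print(only_char_loop('AAAAAAAAAAAA', 'A'))  # True
```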

  • 2020-11-29 02:01

    If you need to check that all the characters in the string are the same and equal to a given character, you can remove all duplicates by converting to a set and check whether the result equals the set of that single character:

    >>> set("AAAAA") == set("A")
    True
    

    If you only need to know whether all the characters are the same (whichever character it is), just check that the set has exactly one element:

    >>> len(set("AAAAA")) == 1
    True
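    The two set checks answer different questions, and both give False for an empty string, since set("") is empty. A quick sketch comparing them on a few inputs (set_checks is a hypothetical helper name):

```python
def set_checks(s):
    """Return (only 'A' appears, exactly one distinct character appears)."""
    distinct = set(s)
    return distinct == {"A"}, len(distinct) == 1


for s in ("AAAAA", "BBBBB", "AABAA", ""):
    print(repr(s), set_checks(s))
# 'AAAAA' (True, True)
# 'BBBBB' (False, True)
# 'AABAA' (False, False)
# '' (False, False)
```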
    
  • 2020-11-29 02:05

    This is by far the fastest, several times faster than even count(); just time it with mgilson's excellent timing suite:

    s == len(s) * s[0]
    

    Here all the checking is done inside CPython's C code, which just:

    • allocates len(s) characters;
    • fills the space with the first character;
    • compares two strings.

    The longer the string, the greater the time advantage. However, as mgilson writes, it creates a second string of the same length, so if your string is many millions of characters long, memory may become a problem.

    As the timing results show, the fastest ways to solve the task generally do not execute any Python code per character. The set() solution also does all of its work inside the C code of the Python library, yet it is still slow, probably because it goes through the generic Python object interface (iterating over the string and hashing each character as a separate string object).

    Update, concerning the empty string case: what to do with it strongly depends on the task.

    • If the task is "check that all the symbols in a string are the same", s == len(s) * s[0] is a valid answer: with no symbols, s[0] raises IndexError, and treating that as an error is acceptable.
    • If the task is "check that there is exactly one unique symbol", an empty string should give False, and the answer is s and s == len(s) * s[0], or bool(s) and s == len(s) * s[0] if you prefer to receive boolean values.
    • Finally, if we understand the task as "check that there are no different symbols", the result for an empty string is True, and the answer is not s or s == len(s) * s[0].
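    The three readings of the task can be written as three small functions (the names are hypothetical) that differ only in how they treat the empty string:

```python
def all_same_strict(s):
    """'All symbols are the same': an empty string raises IndexError via s[0]."""
    return s == len(s) * s[0]


def exactly_one_symbol(s):
    """'Exactly one unique symbol': an empty string gives False."""
    return bool(s) and s == len(s) * s[0]


def no_different_symbols(s):
    """'No different symbols': an empty string gives True."""
    return not s or s == len(s) * s[0]


for func in (exactly_one_symbol, no_different_symbols):
    print(func.__name__, func("AAAA"), func("AAfA"), func(""))

try:
    all_same_strict("")
except IndexError as exc:
    print("all_same_strict('') raised IndexError:", exc)
```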

  • 2020-11-29 02:06

    You could convert to a set and check there is only one member:

    len(set("AAAAAAAA")) == 1
    