How can I check if a string represents an int, without using try/except?

悲哀的现实 2020-11-22 00:36

Is there any way to tell whether a string represents an integer (e.g., '3', '-17' but not '3.14' or 'asf

19 answers
  •  隐瞒了意图╮
    2020-11-22 00:54

    I've found (and I've tested this repeatedly) that try/except does not perform all that well. I often try several ways of doing the same thing, and I don't think a try/except-based method has ever come out fastest; in many cases, though not all, it has come out close to the worst. I know a lot of people call it the "Pythonic" way, but that's where I part ways with them. To me it's neither very performant nor very elegant, so I tend to use it only for error trapping and reporting.
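One way to check this kind of claim on your own machine is to time the success path and the failure path of a try/except check separately, since raising and catching an exception is where the extra cost would show up. The sketch below uses the standard `timeit` module; the names `is_int_try` and `time_call` are made up for illustration, and the numbers will vary by Python version and hardware.

```python
import timeit


def is_int_try(s):
    """Return True if int() accepts s, False if it raises."""
    try:
        int(s)
    except (TypeError, ValueError):
        return False
    return True


def time_call(value, number=100000):
    """Seconds taken by `number` calls of is_int_try(value)."""
    return timeit.timeit(lambda: is_int_try(value), number=number)


if __name__ == "__main__":
    # Success path: no exception is raised.
    print("success path ('123'): %.4f s" % time_call("123"))
    # Failure path: int() raises ValueError, which is caught.
    print("failure path ('abc'): %.4f s" % time_call("abc"))
```

If most inputs are valid integers, the failure path rarely runs, so the overall cost depends heavily on the mix of inputs being tested.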

    I was going to gripe that PHP, Perl, Ruby, C, and even the freaking shell have simple functions for testing a string for integer-hood, but due diligence in verifying those assumptions tripped me up! Apparently this lack is a common sickness.

    Here's a quick and dirty edit of Bruno's post:

    import sys, time, re
    
    g_intRegex = re.compile(r"^([+-]?[1-9]\d*|0)$")
    
    testvals = [
        # integers
        0, 1, -1, 1.0, -1.0,
        '0', '0.','0.0', '1', '-1', '+1', '1.0', '-1.0', '+1.0', '06',
        # non-integers
        'abc 123',
        1.1, -1.1, '1.1', '-1.1', '+1.1',
        '1.1.1', '1.1.0', '1.0.1', '1.0.0',
        '1.0.', '1..0', '1..',
        '0.0.', '0..0', '0..',
        'one', object(), (1,2,3), [1,2,3], {'one':'two'},
        # with spaces
        ' 0 ', ' 0.', ' .0','.01 '
    ]
    
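    def isInt_try(v):
        # int() raises TypeError for objects/containers and ValueError for bad strings
        try:
            int(v)
        except (TypeError, ValueError, OverflowError):
            return False
        return True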
    
    def isInt_str(v):
        v = str(v).strip()
        return v=='0' or (v if v.find('..') > -1 else v.lstrip('-+').rstrip('0').rstrip('.')).isdigit()
    
    def isInt_re(v):
        import re
        if not hasattr(isInt_re, 'intRegex'):
            isInt_re.intRegex = re.compile(r"^([+-]?[1-9]\d*|0)$")
        return isInt_re.intRegex.match(str(v).strip()) is not None
    
    def isInt_re2(v):
        return g_intRegex.match(str(v).strip()) is not None
    
    def check_int(s):
        s = str(s)
        if s[0] in ('-', '+'):
            return s[1:].isdigit()
        return s.isdigit()    
    
    
    def timeFunc(func, times):
        t1 = time.time()
        for n in range(times):
            for v in testvals: 
                r = func(v)
        t2 = time.time()
        return t2 - t1
    
    def testFuncs(funcs):
        for func in funcs:
            sys.stdout.write( "\t%s\t|" % func.__name__)
        print()
        for v in testvals:
            if type(v) == type(''):
                sys.stdout.write("'%s'" % v)
            else:
                sys.stdout.write("%s" % str(v))
            for func in funcs:
                sys.stdout.write( "\t\t%s\t|" % func(v))
            sys.stdout.write("\r\n") 
    
    if __name__ == '__main__':
        print()
        print("tests..")
        testFuncs((isInt_try, isInt_str, isInt_re, isInt_re2, check_int))
        print()
    
        print("timings..")
        print("isInt_try:   %6.4f" % timeFunc(isInt_try, 10000))
        print("isInt_str:   %6.4f" % timeFunc(isInt_str, 10000)) 
        print("isInt_re:    %6.4f" % timeFunc(isInt_re, 10000))
        print("isInt_re2:   %6.4f" % timeFunc(isInt_re2, 10000))
        print("check_int:   %6.4f" % timeFunc(check_int, 10000))
    

    Here are the performance comparison results:

    timings..
    isInt_try:   0.6426
    isInt_str:   0.7382
    isInt_re:    1.1156
    isInt_re2:   0.5344
    check_int:   0.3452
    

    A C function could scan the string once and be done, which I think would be the Right Thing to do.
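To make the single-pass idea concrete, here is a rough sketch of what such a scan would do, written in Python only for illustration (a real C version would be much faster). The function name `is_int_scan` and its policy are assumptions: it accepts an optional sign followed by one or more ASCII digits, does not strip whitespace, and does not reject leading zeros.

```python
def is_int_scan(s):
    """Single left-to-right pass: optional sign, then ASCII digits only."""
    if not isinstance(s, str) or not s:
        return False
    # Skip one leading sign character, if present.
    start = 1 if s[0] in "+-" else 0
    if start == len(s):
        return False  # a bare '+' or '-' is not an integer
    # Stop at the first character that is not an ASCII digit.
    for i in range(start, len(s)):
        if not ("0" <= s[i] <= "9"):
            return False
    return True
```

Like check_int, this accepts '06'; rejecting leading zeros would need one more comparison on the first digit.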

    EDIT:

    I've updated the code above to work in Python 3.5, and to include the check_int function from the currently most voted up answer, and to use the current most popular regex that I can find for testing for integer-hood. This regex rejects strings like 'abc 123'. I've added 'abc 123' as a test value.

    It is Very Interesting to me to note, at this point, that NONE of the functions tested, including the try method, the popular check_int function, and the most popular regex for testing for integer-hood, return the correct answers for all of the test values (well, depending on what you think the correct answers are; see the test results below).

    The built-in int() function silently truncates the fractional part of a floating-point number and returns the integer part, which is why isInt_try returns True for 1.1. If the number is first converted to a string, int() raises ValueError instead, which is why '1.1' gives False.
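A short demonstration of that difference, using only the built-in int(). The helper name `int_or_error` is made up for this example.

```python
def int_or_error(value):
    """Return int(value), or the name of the exception int() raised."""
    try:
        return int(value)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


print(int_or_error(1.9))     # 1: a float argument is truncated toward zero
print(int_or_error(-1.9))    # -1
print(int_or_error("1.9"))   # ValueError: the string form is rejected
print(int_or_error(" 42 "))  # 42: surrounding whitespace is accepted
```

So a try/except check built on int() answers "can this be converted?", which is not quite the same question as "does this represent an integer?".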

    The check_int() function returns False for values like 0.0 and 1.0 (which technically are integers) and returns True for values like '06'.
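Two more edge cases of check_int are worth knowing about, neither of which is in the test values: an empty string raises IndexError because of `s[0]`, and `str.isdigit()` accepts some characters, such as the superscript two, that int() rejects. The sketch below reproduces check_int from the benchmark and adds a hypothetical variant, `check_int_guarded`, that uses slicing and `str.isdecimal()` instead; the variant's name and policy are assumptions, not part of the original answer.

```python
def check_int(s):
    # Same logic as check_int in the benchmark.
    s = str(s)
    if s[0] in ('-', '+'):
        return s[1:].isdigit()
    return s.isdigit()


def check_int_guarded(s):
    """Hypothetical variant: empty input gives False, decimal digits only."""
    s = str(s)
    if s[:1] in ('-', '+'):   # slicing is safe on an empty string
        s = s[1:]
    return s.isdecimal()      # stricter than isdigit(); False for ''


print(check_int('06'))        # True: leading zeros pass
print(check_int('\u00b2'))    # True: superscript two counts as a digit
try:
    check_int('')
except IndexError:
    print("check_int('') raises IndexError")
print(check_int_guarded(''))        # False
print(check_int_guarded('\u00b2'))  # False
```

The guarded variant still accepts '06', so whether leading zeros count as valid remains a policy decision.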

    Here are the current (Python 3.5) test results:

                  isInt_try |       isInt_str       |       isInt_re        |       isInt_re2       |   check_int   |
    0               True    |               True    |               True    |               True    |       True    |
    1               True    |               True    |               True    |               True    |       True    |
    -1              True    |               True    |               True    |               True    |       True    |
    1.0             True    |               True    |               False   |               False   |       False   |
    -1.0            True    |               True    |               False   |               False   |       False   |
    '0'             True    |               True    |               True    |               True    |       True    |
    '0.'            False   |               True    |               False   |               False   |       False   |
    '0.0'           False   |               True    |               False   |               False   |       False   |
    '1'             True    |               True    |               True    |               True    |       True    |
    '-1'            True    |               True    |               True    |               True    |       True    |
    '+1'            True    |               True    |               True    |               True    |       True    |
    '1.0'           False   |               True    |               False   |               False   |       False   |
    '-1.0'          False   |               True    |               False   |               False   |       False   |
    '+1.0'          False   |               True    |               False   |               False   |       False   |
    '06'            True    |               True    |               False   |               False   |       True    |
    'abc 123'       False   |               False   |               False   |               False   |       False   |
    1.1             True    |               False   |               False   |               False   |       False   |
    -1.1            True    |               False   |               False   |               False   |       False   |
    '1.1'           False   |               False   |               False   |               False   |       False   |
    '-1.1'          False   |               False   |               False   |               False   |       False   |
    '+1.1'          False   |               False   |               False   |               False   |       False   |
    '1.1.1'         False   |               False   |               False   |               False   |       False   |
    '1.1.0'         False   |               False   |               False   |               False   |       False   |
    '1.0.1'         False   |               False   |               False   |               False   |       False   |
    '1.0.0'         False   |               False   |               False   |               False   |       False   |
    '1.0.'          False   |               False   |               False   |               False   |       False   |
    '1..0'          False   |               False   |               False   |               False   |       False   |
    '1..'           False   |               False   |               False   |               False   |       False   |
    '0.0.'          False   |               False   |               False   |               False   |       False   |
    '0..0'          False   |               False   |               False   |               False   |       False   |
    '0..'           False   |               False   |               False   |               False   |       False   |
    'one'           False   |               False   |               False   |               False   |       False   |
    object()        False   |               False   |               False   |               False   |       False   |
    (1, 2, 3)       False   |               False   |               False   |               False   |       False   |
    [1, 2, 3]       False   |               False   |               False   |               False   |       False   |
    {'one': 'two'}  False   |               False   |               False   |               False   |       False   |
    ' 0 '           True    |               True    |               True    |               True    |       False   |
    ' 0.'           False   |               True    |               False   |               False   |       False   |
    ' .0'           False   |               False   |               False   |               False   |       False   |
    '.01 '          False   |               False   |               False   |               False   |       False   |
    

    Just now I tried adding this function:

    def isInt_float(s):
        try:
            return float(str(s)).is_integer()
        except:
            return False
    

    It performs almost as well as check_int (0.3486), and it returns True for values like 1.0, 0.0, +1.0, 0. and .0. But it also returns True for '06'. Pick your poison, I guess.
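A few further caveats of the float-based approach, sketched below: it accepts exponent notation, and for very large values the conversion to float rounds away a fractional part before `is_integer()` ever sees it. The function is restated here as `is_int_float` (a renamed copy, with a narrower except clause) so that the example is self-contained.

```python
def is_int_float(s):
    """Float-based check, restated from the answer with a narrower except."""
    try:
        return float(str(s)).is_integer()
    except ValueError:
        return False


print(is_int_float('1e3'))   # True: exponent notation parses to 1000.0
print(is_int_float('inf'))   # False: infinity is not an integer
print(is_int_float('nan'))   # False
# Above 2**53 a double cannot hold a fractional part, so the .5 is
# rounded away during parsing and the result looks like an integer.
print(is_int_float('9007199254740993.5'))  # True
```

Whether these count as correct answers depends, again, on what you want "represents an integer" to mean.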
