How can I check if a string represents an int, without using try/except?

悲哀的现实 2020-11-22 00:36

Is there any way to tell whether a string represents an integer (e.g., '3', '-17' but not '3.14' or 'asf

19 answers
  • 2020-11-22 00:48

    If you want to accept ASCII digits only, here are tests that do so:

    Python 3.7+: (u.isdecimal() and u.isascii())

    Python <= 3.6: (u.isdecimal() and u == str(int(u)))

    Other answers suggest using .isdigit() or .isdecimal(), but both of these also accept some non-ASCII Unicode digits, such as '٢' (u'\u0662'):

    u = u'\u0662'     # '٢'
    u.isdigit()       # True
    u.isdecimal()     # True
    u.isascii()       # False (Python 3.7+ only)
    u == str(int(u))  # False
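    The Python 3.7+ test can be wrapped in a small helper. This is a sketch that assumes Python 3.7+ (for str.isascii()) and that a single leading '+' or '-' should be allowed, which the tests above do not cover; the name is_ascii_int is hypothetical.

```python
def is_ascii_int(s: str) -> bool:
    """Return True if s is an optional sign followed by ASCII digits only (Python 3.7+)."""
    # Drop at most one leading sign; s[:1] is safe on an empty string
    if s[:1] in ("+", "-"):
        s = s[1:]
    # isdecimal() rejects '', '3.14' and '²'; isascii() rejects digits like '٢'
    return s.isdecimal() and s.isascii()
```

    With this version, '-17' passes, while '--1', '-' and '٢' all fail.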
    
  • 2020-11-22 00:52

    Use a regular expression:

    import re
    def RepresentsInt(s):
        # fullmatch() must match the whole string; with re.match(), "$" also accepts a trailing "\n"
        return re.fullmatch(r"[-+]?\d+", s) is not None
    

    If you also want to accept a decimal point followed only by zeros (e.g. '3.' or '3.00'):

    def RepresentsInt(s):
        return re.fullmatch(r"[-+]?\d+(\.0*)?", s) is not None
    

    For improved performance if you're doing this often, compile the regular expression only once using re.compile().
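    A sketch of the precompiled version. It assumes you want the whole string to match: re.fullmatch() (Python 3.4+) avoids the trailing-newline quirk of "$", and [0-9] is used instead of \d so that non-ASCII digits such as '٢' are rejected. The names _INT_RE and represents_int are illustrative.

```python
import re

# Compiled once at import time and reused on every call.
# [0-9] rather than \d: in Python 3, \d also matches non-ASCII digits like '٢'.
_INT_RE = re.compile(r"[-+]?[0-9]+")

def represents_int(s: str) -> bool:
    # fullmatch() must consume the whole string, so "12\n" and "12abc" fail
    return _INT_RE.fullmatch(s) is not None

print(represents_int("-17"), represents_int("3.14"))
```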

  • 2020-11-22 00:52

    The proper regex solution would combine the ideas of Greg Hewgill and Nowell, but without a global variable. You can accomplish this by attaching an attribute to the function. Also, I know it is frowned upon to put imports inside a function, but what I'm going for is a "lazy module" effect like http://peak.telecommunity.com/DevCenter/Importing#lazy-imports

    Edit: My favorite technique so far is to use only methods of the str object.

    #!/usr/bin/env python
    
    # Uses exclusively methods of the String object
    def isInteger(i):
        i = str(i)
        return i=='0' or (i if i.find('..') > -1 else i.lstrip('-+').rstrip('0').rstrip('.')).isdigit()
    
    # Uses re module for regex
    def isIntegre(i):
        import re
        if not hasattr(isIntegre, '_re'):
            print("I compile only once. Remove this line when you are confident in that.")
            isIntegre._re = re.compile(r"[-+]?\d+(\.0*)?$")
        return isIntegre._re.match(str(i)) is not None
    
    # When executed directly run Unit Tests
    if __name__ == '__main__':
        for obj in [
                    # integers
                    0, 1, -1, 1.0, -1.0,
                    '0', '0.','0.0', '1', '-1', '+1', '1.0', '-1.0', '+1.0',
                    # non-integers
                    1.1, -1.1, '1.1', '-1.1', '+1.1',
                    '1.1.1', '1.1.0', '1.0.1', '1.0.0',
                    '1.0.', '1..0', '1..',
                    '0.0.', '0..0', '0..',
                    'one', object(), (1,2,3), [1,2,3], {'one':'two'}
                ]:
            # Notice the integre uses 're' (intended to be humorous)
            integer = ('an integer' if isInteger(obj) else 'NOT an integer')
            integre = ('an integre' if isIntegre(obj) else 'NOT an integre')
            # Make strings look like strings in the output
            if isinstance(obj, str):
                obj = ("'%s'" % (obj,))
            print("%30s is %14s is %14s" % (obj, integer, integre))
    

    And for the less adventurous members of the class, here is the output:

    I compile only once. Remove this line when you are confident in that.
                                 0 is     an integer is     an integre
                                 1 is     an integer is     an integre
                                -1 is     an integer is     an integre
                               1.0 is     an integer is     an integre
                              -1.0 is     an integer is     an integre
                               '0' is     an integer is     an integre
                              '0.' is     an integer is     an integre
                             '0.0' is     an integer is     an integre
                               '1' is     an integer is     an integre
                              '-1' is     an integer is     an integre
                              '+1' is     an integer is     an integre
                             '1.0' is     an integer is     an integre
                            '-1.0' is     an integer is     an integre
                            '+1.0' is     an integer is     an integre
                               1.1 is NOT an integer is NOT an integre
                              -1.1 is NOT an integer is NOT an integre
                             '1.1' is NOT an integer is NOT an integre
                            '-1.1' is NOT an integer is NOT an integre
                            '+1.1' is NOT an integer is NOT an integre
                           '1.1.1' is NOT an integer is NOT an integre
                           '1.1.0' is NOT an integer is NOT an integre
                           '1.0.1' is NOT an integer is NOT an integre
                           '1.0.0' is NOT an integer is NOT an integre
                            '1.0.' is NOT an integer is NOT an integre
                            '1..0' is NOT an integer is NOT an integre
                             '1..' is NOT an integer is NOT an integre
                            '0.0.' is NOT an integer is NOT an integre
                            '0..0' is NOT an integer is NOT an integre
                             '0..' is NOT an integer is NOT an integre
                             'one' is NOT an integer is NOT an integre
    <object object at 0x103b7d0a0> is NOT an integer is NOT an integre
                         (1, 2, 3) is NOT an integer is NOT an integre
                         [1, 2, 3] is NOT an integer is NOT an integre
                    {'one': 'two'} is NOT an integer is NOT an integre
    
  • 2020-11-22 00:53
    >>> "+7".lstrip("-+").isdigit()
    True
    >>> "-7".lstrip("-+").isdigit()
    True
    >>> "7".lstrip("-+").isdigit()
    True
    >>> "13.4".lstrip("-+").isdigit()
    False
    

    So your function would be:

    def is_int(val):
        # lstrip removes every leading sign, so "--7" and "+-7" also return True
        return val.lstrip("-+").isdigit()
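    To see where this shortcut and int() disagree, here is a quick comparison. The int_accepts helper is hypothetical and uses try/except purely as a reference point, not as a proposed solution.

```python
def is_int(val):
    return val.lstrip("-+").isdigit()

def int_accepts(val):
    """Return True if int(val) succeeds (used here only for comparison)."""
    try:
        int(val)
        return True
    except ValueError:
        return False

# Inputs where the lstrip/isdigit shortcut and int() disagree
for s in ["--7", "+-7", "\u00b2", " 7"]:   # "\u00b2" is superscript two, '²'
    print(repr(s), is_int(s), int_accepts(s))
```

    The first three lines print True then False: the shortcut accepts repeated signs and superscript digits that int() rejects. ' 7' goes the other way, because int() tolerates surrounding whitespace and isdigit() does not.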
    
  • 2020-11-22 00:54

    You know, I've found (and I've tested this over and over) that try/except does not perform all that well, for whatever reason. I frequently try several ways of doing things, and I don't think I've ever found a try/except-based method that performed best among those tested; in fact, those methods have usually come out close to the worst, if not the worst. Not in every case, but in many. I know a lot of people say it's the "Pythonic" way, but that's one area where I part ways with them. To me it's neither very performant nor very elegant, so I tend to use it only for error trapping and reporting.

    I was going to gripe that PHP, Perl, Ruby, C, and even the freaking shell have simple functions for testing a string for integer-hood, but due diligence in verifying those assumptions tripped me up! Apparently this lack is a common sickness.

    Here's a quick and dirty edit of Bruno's post:

    import sys, time, re
    
    g_intRegex = re.compile(r"^([+-]?[1-9]\d*|0)$")
    
    testvals = [
        # integers
        0, 1, -1, 1.0, -1.0,
        '0', '0.','0.0', '1', '-1', '+1', '1.0', '-1.0', '+1.0', '06',
        # non-integers
        'abc 123',
        1.1, -1.1, '1.1', '-1.1', '+1.1',
        '1.1.1', '1.1.0', '1.0.1', '1.0.0',
        '1.0.', '1..0', '1..',
        '0.0.', '0..0', '0..',
        'one', object(), (1,2,3), [1,2,3], {'one':'two'},
        # with spaces
        ' 0 ', ' 0.', ' .0','.01 '
    ]
    
    def isInt_try(v):
        try:
            int(v)
        except (TypeError, ValueError, OverflowError):  # bad text, or objects int() can't convert
            return False
        return True
    
    def isInt_str(v):
        v = str(v).strip()
        return v=='0' or (v if v.find('..') > -1 else v.lstrip('-+').rstrip('0').rstrip('.')).isdigit()
    
    def isInt_re(v):
        import re
        if not hasattr(isInt_re, 'intRegex'):
            isInt_re.intRegex = re.compile(r"^([+-]?[1-9]\d*|0)$")
        return isInt_re.intRegex.match(str(v).strip()) is not None
    
    def isInt_re2(v):
        return g_intRegex.match(str(v).strip()) is not None
    
    def check_int(s):
        s = str(s)
        if s[:1] in ('-', '+'):  # slicing avoids an IndexError on ''
            return s[1:].isdigit()
        return s.isdigit()
    
    
    def timeFunc(func, times):
        t1 = time.time()
        for n in range(times):
            for v in testvals: 
                r = func(v)
        t2 = time.time()
        return t2 - t1
    
    def testFuncs(funcs):
        for func in funcs:
            sys.stdout.write( "\t%s\t|" % func.__name__)
        print()
        for v in testvals:
            if type(v) == type(''):
                sys.stdout.write("'%s'" % v)
            else:
                sys.stdout.write("%s" % str(v))
            for func in funcs:
                sys.stdout.write( "\t\t%s\t|" % func(v))
            sys.stdout.write("\r\n") 
    
    if __name__ == '__main__':
        print()
        print("tests..")
        testFuncs((isInt_try, isInt_str, isInt_re, isInt_re2, check_int))
        print()
    
        print("timings..")
        print("isInt_try:   %6.4f" % timeFunc(isInt_try, 10000))
        print("isInt_str:   %6.4f" % timeFunc(isInt_str, 10000)) 
        print("isInt_re:    %6.4f" % timeFunc(isInt_re, 10000))
        print("isInt_re2:   %6.4f" % timeFunc(isInt_re2, 10000))
        print("check_int:   %6.4f" % timeFunc(check_int, 10000))
    

    Here are the performance comparison results (seconds for 10,000 passes over testvals):

    timings..
    isInt_try:   0.6426
    isInt_str:   0.7382
    isInt_re:    1.1156
    isInt_re2:   0.5344
    check_int:   0.3452
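    To reproduce a comparison like this, the standard-library timeit module handles the repetition for you. This is a minimal sketch, not the harness used above: it assumes copies of isInt_try (with except narrowed to specific exceptions) and check_int, plus a small hypothetical samples list. Your numbers will vary by machine and Python version.

```python
import timeit

def isInt_try(v):
    try:
        int(v)
    except (TypeError, ValueError, OverflowError):
        return False
    return True

def check_int(s):
    s = str(s)
    if s[:1] in ('-', '+'):
        return s[1:].isdigit()
    return s.isdigit()

# Hypothetical sample inputs; substitute your own data
samples = ['0', '-1', '+1', '06', '1.0', 'abc 123', 'one']

def run_all(func):
    # One pass over every sample
    for v in samples:
        func(v)

for func in (isInt_try, check_int):
    # timeit.timeit calls the zero-argument callable `number` times
    seconds = timeit.timeit(lambda: run_all(func), number=10000)
    print("%-10s %6.4f s" % (func.__name__, seconds))
```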
    

    A C function could scan the string once and be done; a single-pass scan in C would be the Right Thing to do here, I think.
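    The single-pass idea can be sketched in Python for illustration (this is not C, so don't expect C speed). It assumes the strict reading: at most one sign followed by one or more ASCII digits, with no surrounding whitespace and no '.0' suffix. is_int_scan is a hypothetical name.

```python
def is_int_scan(s: str) -> bool:
    """Scan s once, left to right: optional sign, then one or more ASCII digits."""
    i = 0
    n = len(s)
    if i < n and s[i] in "+-":
        i += 1            # consume a single leading sign
    if i == n:
        return False      # empty string or a bare sign
    while i < n:
        if not ("0" <= s[i] <= "9"):
            return False  # stop at the first non-digit
        i += 1
    return True
```

    Like check_int, this accepts '06'; rejecting leading zeros would need one extra check on the first digit.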

    EDIT:

    I've updated the code above to work in Python 3.5, to include the check_int function from the currently most upvoted answer, and to use the most popular integer-testing regex I could find. This regex rejects strings like 'abc 123', so I've added 'abc 123' as a test value.

    It is Very Interesting to me to note, at this point, that NONE of the functions tested, including the try method, the popular check_int function, and the most popular regex for testing for integer-hood, return the correct answers for all of the test values (well, depending on what you think the correct answers are; see the test results below).

    The built-in int() function silently truncates a float toward zero, returning the integer part before the decimal point (which is why isInt_try reports 1.1 and -1.1 as integers). Given the same value as a string, such as '1.1', it raises ValueError instead.
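    A quick illustration of that asymmetry, with values chosen for illustration:

```python
# A float passed to int() is truncated toward zero, not rejected
print(int(1.9), int(-1.9))  # prints: 1 -1

# The same number written as a string is rejected outright
try:
    int("1.9")
except ValueError as exc:
    print("ValueError:", exc)
```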

    The check_int() function returns False for values like 0.0 and 1.0 (which are whole numbers) and returns True for values like '06'.

    Here are the current (Python 3.5) test results:

    value           | isInt_try | isInt_str | isInt_re | isInt_re2 | check_int
    ----------------+-----------+-----------+----------+-----------+----------
    0               | True      | True      | True     | True      | True
    1               | True      | True      | True     | True      | True
    -1              | True      | True      | True     | True      | True
    1.0             | True      | True      | False    | False     | False
    -1.0            | True      | True      | False    | False     | False
    '0'             | True      | True      | True     | True      | True
    '0.'            | False     | True      | False    | False     | False
    '0.0'           | False     | True      | False    | False     | False
    '1'             | True      | True      | True     | True      | True
    '-1'            | True      | True      | True     | True      | True
    '+1'            | True      | True      | True     | True      | True
    '1.0'           | False     | True      | False    | False     | False
    '-1.0'          | False     | True      | False    | False     | False
    '+1.0'          | False     | True      | False    | False     | False
    '06'            | True      | True      | False    | False     | True
    'abc 123'       | False     | False     | False    | False     | False
    1.1             | True      | False     | False    | False     | False
    -1.1            | True      | False     | False    | False     | False
    '1.1'           | False     | False     | False    | False     | False
    '-1.1'          | False     | False     | False    | False     | False
    '+1.1'          | False     | False     | False    | False     | False
    '1.1.1'         | False     | False     | False    | False     | False
    '1.1.0'         | False     | False     | False    | False     | False
    '1.0.1'         | False     | False     | False    | False     | False
    '1.0.0'         | False     | False     | False    | False     | False
    '1.0.'          | False     | False     | False    | False     | False
    '1..0'          | False     | False     | False    | False     | False
    '1..'           | False     | False     | False    | False     | False
    '0.0.'          | False     | False     | False    | False     | False
    '0..0'          | False     | False     | False    | False     | False
    '0..'           | False     | False     | False    | False     | False
    'one'           | False     | False     | False    | False     | False
    <obj..>         | False     | False     | False    | False     | False
    (1, 2, 3)       | False     | False     | False    | False     | False
    [1, 2, 3]       | False     | False     | False    | False     | False
    {'one': 'two'}  | False     | False     | False    | False     | False
    ' 0 '           | True      | True      | True     | True      | False
    ' 0.'           | False     | True      | False    | False     | False
    ' .0'           | False     | False     | False    | False     | False
    '.01 '          | False     | False     | False    | False     | False
    

    Just now I tried adding this function:

    def isInt_float(s):
        try:
            return float(str(s)).is_integer()
        except ValueError:  # float() raises ValueError for text that isn't a number
            return False
    

    It performs almost as well as check_int (0.3486 vs. 0.3452), and it returns True for values like 1.0, 0.0, +1.0, 0. and .0. But it also returns True for '06', so pick your poison, I guess.
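    A few more inputs worth knowing about before choosing the float-based check, since float() accepts syntax that int() does not. This sketch uses a copy of isInt_float with the except clause narrowed to ValueError; the input list is illustrative.

```python
def isInt_float(s):
    try:
        return float(str(s)).is_integer()
    except ValueError:
        return False

# float() parses exponents, surrounding whitespace and special values
for s in ['06', '1e3', '.0', ' 7 ', 'inf', 'nan']:
    print(repr(s), isInt_float(s))
```

    The exponent case means '1e3' counts as an integer here even though int('1e3') raises ValueError, while 'inf' and 'nan' parse but is_integer() returns False for them.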

  • 2020-11-22 00:55

    Here is a function that parses without raising errors. It handles the obvious cases and returns None on failure (it handles up to 2000 '-'/'+' signs by default on CPython!):

    #!/usr/bin/env python
    
    def get_int(number):
        splits = number.split('.')
        if len(splits) > 2:
            # too many splits
            return None
        if len(splits) == 2 and splits[1]:
            # handle decimal part recursively :-)
            if get_int(splits[1]) != 0:
                return None
    
        int_part = splits[0].lstrip("+")
        if int_part.startswith('-'):
            # handle minus sign recursively :-); propagate None instead of negating it
            value = get_int(int_part[1:])
            return None if value is None else -value
        # isdecimal() (unlike isdigit()) only accepts characters int() can parse,
        # and returning None (not False) keeps failures distinct from a result of 0
        return int(int_part) if int_part.isdecimal() else None
    

    Some tests:

    tests = ["0", "0.0", "0.1", "1", "1.1", "1.0", "-1", "-1.1", "-1.0", "-0", "--0", "---3", '.3', '--3.', "+13", "+-1.00", "--+123", "-0.000"]
    
    for t in tests:
        print("get_int(%s) = %s" % (t, get_int(t)))
    

    Results:

    get_int(0) = 0
    get_int(0.0) = 0
    get_int(0.1) = None
    get_int(1) = 1
    get_int(1.1) = None
    get_int(1.0) = 1
    get_int(-1) = -1
    get_int(-1.1) = None
    get_int(-1.0) = -1
    get_int(-0) = 0
    get_int(--0) = 0
    get_int(---3) = -3
    get_int(.3) = None
    get_int(--3.) = 3
    get_int(+13) = 13
    get_int(+-1.00) = -1
    get_int(--+123) = 123
    get_int(-0.000) = 0
    

    For your needs you can use:

    def int_predicate(number):
        return get_int(number) is not None
    