Is there any way to tell whether a string represents an integer (e.g., '3', '-17', but not '3.14' or 'asf
I guess the question is about speed, since try/except has a time penalty.
First, I created a list of 200 strings: 100 numeric strings and 100 failing strings.
import numpy as np  # needed for np.array below
from random import shuffle
numbers = [u'+1'] * 100
nonumbers = [u'1abc'] * 100
testlist = numbers + nonumbers
shuffle(testlist)
testlist = np.array(testlist)
np.core.defchararray.isnumeric also works with unicode strings, e.g. np.core.defchararray.isnumeric(u'+12'),
but it returns an array. So it's a good solution if you have to do thousands of conversions and have missing or non-numeric data. Note that it tests whether every character is numeric, so a signed string such as u'+12' yields False.
import numpy as np
%timeit np.core.defchararray.isnumeric(testlist)
10000 loops, best of 3: 27.9 µs per loop # 200 numbers per loop
def check_num(s):
    try:
        int(s)
        return True
    except (TypeError, ValueError):  # int() rejected the value
        return False

def check_list(l):
    return [check_num(e) for e in l]
%timeit check_list(testlist)
1000 loops, best of 3: 217 µs per loop # 200 numbers per loop
The numpy solution seems to be much faster (roughly 8 times in this test).
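One caveat worth checking before relying on these timings: as far as I know, numpy's isnumeric applies str.isnumeric to each element, and that is not the same test as int(). Here is a small plain-Python sketch of the difference (Python 3 assumed; `int_ok` and `samples` are illustrative names, not part of the benchmark above):

```python
# Compare str.isnumeric() with what int() actually accepts.
# Assumption: Python 3; int_ok and samples are illustrative names.

def int_ok(s):
    """Return True if int(s) succeeds."""
    try:
        int(s)
        return True
    except ValueError:
        return False

samples = ["12", "+1", "-17", "½", "1abc"]
for s in samples:
    print(repr(s), "isnumeric:", s.isnumeric(), "int():", int_ok(s))
```

The signed strings pass int() but fail isnumeric, while '½' does the opposite, so the two checks timed above do not necessarily agree on the same input.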
Greg Hewgill's approach was missing a few components: the leading "^" to only match the start of the string, and compiling the re beforehand. But this approach will allow you to avoid a try: except:
import re
INT_RE = re.compile(r"^[-]?\d+$")
def RepresentsInt(s):
    return INT_RE.match(str(s)) is not None
I would be interested to know why you are trying to avoid try: except.
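Two details of the pattern above may matter in practice: "$" also matches just before a trailing newline, and in Python 3 "\d" matches non-ASCII digits as well. A possible stricter variant, as a sketch only (it assumes Python 3.4+ for fullmatch; `SIGNED_INT_RE` and `represents_int` are names I made up, and accepting a leading "+" is my own choice):

```python
import re

# Sketch of a stricter pattern. Assumptions: Python 3.4+ (for fullmatch);
# SIGNED_INT_RE and represents_int are illustrative names.
# re.ASCII restricts \d to 0-9; fullmatch removes the need for ^ and $.
SIGNED_INT_RE = re.compile(r"[+-]?\d+", re.ASCII)

def represents_int(s):
    """Return True if str(s) is an optional sign followed by ASCII digits."""
    return SIGNED_INT_RE.fullmatch(str(s)) is not None

print(represents_int("-17"))    # True
print(represents_int("3.14"))   # False
print(represents_int("12\n"))   # False
```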
I think
s.startswith('-') and s[1:].isdigit()
would be better rewritten as:
s.replace('-', '').isdigit()
because s[1:] also creates a new string.
But a much better solution is
s.lstrip('+-').isdigit()
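These one-liners differ on a few edge cases, so it may help to run them side by side. The sketch below assumes Python 3.7+ (for str.isascii); `is_signed_digits` is a hypothetical stricter helper of mine that allows at most one leading sign, at the cost of one slice:

```python
# Edge cases for the string-method checks discussed above.
# Assumptions: Python 3.7+; is_signed_digits is an illustrative helper.

def is_signed_digits(s):
    """Accept at most one leading '+' or '-' followed by ASCII digits."""
    body = s[1:] if s[:1] in ("+", "-") else s
    return body.isascii() and body.isdigit()

for s in ["-17", "1-2", "--3", "+-5", "", "-"]:
    print(repr(s),
          s.replace("-", "").isdigit(),   # removes every '-', even in the middle
          s.lstrip("+-").isdigit(),       # strips any run of leading signs
          is_signed_digits(s))
```

With these inputs, replace accepts '1-2' and '--3', and lstrip accepts '--3' and '+-5', none of which int() would parse.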
I have one possibility that doesn't use int at all, and should not raise an exception unless the string does not represent a number:
float(number)==float(number)//1
It should work for any kind of string that float accepts: positive, negative, engineering notation...
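Since float() still raises for strings like 'asf', the expression probably needs a guard if it is to return False instead. A minimal sketch, assuming Python 3 (`float_is_whole` is an illustrative name of mine):

```python
# Sketch of the float-based test wrapped so that bad input returns False.
# Assumption: Python 3; float_is_whole is an illustrative name.

def float_is_whole(number):
    """Return True if float(number) parses and has no fractional part."""
    try:
        value = float(number)
    except (TypeError, ValueError, OverflowError):
        return False
    return value == value // 1

for s in ["3", "-17", "3.14", "1e3", "3.0", "asf", "nan"]:
    print(repr(s), float_is_whole(s))
```

Note that this accepts '3.0' and '1e3', which int() rejects, so whether it fits depends on what "represents an integer" should mean for your data.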
I really liked Shavais' post, but I added one more test case (and the built-in isdigit() function):
def isInt_loop(v):
    v = str(v).strip()
    # swapping '0123456789' for '9876543210' makes only a nominal difference
    # (it might matter because '1' is toward the beginning of the string)
    numbers = '0123456789'
    for i in v:
        if i not in numbers:
            return False
    return True

def isInt_Digit(v):
    v = str(v).strip()
    return v.isdigit()
and it consistently and significantly beats the times of the rest:
timings..
isInt_try: 0.4628
isInt_str: 0.3556
isInt_re: 0.4889
isInt_re2: 0.2726
isInt_loop: 0.1842
isInt_Digit: 0.1577
using standard Python 2.7:
$ python --version
Python 2.7.10
Both test cases I added (isInt_loop and isInt_Digit) pass exactly the same tests (they both accept only unsigned integers), but I thought people could be more clever in modifying the string implementation (isInt_loop) than the built-in isdigit() function, so I included it, even though there's a slight difference in execution time. (Both methods beat everything else by a lot, but they don't handle the extra characters: "./+/-".)
Also, I found it interesting that the regex (the isInt_re2 method) beat the string comparison in the same test that Shavais performed in 2012 (it is now 2018). Maybe the regex libraries have been improved?
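For anyone who wants to rerun this kind of comparison on their own interpreter, a rough harness could look like the following. This is only a sketch: it assumes Python 3, the `isInt_try` body is my guess at a try/except variant (the original is not shown here), and the inputs and repeat count are arbitrary.

```python
import timeit

# Rough timing harness in the spirit of the comparison above.
# Assumptions: Python 3; isInt_try is a guessed try/except variant;
# the inputs list and number=10000 are arbitrary illustrative choices.

def isInt_try(v):
    try:
        int(v)
        return True
    except (TypeError, ValueError):
        return False

def isInt_Digit(v):
    return str(v).strip().isdigit()

inputs = ["3", "-17", "3.14", "asf", "  42  ", ""]

for func in (isInt_try, isInt_Digit):
    elapsed = timeit.timeit(lambda: [func(v) for v in inputs], number=10000)
    print("%s: %.4f" % (func.__name__, elapsed))
```

The absolute numbers will differ between machines and Python versions, so only the relative ordering is worth comparing.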
In my opinion, this is probably the most straightforward and Pythonic way to approach it. I didn't see this solution posted; it's basically the same as the regex one, but without the regex.
import string

def is_int(test):
    # Reject the empty string; otherwise every character must be a digit 0-9.
    return bool(test) and not (set(test) - set(string.digits))
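One thing to watch with a set-based test is the empty string: it has no characters, so a pure set comparison has nothing to reject. Below is a sketch of the same idea written as a subset test with an explicit guard (Python 3 assumed; `DIGITS` and `is_int_set` are illustrative names). Like the version above, it does not accept a sign.

```python
import string

# Set-based check written as a subset test with an empty-string guard.
# Assumptions: Python 3; DIGITS and is_int_set are illustrative names.
DIGITS = frozenset(string.digits)

def is_int_set(test):
    """Return True if test is non-empty and contains only digits 0-9."""
    return bool(test) and set(test) <= DIGITS

print(is_int_set("123"), is_int_set(""), is_int_set("-17"))
```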