How to extract numbers from a string in Python?

星月不相逢 2020-11-21 05:19

I would like to extract all the numbers contained in a string. Which is better suited for the purpose: regular expressions or the isdigit() method?

Example:

17 answers
  • 2020-11-21 05:40
    import re

    line2 = "hello 12 hi 89"
    temp1 = re.findall(r'\d+', line2)  # find every run of digits with a regular expression
    res2 = list(map(int, temp1))  # convert each matched string to int
    print(res2)
    

    Hi,

    You can find all the integers in the string by passing the \d+ pattern to re.findall.

    In the second step, create the list res2 by converting each matched digit string to an int.

    Hope this helps.

    Regards, Diwakar Sharma

  • 2020-11-21 05:41

    This is more than a bit late, but you can extend the regular expression to account for scientific notation too.

    import re
    
    # Format is [(<string>, <expected output>), ...]
    ss = [("apple-12.34 ba33na fanc-14.23e-2yapple+45e5+67.56E+3",
           ['-12.34', '33', '-14.23e-2', '+45e5', '+67.56E+3']),
          ('hello X42 I\'m a Y-32.35 string Z30',
           ['42', '-32.35', '30']),
          ('he33llo 42 I\'m a 32 string -30', 
           ['33', '42', '32', '-30']),
          ('h3110 23 cat 444.4 rabbit 11 2 dog', 
           ['3110', '23', '444.4', '11', '2']),
          ('hello 12 hi 89', 
           ['12', '89']),
          ('4', 
           ['4']),
          ('I like 74,600 commas not,500', 
           ['74,600', '500']),
          ('I like bad math 1+2=.001', 
           ['1', '+2', '.001'])]
    
    for s, r in ss:
        # Raw string so that \d and \. reach the regex engine unchanged
        rr = re.findall(r"[-+]?[.]?[\d]+(?:,\d\d\d)*[\.]?\d*(?:[eE][-+]?\d+)?", s)
        if rr == r:
            print('GOOD')
        else:
            print('WRONG', rr, 'should be', r)
    

    This prints GOOD for every test case.

    Additionally, you can look at the AWS Glue built-in regex
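
    The pattern above returns strings. As a minimal sketch, assuming you want numeric values instead, the matches could be converted with float() after removing the thousands separators. The name matches_to_floats is hypothetical and not part of the answer.

```python
import re

# Same pattern as in the answer above, written as a raw string.
NUMBER_RE = re.compile(r"[-+]?[.]?[\d]+(?:,\d\d\d)*[\.]?\d*(?:[eE][-+]?\d+)?")

def matches_to_floats(text):
    """Return every match of NUMBER_RE in text as a float (hypothetical helper)."""
    values = []
    for match in NUMBER_RE.findall(text):
        try:
            # float() does not accept thousands separators, so drop the commas first.
            values.append(float(match.replace(",", "")))
        except ValueError:
            # The pattern can match text such as ".5.3" that float() rejects; skip it.
            pass
    return values

print(matches_to_floats("I like 74,600 commas not,500"))  # [74600.0, 500.0]
```

    Skipping unparseable matches is a design choice; raising an error instead may be preferable when malformed input should not pass silently.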

  • 2020-11-21 05:47

    If you know there will be only one number in the string, e.g. 'hello 12 hi', you can try filter.

    For example:

    In [1]: int(''.join(filter(str.isdigit, '200 grams')))
    Out[1]: 200
    In [2]: int(''.join(filter(str.isdigit, 'Counters: 55')))
    Out[2]: 55
    In [3]: int(''.join(filter(str.isdigit, 'more than 23 times')))
    Out[3]: 23
    

    But be careful! If the string contains more than one number, their digits are concatenated:

    In [4]: int(''.join(filter(str.isdigit, '200 grams 5')))
    Out[4]: 2005
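
    One way to avoid this pitfall, sketched under the assumption that only the first number in the string matters, is re.search, which stops at the first run of digits. The helper name first_int is hypothetical.

```python
import re

def first_int(text):
    """Return the first run of digits in text as an int, or None if there is none."""
    match = re.search(r"\d+", text)
    return int(match.group()) if match else None

print(first_int("200 grams 5"))  # 200
print(first_int("no digits here"))  # None
```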
    
  • 2020-11-21 05:48

    Since none of these dealt with the real-world financial numbers in Excel and Word documents that I needed to find, here is my variation. It handles ints, floats, negative numbers, and currency numbers (because it doesn't rely on split), and it can either drop the decimal part and return ints or return everything as strings.

    It also handles the Indian lakh numbering system, where commas appear irregularly rather than after every three digits.

    It does not handle scientific notation, and negative numbers written in parentheses (as in budgets) will appear positive.

    It also does not extract dates. There are better ways for finding dates in strings.

    import re

    def find_numbers(string, ints=True):
        # Optional leading minus, digits with commas in any position, optional decimal part
        numexp = re.compile(r'-?\d[\d,]*(?:\.\d+)?')
        # Remove the separators so the strings can be converted
        numbers = [x.replace(',', '') for x in numexp.findall(string)]
        if ints:
            # Drop the decimal part and return ints
            return [int(x.split('.')[0]) for x in numbers]
        return numbers
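
    The parentheses limitation mentioned above could be addressed along these lines. This is only a sketch, assuming that a number wrapped in parentheses always means a negative amount; the name find_accounting_numbers and its pattern are hypothetical, and it returns floats rather than ints or strings.

```python
import re

# First alternative: a number wrapped in parentheses, e.g. "(3,400)".
# Second alternative: a plain number with an optional leading minus.
ACCOUNTING_RE = re.compile(r'\((\d[\d,]*(?:\.\d+)?)\)|(-?\d[\d,]*(?:\.\d+)?)')

def find_accounting_numbers(text):
    """Return numbers as floats, treating "(123)" as -123 (hypothetical helper)."""
    results = []
    for parenthesized, plain in ACCOUNTING_RE.findall(text):
        if parenthesized:
            results.append(-float(parenthesized.replace(',', '')))
        else:
            results.append(float(plain.replace(',', '')))
    return results

print(find_accounting_numbers("Revenue 1,25,000.50 loss (3,400) net -12"))
# [125000.5, -3400.0, -12.0]
```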
    
  • 2020-11-21 05:52

    If you want to extract only positive integers, try the following:

    >>> text = "h3110 23 cat 444.4 rabbit 11 2 dog"  # avoid shadowing the built-in str
    >>> [int(s) for s in text.split() if s.isdigit()]
    [23, 11, 2]
    

    I would argue that this is better than the regex example because you don't need another module and it's more readable because you don't need to parse (and learn) the regex mini-language.

    This will not recognize floats, negative integers, or integers in hexadecimal format. If you can't accept these limitations, jmnas's answer below will do the trick.
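
    If negative integers are the only limitation that matters, the same split-based idea can be kept by letting int() decide what counts as a number. This is a minimal sketch with a hypothetical helper name, extract_ints; note that int() also accepts forms such as '+2'.

```python
def extract_ints(text):
    """Return every whitespace-separated token that int() can parse."""
    result = []
    for token in text.split():
        try:
            result.append(int(token))
        except ValueError:
            # Not an integer token (e.g. "h3110" or "444.4"); skip it.
            pass
    return result

print(extract_ints("h3110 23 cat 444.4 rabbit -11 +2 dog"))  # [23, -11, 2]
```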

  • 2020-11-21 05:53

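    I'm assuming you want floats, not just integers, so I'd do something like this:

    s = "he33llo 42 I'm a 32 string -30"  # example input
    numbers = []
    for t in s.split():
        try:
            numbers.append(float(t))  # keep tokens that parse as a float
        except ValueError:
            pass  # skip everything else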
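
    One caveat with float(): it also accepts tokens such as 'nan' and 'inf'. A hedged sketch that keeps only finite values is shown here; the helper name finite_floats is hypothetical.

```python
import math

def finite_floats(text):
    """Return whitespace-separated tokens that parse as finite floats."""
    values = []
    for token in text.split():
        try:
            value = float(token)
        except ValueError:
            continue
        # float("nan") and float("inf") parse successfully, so filter them out.
        if math.isfinite(value):
            values.append(value)
    return values

print(finite_floats("nan 3.5 inf -2 abc 1e3"))  # [3.5, -2.0, 1000.0]
```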
    

    Note that some of the other solutions posted here don't work with negative numbers:

    >>> import re
    >>> re.findall(r'\b\d+\b', 'he33llo 42 I\'m a 32 string -30')
    ['42', '32', '30']
    
    >>> '-3'.isdigit()
    False
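
    If negative integers should be kept, one option is to allow an optional minus sign and replace the leading \b with a lookbehind. This is a sketch rather than a complete number parser: it still splits a float such as 444.4 into two matches. The name SIGNED_INT_RE is hypothetical.

```python
import re

# (?<!\w) rejects digits glued to a preceding letter or digit (as in "he33llo"),
# while still allowing a leading minus sign.
SIGNED_INT_RE = re.compile(r"(?<!\w)-?\d+\b")

print(SIGNED_INT_RE.findall("he33llo 42 I'm a 32 string -30"))  # ['42', '32', '-30']
```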
    