Removing all non-numeric characters from string in Python

遥遥无期 2020-11-29 17:42

How do we remove all non-numeric characters from a string in Python?

7 answers
  • 2020-11-29 17:49

    Not sure if this is the most efficient way, but:

    >>> ''.join(c for c in "abc123def456" if c.isdigit())
    '123456'
    

    The ''.join part concatenates the resulting characters with no separator between them. The argument is a generator expression that (as you can probably guess) keeps only the characters for which isdigit returns True.
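One caveat worth illustrating: in Python 3, str.isdigit also returns True for some non-ASCII characters such as superscript digits and Arabic-Indic digits. The sketch below contrasts the two behaviours; the helper name ascii_digits_only is made up for illustration, and whether you want the stricter filter depends on your input.

```python
def ascii_digits_only(text):
    """Keep only the characters 0-9 (hypothetical helper name)."""
    return "".join(c for c in text if c in "0123456789")

# Sample containing SUPERSCRIPT TWO (U+00B2) and ARABIC-INDIC DIGIT THREE (U+0663).
sample = "abc123\u00b2def\u0663"

# str.isdigit keeps the non-ASCII digit characters as well.
unicode_aware = "".join(c for c in sample if c.isdigit())

# The explicit membership test keeps only 0-9.
ascii_only = ascii_digits_only(sample)

print(repr(unicode_aware), repr(ascii_only))
```

If the result is later passed to int or float, the stricter version is usually the safer choice, since a superscript digit would make the conversion fail.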

  • 2020-11-29 17:55

    Fastest approach, if you need to perform more than one or two such removals (or even just one, but on a very long string), is to rely on the translate method of strings, even though it does need some prep. The session below is Python 2 code: it relies on xrange, string.maketrans, and the two-argument form of str.translate.

    >>> import string
    >>> allchars = ''.join(chr(i) for i in xrange(256))
    >>> identity = string.maketrans('', '')
    >>> nondigits = allchars.translate(identity, string.digits)
    >>> s = 'abc123def456'
    >>> s.translate(identity, nondigits)
    '123456'
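For readers on Python 3, a rough equivalent of the byte-string recipe above might look like the following. This is a sketch, not part of the original answer: it assumes the data is a bytes object, and the names NON_DIGITS and keep_digits are illustrative. It uses bytes.translate with a table of None and a set of bytes to delete.

```python
import string

# Every possible byte value, minus the ASCII digits: the bytes to delete.
ALL_BYTES = bytes(range(256))
DIGITS = string.digits.encode("ascii")
NON_DIGITS = ALL_BYTES.translate(None, DIGITS)

def keep_digits(data: bytes) -> bytes:
    """Delete every byte that is not an ASCII digit (hypothetical helper)."""
    # A table of None means "no remapping"; the second argument lists bytes to drop.
    return data.translate(None, NON_DIGITS)

print(keep_digits(b"abc123def456"))
```

The prep work (building NON_DIGITS) happens once, so the per-call cost is a single pass over the data.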
    

    The translate method works differently, and is maybe a tad simpler to use, on Unicode strings than on byte strings, btw:

    >>> unondig = dict.fromkeys(xrange(65536))
    >>> for x in string.digits: del unondig[ord(x)]
    ... 
    >>> s = u'abc123def456'
    >>> s.translate(unondig)
    u'123456'
    

    You might want to use a mapping class rather than an actual dict, especially if your Unicode string may potentially contain characters with very high ord values (that would make the dict excessively large;-). For example:

    >>> class keeponly(object):
    ...   def __init__(self, keep): 
    ...     self.keep = set(ord(c) for c in keep)
    ...   def __getitem__(self, key):
    ...     if key in self.keep:
    ...       return key
    ...     return None
    ... 
    >>> s.translate(keeponly(string.digits))
    u'123456'
    >>> 
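The same mapping-class idea carries over to Python 3, where every str is Unicode. The version below is a sketch under that assumption; the class name KeepOnly is illustrative. str.translate looks up each code point with __getitem__, and a result of None deletes the character.

```python
import string

class KeepOnly:
    """Mapping for str.translate that deletes everything not in `keep`."""

    def __init__(self, keep):
        # Store code points, because str.translate indexes by ord(c).
        self.keep = {ord(c) for c in keep}

    def __getitem__(self, codepoint):
        # Returning the code point keeps the character; None removes it.
        return codepoint if codepoint in self.keep else None

digits_only = "abc123def456".translate(KeepOnly(string.digits))
print(digits_only)
```

Because nothing is precomputed per code point, this works for characters with arbitrarily high ord values without building a large dict.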
    
  • 2020-11-29 17:57

    Many right answers, but in case you want the result directly as a float, without using regex:

    x = '$123.45M'
    
    float(''.join(c for c in x if (c.isdigit() or c == '.')))
    

    123.45

    If your input uses a comma as the decimal separator, keep the comma instead of the point and replace it with a point before calling float.

    Use this instead if you know your number is an integer:

    x = '$1123'
    int(''.join(c for c in x if c.isdigit()))
    

    1123
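To make the decimal-separator point concrete, here is a small sketch that wraps the idea in a function. The name extract_float and its decimal_sep parameter are hypothetical, and it assumes the input contains at most one decimal separator; anything else (thousands separators, currency symbols, suffixes) is simply dropped.

```python
def extract_float(text, decimal_sep="."):
    """Keep ASCII digits and one kind of decimal separator, then parse as float."""
    kept = "".join(c for c in text if c in "0123456789" or c == decimal_sep)
    # float() only understands '.', so normalise the separator before parsing.
    return float(kept.replace(decimal_sep, "."))

print(extract_float("$123.45M"))
print(extract_float("1.234,56 EUR", decimal_sep=","))
```

Inputs with no digits, or with more than one separator, raise ValueError from float, which is probably what you want rather than a silent guess.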

  • 2020-11-29 18:01
    >>> import re
    >>> re.sub("[^0-9]", "", "sdkjh987978asd098as0980a98sd")
    '987978098098098'
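If the substitution runs many times, compiling the pattern once is a common variation. The sketch below also shows why the explicit class [^0-9] can be preferable to \D on Python 3 str patterns: without re.ASCII, \D treats non-ASCII decimal digits as digits and leaves them in. The names NON_DIGIT and strip_non_digits are illustrative.

```python
import re

# Compile once, reuse for every call.
NON_DIGIT = re.compile(r"[^0-9]")

def strip_non_digits(text):
    """Remove every character that is not 0-9 (hypothetical helper)."""
    return NON_DIGIT.sub("", text)

# "\u0661" is ARABIC-INDIC DIGIT ONE, a Unicode decimal digit.
mixed = "a\u06612"
unicode_result = re.sub(r"\D", "", mixed)                # keeps the Arabic-Indic digit
ascii_result = re.sub(r"\D", "", mixed, flags=re.ASCII)  # keeps only 0-9

print(strip_non_digits("sdkjh987978asd098as0980a98sd"))
```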
    
  • 2020-11-29 18:01

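    This should work for both strings and unicode objects in Python 2, and both strings and bytes in Python 3:

    # python <3.0
    def only_numerics(seq):
        # filter() on a str or unicode object returns the same type in Python 2
        return filter(type(seq).isdigit, seq)
    
    # python ≥3.0
    def only_numerics(seq):
        if isinstance(seq, str):
            return ''.join(filter(str.isdigit, seq))
        # Iterating over bytes yields ints, so compare against the ASCII codes of '0'..'9'
        return bytes(b for b in seq if 48 <= b <= 57)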
    
  • 2020-11-29 18:02

    Just to add another option to the mix, there are several useful constants within the string module. While more useful in other cases, they can be used here.

    >>> from string import digits
    >>> ''.join(c for c in "abc123def456" if c in digits)
    '123456'
    

    There are several constants in the module, including:

    • ascii_letters (abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ)
    • hexdigits (0123456789abcdefABCDEF)

    If you are using these constants heavily, it can be worthwhile to convert them to a frozenset. That enables O(1) lookups, rather than O(n), where n is the length of the original string constant.

    >>> digits = frozenset(digits)
    >>> ''.join(c for c in "abc123def456" if c in digits)
    '123456'
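Whether the frozenset actually pays off is easy to check with timeit. The sketch below is one way to measure it; the function names and the repetition counts are arbitrary choices, and since string.digits has only ten characters the difference may well be small on your machine.

```python
import timeit
from string import digits

digit_set = frozenset(digits)
text = "abc123def456" * 1000

def keep_with_str(s):
    # Membership test scans the 10-character string.
    return "".join(c for c in s if c in digits)

def keep_with_set(s):
    # Membership test is a hash lookup.
    return "".join(c for c in s if c in digit_set)

t_str = timeit.timeit(lambda: keep_with_str(text), number=20)
t_set = timeit.timeit(lambda: keep_with_set(text), number=20)
print(f"str membership: {t_str:.4f}s, frozenset membership: {t_set:.4f}s")
```

The gap should grow with longer constants such as ascii_letters, where a linear scan has more characters to check.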
    