Removing all non-numeric characters from string in Python

遥遥无期

How do we remove all non-numeric characters from a string in Python?

7 Answers

南方客

2020-11-29 17:49
Not sure if this is the most efficient way, but:
```
>>> ''.join(c for c in "abc123def456" if c.isdigit())
'123456'
```
The ''.join part combines all the resulting characters with nothing in between. The rest is a generator expression (not a list comprehension, since there are no square brackets), which keeps only the characters of the string for which isdigit() is true.
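
One thing worth knowing about this approach: `str.isdigit()` is also true for some non-ASCII characters, such as the superscript `²`. A minimal sketch of a wrapper that lets the caller choose; the function name `digits_only` and its `ascii_only` flag are illustrative, not part of the answer:

```python
def digits_only(text, ascii_only=True):
    """Keep only the digit characters of `text`."""
    if ascii_only:
        # Restrict to the ten ASCII digits.
        return "".join(c for c in text if c in "0123456789")
    # str.isdigit() also accepts characters such as superscript digits.
    return "".join(c for c in text if c.isdigit())


print(digits_only("abc123def456"))  # 123456
```

With `ascii_only=False`, a string like `"x²3"` keeps the `²` as well as the `3`.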

野趣味

2020-11-29 17:55

The fastest approach, if you need to perform more than just one or two such removal operations (or even just one, but on a very long string!-), is to rely on the translate method of strings, even though it does need some prep. In Python 2, for byte strings:

>>> import string
>>> allchars = ''.join(chr(i) for i in xrange(256))
>>> identity = string.maketrans('', '')
>>> nondigits = allchars.translate(identity, string.digits)
>>> s = 'abc123def456'
>>> s.translate(identity, nondigits)
'123456'
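
On Python 3 the session above does not run as written (`xrange` and `string.maketrans` no longer exist), but `bytes.translate` accepts `None` as the table together with a second argument listing bytes to delete. A minimal sketch under that assumption; the names `ASCII_DIGITS`, `NON_DIGITS` and `digits_only_bytes` are illustrative:

```python
import string

# Every byte value that is not an ASCII digit, built once up front.
ASCII_DIGITS = string.digits.encode("ascii")
NON_DIGITS = bytes(b for b in range(256) if b not in ASCII_DIGITS)


def digits_only_bytes(data):
    """Return `data` with every non-digit byte removed."""
    # table=None means "no remapping"; the second argument lists bytes to delete.
    return data.translate(None, NON_DIGITS)


print(digits_only_bytes(b"abc123def456"))  # b'123456'
```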

The translate method works differently on Unicode strings than on byte strings, and is maybe a tad simpler to use, btw:

>>> unondig = dict.fromkeys(xrange(65536))
>>> for x in string.digits: del unondig[ord(x)]
... 
>>> s = u'abc123def456'
>>> s.translate(unondig)
u'123456'

You might want to use a mapping class rather than an actual dict, especially if your Unicode string may potentially contain characters with very high ord values (that would make the dict excessively large;-). For example:

>>> class keeponly(object):
...   def __init__(self, keep): 
...     self.keep = set(ord(c) for c in keep)
...   def __getitem__(self, key):
...     if key in self.keep:
...       return key
...     return None
... 
>>> s.translate(keeponly(string.digits))
u'123456'
>>>
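
The mapping-class idea carries over to Python 3, where every `str` is Unicode. A rough sketch, relying on the fact that `str.translate` accepts any object with `__getitem__` and deletes a character when the lookup returns `None`; the class name `KeepOnly` is illustrative:

```python
import string


class KeepOnly:
    """Translation table for str.translate that deletes everything except `keep`."""

    def __init__(self, keep):
        self.keep = {ord(c) for c in keep}

    def __getitem__(self, codepoint):
        # Returning the code point keeps the character; None deletes it.
        return codepoint if codepoint in self.keep else None


digits_table = KeepOnly(string.digits)
print("abc123def456".translate(digits_table))  # 123456
```

Because the table is computed on demand, its size does not depend on how large the code points in the input are.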

一向

2020-11-29 17:57
Many correct answers already, but in case you want the result as a float directly, without using regex:
```
x = '$123.45M'

float(''.join(c for c in x if (c.isdigit() or c == '.')))
```
123.45

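If your input uses a comma as the decimal separator, test for ',' instead of '.' and replace it with '.' before calling float(), since float() only understands a point.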

If you know the number is an integer, use this instead:
```
x = '$1123'
int(''.join(c for c in x if c.isdigit()))
```
1123
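
The decimal-separator variation mentioned above can be folded into one helper. This is only a sketch: the name `extract_float` and its `decimal_sep` parameter are hypothetical, and it checks against the ASCII digits explicitly (rather than `isdigit()`) so that `float()` never sees a character it cannot parse.

```python
import string


def extract_float(text, decimal_sep="."):
    """Drop everything except ASCII digits and the separator, then parse."""
    kept = "".join(c for c in text if c in string.digits or c == decimal_sep)
    # float() only understands '.', so normalise the separator first.
    return float(kept.replace(decimal_sep, "."))


print(extract_float("$123.45M"))  # 123.45
```

Calling `extract_float("1.234,56 EUR", decimal_sep=",")` treats the point as a thousands separator and drops it. Input with more than one separator, or with no digits at all, still raises `ValueError`.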

天涯浪人

2020-11-29 18:01

>>> import re
>>> re.sub("[^0-9]", "", "sdkjh987978asd098as0980a98sd")
'987978098098098'
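
If the substitution runs many times, the pattern can be compiled once. A small sketch; `NON_DIGITS` and `strip_non_digits` are names made up for illustration. The class `[^0-9]` keeps ASCII digits only, whereas `\D` on a Python 3 `str` would also keep other Unicode decimal digits unless `re.ASCII` is passed.

```python
import re

# Compiled once; matches any single character that is not an ASCII digit.
NON_DIGITS = re.compile(r"[^0-9]")


def strip_non_digits(text):
    """Return `text` with every non-digit character removed."""
    return NON_DIGITS.sub("", text)


print(strip_non_digits("sdkjh987978asd098as0980a98sd"))  # 987978098098098
```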

不知归路

2020-11-29 18:01

This should work for both strings and unicode objects in Python 2, and for both strings and bytes in Python 3:

# python <3.0
def only_numerics(seq):
    return filter(type(seq).isdigit, seq)

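# python ≥3.0
def only_numerics(seq):
    # Slice instead of iterating: iterating bytes yields ints, which have no isdigit().
    pieces = (seq[i:i + 1] for i in range(len(seq)))
    return type(seq)().join(p for p in pieces if p.isdigit())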

有刺的猬

2020-11-29 18:02
Just to add another option to the mix, there are several useful constants within the string module. While more useful in other cases, they can be used here.
```
>>> from string import digits
>>> ''.join(c for c in "abc123def456" if c in digits)
'123456'
```
There are several constants in the module, including:
- ascii_letters (abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ)
- hexdigits (0123456789abcdefABCDEF)
If you are using these constants heavily, it can be worthwhile to convert them to a frozenset. That enables O(1) membership tests rather than O(n), where n is the length of the original constant string.
```
>>> digits = frozenset(digits)
>>> ''.join(c for c in "abc123def456" if c in digits)
'123456'
```
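
The same idea generalises to any of the constants. A minimal sketch, with `keep_only` as a hypothetical helper name, that builds the frozenset once per call and then filters:

```python
import string


def keep_only(text, allowed):
    """Return the characters of `text` that appear in `allowed`."""
    allowed_set = frozenset(allowed)  # O(1) membership tests
    return "".join(c for c in text if c in allowed_set)


print(keep_only("abc123def456", string.digits))    # 123456
print(keep_only("0xDEADbeef!", string.hexdigits))  # 0DEADbeef
```

For a set as small as the ten digits the difference is unlikely to be large; the frozenset matters more as the allowed set grows.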