Question
I have strings containing numbers with their units, e.g. 2GB, 17ft, etc. I would like to separate the number from the unit and create 2 different strings. Sometimes, there is a whitespace between them (e.g. 2 GB) and it's easy to do it using split(' ').
When they are together (e.g. 2GB), I would test every character until I find a letter, instead of a number.
s='17GB'
number=''
unit=''
for c in s:
    if c.isdigit():
        number+=c
    else:
        unit+=c
Is there a better way to do it?
Thanks
Answer 1:
s='17GB'
for i,c in enumerate(s):
    if not c.isdigit():
        break
else:
    # no non-digit character found: the whole string is the number
    i=len(s)
number=int(s[:i])
unit=s[i:]
Answer 2:
You can break out of the loop when you find the first non-digit character
for i,c in enumerate(s):
    if not c.isdigit():
        break
else:
    # no non-digit character found: the whole string is the number
    i = len(s)
number = s[:i]
unit = s[i:].lstrip()
If you have negative numbers and decimals:
numeric = '0123456789-.'
for i,c in enumerate(s):
    if c not in numeric:
        break
else:
    # no unit character found: the whole string is the number
    i = len(s)
number = s[:i]
unit = s[i:].lstrip()
Answer 3:
You could use a regular expression to divide the string into groups:
>>> import re
>>> p = re.compile(r'(\d+)\s*(\w+)')
>>> p.match('2GB').groups()
('2', 'GB')
>>> p.match('17 ft').groups()
('17', 'ft')
Answer 4:
The tokenize module can help (Python 2 session):
>>> import StringIO
>>> import tokenize
>>> s = StringIO.StringIO('27GB')
>>> for token in tokenize.generate_tokens(s.readline):
... print token
...
(2, '27', (1, 0), (1, 2), '27GB')
(1, 'GB', (1, 2), (1, 4), '27GB')
(0, '', (2, 0), (2, 0), '')
Answer 5:
You should use regular expressions, grouping together what you want to find out:
import re
s = "17GB"
match = re.match(r"^([1-9][0-9]*)\s*(GB|MB|KB|B)$", s)
if match:
    print("Number: %d, unit: %s" % (int(match.group(1)), match.group(2)))
Change the regex according to what you want to parse. If you're unfamiliar with regular expressions, here's a great tutorial site.
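As one way of adapting the regex, here is a minimal sketch that builds the unit alternation from a list and also accepts decimal numbers. The UNITS list and the parse() helper are assumptions made for illustration, not part of the answer above.

```python
import re

# Hypothetical unit list; replace it with the units your data uses.
UNITS = ["GB", "MB", "KB", "B", "ft", "in"]

# Escape each unit and try longer units first, so "GB" is tried before "B".
_alternation = "|".join(
    re.escape(u) for u in sorted(UNITS, key=len, reverse=True)
)
PATTERN = re.compile(r"^(\d+(?:\.\d+)?)\s*(%s)$" % _alternation)


def parse(text):
    """Return (number, unit), or None if the text does not match."""
    match = PATTERN.match(text.strip())
    if match is None:
        return None
    return float(match.group(1)), match.group(2)


print(parse("17GB"))
print(parse("2.5 ft"))
print(parse("3 parsecs"))  # unknown unit, so no match
```

Returning None for unknown units keeps the caller in control of error handling; raising ValueError instead would work just as well.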
Answer 6:
>>> s="17GB"
>>> ind=list(map(str.isalpha,s)).index(True)
>>> num,suffix=s[:ind],s[ind:]
>>> print(num+":"+suffix)
17:GB
Answer 7:
This uses an approach which should be a bit more forgiving than regexes. Note: this is not as performant as the other solutions posted.
def split_units(value):
    """
    >>> split_units("2GB")
    (2.0, 'GB')
    >>> split_units("17 ft")
    (17.0, 'ft')
    >>> split_units(" 3.4e-27 frobnitzem ")
    (3.4e-27, 'frobnitzem')
    >>> split_units("9001")
    (9001.0, '')
    >>> split_units("spam sandwhiches")
    (0, 'spam sandwhiches')
    >>> split_units("")
    (0, '')
    """
    units = ""
    number = 0
    while value:
        try:
            number = float(value)
            break
        except ValueError:
            units = value[-1:] + units
            value = value[:-1]
    return number, units.strip()
Answer 8:
How about using a regular expression?
http://python.org/doc/1.6/lib/module-regsub.html
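The regsub module linked above is not available in current Python versions, where regular expressions are provided by the re module. A minimal sketch of the idea with re follows; the function name split_number_unit and the choice to treat every non-digit run as the unit are assumptions for illustration.

```python
import re


def split_number_unit(text):
    """Return (number, unit) as strings, or None if text has no leading digits."""
    # Digits first, then optional whitespace, then a unit made of non-digits.
    # The lazy \D*? leaves trailing whitespace to the final \s*.
    match = re.fullmatch(r"\s*(\d+)\s*(\D*?)\s*", text)
    if match is None:
        return None
    return match.group(1), match.group(2)


print(split_number_unit("2 GB"))
print(split_number_unit(" 17ft "))
print(split_number_unit("9001"))  # a bare number gives an empty unit
```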
Answer 9:
For this task, I would definitely use a regular expression:
import re
there = re.compile(r'\s*(\d+)\s*(\S+)')
thematch = there.match(s)
if thematch:
    number, unit = thematch.groups()
else:
    raise ValueError('String %r not in the expected format' % s)
In the RE pattern, \s means "whitespace", \d means "digit", \S means "non-whitespace"; * means "0 or more of the preceding", + means "1 or more of the preceding", and the parentheses enclose "capturing groups" which are then returned by the groups() call on the match object. (thematch is None if the given string doesn't correspond to the pattern: optional whitespace, then one or more digits, then optional whitespace, then one or more non-whitespace characters.)
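To see how this pattern behaves on a few inputs, here is a small sketch that wraps it in a helper. The function name split_number_unit is hypothetical; the pattern and the error message are the ones from this answer.

```python
import re

# Same pattern as in the answer above.
there = re.compile(r'\s*(\d+)\s*(\S+)')


def split_number_unit(s):
    """Hypothetical helper: return (number, unit) as strings or raise ValueError."""
    thematch = there.match(s)
    if thematch is None:
        raise ValueError('String %r not in the expected format' % s)
    return thematch.groups()


print(split_number_unit('2GB'))        # ('2', 'GB')
print(split_number_unit('  17   ft'))  # ('17', 'ft')
print(split_number_unit('3.5GB'))      # ('3', '.5GB')
```

The last call shows a limitation: \d+ stops at the decimal point, so the fractional part ends up in the unit. If decimals can occur, the number group needs to allow them.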
Answer 10:
A regular expression.
import re
m = re.match(r'\s*(?P<n>[-+]?[.0-9]+)\s*(?P<u>.*)', s)
if m is None:
    raise ValueError("not a number with units")
number = m.group("n")
unit = m.group("u")
This will give you a number (integer or fixed point; too hard to disambiguate scientific notation's "e" from a unit prefix) with an optional sign, followed by the units, with optional whitespace.
You can use re.compile() if you're going to be doing a lot of matches.
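A minimal sketch of the compiled variant, which also converts the matched text to int or float. It assumes the number group is meant to match one or more digit or dot characters; QUANTITY and parse_quantity are hypothetical names.

```python
import re

# Compiled once and reused for every call.
# Assumption: one or more characters from [.0-9] form the number.
QUANTITY = re.compile(r'\s*(?P<n>[-+]?[.0-9]+)\s*(?P<u>.*)')


def parse_quantity(s):
    """Return (number, unit); number is an int when possible, else a float."""
    m = QUANTITY.match(s)
    if m is None:
        raise ValueError("not a number with units")
    text = m.group("n")
    try:
        number = int(text)
    except ValueError:
        # float() raises ValueError itself for malformed text like "1.2.3".
        number = float(text)
    return number, m.group("u").strip()


print(parse_quantity("17GB"))
print(parse_quantity("-3.5 kg"))
```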
Answer 11:
Scientific notation: this regex works well for me for parsing numbers that may be in scientific notation. It is based on the Python documentation section on simulating scanf: https://docs.python.org/3/library/re.html#simulating-scanf
import re
units_pattern = re.compile(r"([-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?|\s*[a-zA-Z]+\s*$)")
number_with_units = list(match.group(0) for match in units_pattern.finditer("+2.0e-1 mm"))
print(number_with_units)
# ['+2.0e-1', ' mm']
n, u = number_with_units
print(float(n), u.strip())
# 0.2 mm
Answer 12:
Try the regex pattern below. The first group (the scanf() tokens for a number in any of its forms) is lifted directly from the Python docs for the re module.
import re
SCANF_MEASUREMENT = re.compile(
    r'''(                      # group match like scanf() token %e, %E, %f, %g
    [-+]?                      # +/- or nothing for positive
    (\d+(\.\d*)?|\.\d+)        # match numbers: 1, 1., 1.1, .1
    ([eE][-+]?\d+)?            # scientific notation: e(+/-)2 (*10^2)
    )
    (\s*)                      # separator: white space or nothing
    (                          # unit of measure: like GB. also works for no units
    \S*)''', re.VERBOSE)
'''
:var SCANF_MEASUREMENT:
regular expression object that will match a measurement
**measurement** is the value of a quantity of something. most complicated example::
-666.6e-100 units
'''
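def parse_measurement(value_sep_units):
    measurement = SCANF_MEASUREMENT.match(value_sep_units)
    if measurement is None:
        # the pattern requires a leading number, so no match means no number
        raise ValueError("doesn't start with a number: %r" % value_sep_units)
    value = float(measurement.group(1))  # group 1: the whole number
    units = measurement.group(6)  # group 6: the unit of measure
    return value, units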
Source: https://stackoverflow.com/questions/2240303/separate-number-from-unit-in-a-string-in-python