python - regex search and findall

大城市里の小女人 submitted on 2019-11-27 15:03:42
aleph_null

Ok, I see what's going on... from the docs:

If one or more groups are present in the pattern, return a list of groups; this will be a list of tuples if the pattern has more than one group.

As it turns out, you do have a group, (\d+,?), so findall returns the contents of that group instead of the whole match. Because the group is repeated, it only keeps its last repetition, which is 000.
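A minimal sketch of that behaviour, assuming the input contains the number 9,000,000 (the surrounding sample text is made up for illustration):

```python
import re

text = "it costs 9,000,000 dollars"  # hypothetical sample input

# With one capturing group, findall returns the group's contents,
# and a repeated group only remembers its last repetition.
groups = re.findall(r'(\d+,?)+', text)
print(groups)  # ['000']

# search finds the same match; group(0) is the whole match.
match = re.search(r'(\d+,?)+', text)
print(match.group(0))  # 9,000,000
print(match.group(1))  # 000
```

So the regex itself matches the full number; it is only findall's handling of groups that hides it.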

One solution is to wrap the entire regex in a group, like this:

regex = re.compile(r'((\d+,?)+)')

Then findall will return [('9,000,000', '000')], a list holding one tuple with both matched groups. Of course, you only care about the first element of the tuple.

Personally, I would use the following regex

regex = re.compile(r'((\d+,)*\d+)')

to avoid matching the trailing comma in something like "this is a bad number 9,123,".
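A rough comparison of the two patterns on that bad input. The variable names (loose, strict) are just labels chosen for this sketch:

```python
import re

text = "this is a bad number 9,123,"

loose = re.compile(r'((\d+,?)+)')     # allows a trailing comma
strict = re.compile(r'((\d+,)*\d+)')  # must end in a digit

# findall returns tuples here (two groups), so keep only the outer group.
loose_hits = [groups[0] for groups in loose.findall(text)]
strict_hits = [groups[0] for groups in strict.findall(text)]

print(loose_hits)   # ['9,123,']
print(strict_hits)  # ['9,123']
```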

Edit.

Here's a way to avoid having to surround the expression with parentheses or deal with tuples:

import re

s = "..."
regex = re.compile(r'(\d+,?)+')

# finditer yields one match object per match found in s
for match in regex.finditer(s):
    print(match.group(0))

finditer returns an iterator over all the matches found. These match objects are the same kind that re.search returns, so group(0) gives the whole match, which is the result you expect.
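As a quick illustration with a made-up input string (the sample text is an assumption, not from the question), group(0) is the whole match while group(1) is still only the last repetition of the group:

```python
import re

text = "from 1,250 up to 9,000,000"  # hypothetical sample input
regex = re.compile(r'(\d+,?)+')

# group(0) is the full text matched by the pattern.
whole = [m.group(0) for m in regex.finditer(text)]
# group(1) is what findall would have returned for each match.
last_rep = [m.group(1) for m in regex.finditer(text)]

print(whole)     # ['1,250', '9,000,000']
print(last_rep)  # ['250', '000']
```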

Alan Moore

@aleph_null's answer correctly explains what's causing your problem, but I think I have a better solution. Use this regex:

regex = re.compile(r'\d+(?:,\d+)*')

Some reasons why it's better:

  1. (?:...) is a non-capturing group, so you only get the one result for each match.

  2. \d+(?:,\d+)* is a better regex: more efficient, and less likely to return false positives because every comma must be followed by at least one more digit.

  3. You should always use Python's raw strings for regexes if possible; you're less likely to be surprised by regex escape sequences (like \b for word boundary) being interpreted as string-literal escape sequences (like \b for backspace).
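A short sketch of points 1 and 3, using made-up sample strings. With no capturing group, findall returns the whole matches directly, and the raw-string prefix is what keeps \b a word boundary:

```python
import re

text = "paid 9,000,000 for 1,234 items, then 9,123,"  # hypothetical input

# No capturing groups, so findall returns plain strings of whole matches.
numbers = re.findall(r'\d+(?:,\d+)*', text)
print(numbers)  # ['9,000,000', '1,234', '9,123']

# Without the r prefix, '\b' is a single backspace character.
print(len('\b'), len(r'\b'))  # 1 2

not_raw = re.findall('\bcat\b', 'a cat')  # looks for backspace + cat + backspace
raw = re.findall(r'\bcat\b', 'a cat')     # looks for the word cat
print(not_raw)  # []
print(raw)      # ['cat']
```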
