Question
I'm using re.findall() to extract some version numbers from an HTML file:
>>> import re
>>> text = "<table><td><a href=\"url\">Test0.2.1.zip</a></td><td>Test0.2.1</td></table> Test0.2.1"
>>> re.findall(r"Test([\.0-9]*)", text)
['0.2.1.', '0.2.1', '0.2.1']
but I would like to only get the ones that do not end in a dot. The filename might not always be .zip so I can't just stick .zip in the regex.
I wanna end up with:
['0.2.1', '0.2.1']
Can anyone suggest a better regex to use? :)
Answer 1:
re.findall(r"Test([0-9.]*[0-9]+)", text)
or, a bit shorter:
re.findall(r"Test([\d.]*\d+)", text)
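As a quick check, here is a minimal sketch that runs both patterns against the sample text from the question. The greedy `[0-9.]*` first swallows the trailing dot, then backtracks so that the final `[0-9]+` can match a digit, which is why the captured group can never end in a dot. The variable names are only illustrative.

```python
import re

# Sample text copied from the question.
text = '<table><td><a href="url">Test0.2.1.zip</a></td><td>Test0.2.1</td></table> Test0.2.1'

# The group must end with at least one digit, so a trailing dot is left out.
versions = re.findall(r"Test([0-9.]*[0-9]+)", text)

# Same idea using the \d shorthand for a digit.
versions_short = re.findall(r"Test([\d.]*\d+)", text)

print(versions)        # ['0.2.1', '0.2.1', '0.2.1']
print(versions_short)  # ['0.2.1', '0.2.1', '0.2.1']
```

Note that the sample contains three occurrences of "Test", so findall returns three entries, one per occurrence.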
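By the way, you don't need to escape the dot inside a character class, because it is already literal there. In Python the escaped form is harmless but redundant, and both classes match exactly the same characters:
[\.0-9] // matches: 0 1 2 3 4 5 6 7 8 9 .
[.0-9] // matches: 0 1 2 3 4 5 6 7 8 9 .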
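The behaviour of the two character classes in Python's re module can be checked directly. This is a small sketch with a made-up sample string containing the ten digits, a dot, and a single backslash. In Python, neither class matches the backslash.

```python
import re

# Hypothetical sample: ten digits, one dot, one backslash.
sample = "0123456789.\\"

escaped = re.findall(r"[\.0-9]", sample)    # dot escaped inside the class
unescaped = re.findall(r"[.0-9]", sample)   # dot left unescaped

print(escaped == unescaped)  # True
print("\\" in escaped)       # False
print(len(escaped))          # 11
```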
Source: https://stackoverflow.com/questions/356483/python-regex-findall-numbers-and-dots