Question
I'm working on a regex to collect some values from a page through a script. I'm using re.match in a condition, but it returns false. If I use re.finditer instead, it returns true and the body of the condition is executed. I tested the regex in my own tester and it works there, but not in the script.
Here is a sample script:
import re

result = []
RE_Add0 = re.compile(r"\d{5}(?:(?:-| |)\d{4})?", re.IGNORECASE)
each = 'Expiration Date:\n05/31/1996\nBusiness Address: 23901 CALABASAS ROAD #2000 CALABASAS, CA 91302\n'
if RE_Add0.match(each):  # this is the condition that evaluates as false
    result0 = RE_Add0.match(each).group(0)
    print(result0)
    if len(result0) < 100:
        result.append(result0)
    else:
        print('Address ignore')
else:
    pass
Answer 1:
First, re.finditer() returns an iterator object even if there is no match, so a test such as if RE_Add0.finditer(each): is always true. You have to actually iterate over the object to see whether there are any matches.
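A minimal sketch of the first point, assuming the question's pattern written as a raw string (with [ -]? in place of (?:-| |)) and a hypothetical sample string that contains no five-digit run. The iterator object is truthy either way; next() with a default is one way to check whether it actually yields anything.

```python
import re

# The question's ZIP-code pattern, written as a raw string.
zip_re = re.compile(r"\d{5}(?:[ -]?\d{4})?")

# Hypothetical input with no five-digit run, so there is nothing to match.
text_without_zip = "Expiration Date:\n05/31/1996\n"

# finditer() always returns an iterator object, and iterator objects are truthy,
# so this test passes even though the text contains no ZIP code.
always_true = bool(zip_re.finditer(text_without_zip))

# Ask the iterator for its first item instead; next() returns the default (None)
# when the iterator is already exhausted, i.e. when there are no matches.
first = next(zip_re.finditer(text_without_zip), None)
has_match = first is not None

print(always_true, has_match)  # True False
```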
Second, re.match() only matches at the beginning of the string, not anywhere in it as re.search() and re.finditer() do.
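A short sketch of the second point on the question's own string (the pattern is the same one, written as a raw string with [ -]?). Note that the first match found by search() is the street number, not the ZIP code, which is why anchoring comes up further down.

```python
import re

zip_re = re.compile(r"\d{5}(?:[ -]?\d{4})?")
each = ("Expiration Date:\n05/31/1996\n"
        "Business Address: 23901 CALABASAS ROAD #2000 CALABASAS, CA 91302\n")

# match() only tries position 0; the string starts with "E", so it returns None.
print(zip_re.match(each))  # None

# search() scans forward and returns the first match anywhere in the string.
m = zip_re.search(each)
print(m.group(0))  # 23901

# finditer() yields every non-overlapping match, left to right.
all_matches = [found.group(0) for found in zip_re.finditer(each)]
print(all_matches)  # ['23901', '91302']
```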
Third, that regex could be written more compactly as r"\d{5}(?:[ -]?\d{4})?" (the trailing ? keeps the four-digit suffix optional, as in the original).
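A quick sketch comparing the question's pattern with the compact form on a few hypothetical inputs, using fullmatch() so the whole input must match. The third pattern shows what happens if the trailing ? is dropped: a plain five-digit ZIP no longer matches.

```python
import re

original = re.compile(r"\d{5}(?:(?:-| |)\d{4})?")
compact = re.compile(r"\d{5}(?:[ -]?\d{4})?")
# Same as compact but without the trailing "?", so the ZIP+4 suffix is mandatory.
no_optional = re.compile(r"\d{5}(?:[ -]?\d{4})")

samples = ["91302", "91302-1234", "91302 1234", "913021234", "91302-"]

# One row per sample: (input, original, compact, no_optional).
rows = [(s,
         bool(original.fullmatch(s)),
         bool(compact.fullmatch(s)),
         bool(no_optional.fullmatch(s)))
        for s in samples]

for row in rows:
    print(row)
```

The first two columns agree on every sample; only the variant without the ? rejects the bare "91302".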
Fourth, always use raw strings for regexes, so that backslash sequences such as \d and \b reach the regex engine unchanged.
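A small sketch of why raw strings matter. "\d" in a normal string literal happens to survive because it is not a recognized string escape (recent Python versions warn about it), but "\b" is silently turned into a backspace character, so the pattern below looks for backspaces instead of word boundaries. The address line is taken from the question.

```python
import re

line = "Business Address: 23901 CALABASAS ROAD #2000 CALABASAS, CA 91302"

# In a normal string literal "\b" is the backspace character (\x08),
# so the regex engine never sees a word-boundary assertion.
print(re.search("\bCA\b", line))  # None

# In a raw string the backslash is kept, and \b means "word boundary".
print(re.search(r"\bCA\b", line).group(0))  # CA
```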
Answer 2:
re.match only tries to match at the beginning of a string, and returns a single match object. re.finditer is like re.search in that it scans through the whole string, and it yields every match it finds. Compare:
>>> re.match('a', 'abc')
<_sre.SRE_Match object at 0x01057AA0>
>>> re.match('b', 'abc')
>>> re.finditer('a', 'abc')
<callable_iterator object at 0x0106AD30>
>>> re.finditer('b', 'abc')
<callable_iterator object at 0x0106EA10>
ETA: Since you mention a page, I assume you are parsing HTML. If that is the case, use BeautifulSoup or a similar HTML parser rather than regex.
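A minimal sketch of the parser-first approach. To stay within the standard library it swaps in html.parser.HTMLParser instead of BeautifulSoup, and it assumes hypothetical markup (a <dl> of <dt>/<dd> pairs) because the real page is not shown. The parser handles the markup; the regex is then applied only to the one field that should contain the ZIP code.

```python
import re
from html.parser import HTMLParser


class DefinitionListParser(HTMLParser):
    """Pairs each <dt> label with the text of the <dd> that follows it."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._tag = None    # "dt" or "dd" while inside one of them, else None
        self._label = None  # text of the most recent <dt>

    def handle_starttag(self, tag, attrs):
        if tag in ("dt", "dd"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag == self._tag:
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return  # skip whitespace between tags
        if self._tag == "dt":
            self._label = text
        elif self._tag == "dd" and self._label is not None:
            self.fields[self._label] = text


# Hypothetical markup; the question does not show the real page.
html = """<dl>
  <dt>Expiration Date:</dt><dd>05/31/1996</dd>
  <dt>Business Address:</dt>
  <dd>23901 CALABASAS ROAD #2000 CALABASAS, CA 91302</dd>
</dl>"""

parser = DefinitionListParser()
parser.feed(html)
parser.close()

address = parser.fields["Business Address:"]
# Anchored with $ so the street number at the start of the field is skipped.
zip_match = re.search(r"\d{5}(?:[ -]?\d{4})?$", address)
print(zip_match.group(0))  # 91302
```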
Answer 3:
Try this:
import re

# Anchored at $ so the street number (23901) is not picked up as a postal code.
postalCode = re.compile(r'((\d{5})([ -])?(\d{4})?(\s*))$')

# findall() returns a tuple of the five groups per match; index 1 is (\d{5}).
primaryGroup = lambda x: x[1]

sampleStr = """
Expiration Date:
05/31/1996
Business Address: 23901 CALABASAS ROAD #2000 CALABASAS, CA 91302
"""

result = []
matches = postalCode.findall(sampleStr)
if matches:
    for match in matches:
        pc = primaryGroup(match)
        print(pc)
        result.append(pc)
else:
    print("No postal code found in this string")
This returns '12345' for any of the following inputs:
12345\n
12345 \n
12345 6789\n
12345 6789 \n
12345-6789\n
12345-6789 \n
12345-\n
12345- \n
123456789\n
123456789 \n
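A quick way to check the inputs listed above against this answer's pattern, as a sketch: each string goes through findall(), and the second group of the first match is collected.

```python
import re

postal_code = re.compile(r'((\d{5})([ -])?(\d{4})?(\s*))$')

inputs = ["12345\n", "12345 \n", "12345 6789\n", "12345 6789 \n",
          "12345-6789\n", "12345-6789 \n", "12345-\n", "12345- \n",
          "123456789\n", "123456789 \n"]

# findall() returns one tuple of groups per match; index 1 is the (\d{5}) group.
results = [postal_code.findall(s)[0][1] for s in inputs]
print(results)  # every entry is '12345'
```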
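I anchored the pattern with $ because otherwise it was also matching '23901' from the street address in your example. Without re.MULTILINE, $ matches only at the end of the string (or just before a trailing newline); add re.MULTILINE if you need it to match at the end of every line.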
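A sketch of the difference re.MULTILINE makes to $, using the compact ZIP pattern from Answer 1 and hypothetical two-line input where each line ends in a ZIP code:

```python
import re

# Hypothetical text with two addresses, each ZIP code at the end of its line.
text = ("Mailing Address: 100 MAIN ST, SPRINGFIELD, IL 62701\n"
        "Business Address: 23901 CALABASAS ROAD #2000 CALABASAS, CA 91302\n")

pattern = r"\d{5}(?:[ -]?\d{4})?$"

# Default: $ matches only at the end of the string or before a final "\n".
default_hits = re.findall(pattern, text)
print(default_hits)  # ['91302']

# re.MULTILINE: $ also matches just before every "\n".
multiline_hits = re.findall(pattern, text, re.MULTILINE)
print(multiline_hits)  # ['62701', '91302']
```

In both cases the street number 23901 is skipped, because it is followed by a space rather than a line end.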
Source: https://stackoverflow.com/questions/4646904/different-behavior-when-using-re-finditer-and-re-match