Different behavior when using re.finditer and re.match

非 Y 不嫁゛ submitted on 2019-12-11 13:24:36

Question


I'm working on a regex to collect some values from a page through a script. I'm using re.match in a condition, but it returns a false value (None). If I use re.finditer instead, the condition is true and its body is executed. I tested the regex in a tester I built myself and it works there, but not in the script. Here is a sample script:

import re

result = []
# Raw string so the backslashes reach the regex engine unchanged
RE_Add0 = re.compile(r"\d{5}(?:(?:-| |)\d{4})?", re.IGNORECASE)
each = 'Expiration Date:\n05/31/1996\nBusiness Address: 23901 CALABASAS ROAD #2000 CALABASAS, CA 91302\n'
# match() only tries position 0 of the string, so this is None for the sample text
m = RE_Add0.match(each)
if m:
    result0 = m.group(0)
    print(result0)
    if len(result0) < 100:
        result.append(result0)
    else:
        print('Address ignore')

Answer 1:


re.finditer() returns an iterator object even if there is no match, so a condition like if RE_Add0.finditer(each): is always true. You have to actually iterate over the object to see whether there are any matches.
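A minimal sketch of the truthiness trap, using a hypothetical pattern and text chosen so that nothing matches. The iterator is truthy regardless; pulling the first item with next() (or simply calling search()) is one way to test for a match:

```python
import re

# Hypothetical pattern and text: the text contains no five-digit run.
pattern = re.compile(r"\d{5}")
text = "no digits here"

matches = pattern.finditer(text)
print(bool(matches))  # True: the iterator object itself is always truthy

# To test for at least one match, pull the first item (None if there is none).
first = next(pattern.finditer(text), None)
print(first is None)  # True: there was no match

# Or materialize all matches when you need them anyway.
all_matches = list(pattern.finditer(text))
print(len(all_matches))  # 0
```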

Second, re.match() only matches at the beginning of the string, not anywhere in the string as re.search() and re.finditer() do.
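Applied to the string from the question, the difference looks like this. This is a sketch; the name zip_re is hypothetical and the pattern is the question's regex in its shorter [ -]? form:

```python
import re

zip_re = re.compile(r"\d{5}(?:[ -]?\d{4})?")
each = "Expiration Date:\n05/31/1996\nBusiness Address: 23901 CALABASAS ROAD #2000 CALABASAS, CA 91302\n"

print(zip_re.match(each))            # None: the string does not start with five digits
print(zip_re.search(each).group(0))  # 23901: the first match anywhere in the string
print(zip_re.findall(each))          # ['23901', '91302']: every match
```

Note that search() finds the street number 23901 first, which is why Answer 3 below anchors its pattern.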

Third, that regex could be written as r"\d{5}(?:[ -]?\d{4})?". Keep the final ?, which makes the four-digit extension optional as in the original.
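A quick check, on a few hypothetical sample strings, that the shorter form behaves like the original pattern, and that dropping the outer ? would stop plain five-digit ZIP codes from matching:

```python
import re

original = re.compile(r"\d{5}(?:(?:-| |)\d{4})?")
shorter = re.compile(r"\d{5}(?:[ -]?\d{4})?")

# Hypothetical inputs covering no extension, dash, space and no separator.
samples = ["91302", "91302-1234", "91302 1234", "913021234", "CA 91302\n"]
for s in samples:
    a = original.search(s).group(0)
    b = shorter.search(s).group(0)
    assert a == b, (s, a, b)  # both patterns pick out the same text
    print(repr(s), "->", repr(b))

# Without the outer ?, the extension becomes mandatory.
print(re.search(r"\d{5}(?:[ -]?\d{4})", "91302"))  # None
```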

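Fourth, always use raw strings with regexes, so that backslash sequences reach the regex engine unchanged instead of being interpreted as Python string escapes.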
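The classic failure case is \b: in a normal string literal it is a backspace character, not a word boundary. A sketch with a hypothetical text (escapes such as "\d" happen to survive in normal strings, but recent Python 3 versions warn about such unrecognized escapes):

```python
import re

text = "a cat sat"

# In a normal string literal "\b" is a backspace (\x08), so this pattern
# looks for backspace-cat-backspace and finds nothing.
print(re.search("\bcat\b", text))  # None

# In a raw string the backslash survives, so the regex engine sees a word boundary.
print(re.search(r"\bcat\b", text).group(0))  # cat
```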




Answer 2:


re.match only tries to match at the beginning of the string. re.finditer, like re.search, scans the whole string; unlike re.search, which stops at the first match, it yields every non-overlapping match. Compare:

>>> re.match('a', 'abc')
<_sre.SRE_Match object at 0x01057AA0>
>>> re.match('b', 'abc')
>>> re.finditer('a', 'abc')
<callable_iterator object at 0x0106AD30>
>>> re.finditer('b', 'abc')
<callable_iterator object at 0x0106EA10>

ETA: Since you mention a page, I assume you are parsing HTML. If so, use BeautifulSoup or a similar HTML parser rather than regex.
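BeautifulSoup is a third-party package, so as a dependency-free sketch this swaps in the standard library's html.parser.HTMLParser: the parser extracts the text nodes, and the regex only runs on plain text. The class name TextCollector, the HTML fragment and the end-anchored ZIP pattern are all hypothetical, not taken from the question:

```python
import re
from html.parser import HTMLParser

# Hypothetical helper: collects every text node the parser sees.
class TextCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

# Hypothetical page fragment standing in for the real page.
html = ("<p>Expiration Date:<br>05/31/1996</p>"
        "<p>Business Address: 23901 CALABASAS ROAD #2000 CALABASAS, CA 91302</p>")

parser = TextCollector()
parser.feed(html)
parser.close()

# Apply the regex to each text node, looking for a ZIP code at its end.
zip_at_end = re.compile(r"\b(\d{5})(?:[ -]?\d{4})?\s*$")
zips = []
for chunk in parser.chunks:
    m = zip_at_end.search(chunk)
    if m:
        zips.append(m.group(1))

print(zips)  # ['91302']
```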




Answer 3:


Try this:

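import re

# Groups: 1 = whole match, 2 = five-digit ZIP, 3 = separator, 4 = ZIP+4, 5 = trailing whitespace.
# Without re.MULTILINE, '$' anchors at the end of the string (or just before a final newline).
postalCode = re.compile(r'((\d{5})([ -])?(\d{4})?(\s*))$')
primaryGroup = lambda x: x[1]  # picks the five-digit ZIP (group 2) from a findall tuple

sampleStr = """
    Expiration Date:
    05/31/1996
    Business Address: 23901 CALABASAS ROAD #2000 CALABASAS, CA 91302  
"""
result = []

# findall() returns one tuple of groups per match
matches = postalCode.findall(sampleStr)
if matches:
    for match in matches:
        pc = primaryGroup(match)
        print(pc)
        result.append(pc)
else:
    print("No postal code found in this string")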

This returns '12345' for each of these inputs:

12345\n
12345  \n
12345 6789\n
12345 6789    \n
12345 \n
12345     \n
12345-6789\n
12345-6789    \n
12345-\n
12345-    \n
123456789\n
123456789    \n
12345\n
12345    \n

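I anchored the pattern with $ because otherwise it also matched '23901' (from the street address) in your example. Note that without re.MULTILINE, $ matches only at the end of the string (or just before a final newline), not at the end of every line.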
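A sketch that checks both claims: every input in the list above yields the ZIP '12345', and removing the $ anchor brings the street number back. The names postal_code and unanchored are hypothetical; the pattern is the one from this answer:

```python
import re

# The answer's pattern; group 2 (index 1 in each findall tuple) is the ZIP code.
postal_code = re.compile(r'((\d{5})([ -])?(\d{4})?(\s*))$')
unanchored = re.compile(r'((\d{5})([ -])?(\d{4})?(\s*))')  # same pattern without '$'

inputs = [
    "12345\n", "12345  \n", "12345 6789\n", "12345 6789    \n",
    "12345 \n", "12345     \n", "12345-6789\n", "12345-6789    \n",
    "12345-\n", "12345-    \n", "123456789\n", "123456789    \n",
]

# Each input should give exactly one match whose ZIP group is '12345'.
zips = [[groups[1] for groups in postal_code.findall(s)] for s in inputs]
print(all(z == ["12345"] for z in zips))  # True

address = "Business Address: 23901 CALABASAS ROAD #2000 CALABASAS, CA 91302  \n"
print([g[1] for g in postal_code.findall(address)])  # ['91302']
print([g[1] for g in unanchored.findall(address)])   # ['23901', '91302']
```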



Source: https://stackoverflow.com/questions/4646904/different-behavior-when-using-re-finditer-and-re-match
