Strange behavior of parentheses in Python regex

北海茫月 2021-01-28 20:45

I'm writing a Python regex that looks through a text document for quoted strings (quotes of airline pilots recorded from black boxes). I started by trying to write a regex with

3 Answers
  • 2021-01-28 21:33

    Read the documentation. If the pattern contains capturing groups, re.findall returns the groups rather than the whole match. If you want the entire match, you must wrap the whole pattern in a group, or use re.finditer. See this question.
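
A minimal sketch of that difference, assuming a made-up sample string (the question's `page` is not shown, so the text below is hypothetical):

```python
import re

# Hypothetical sample text standing in for the question's document.
page = 'He said "pull up" and then \'too low\' twice.'

# One capturing group: findall returns only what that group captured,
# i.e. the quote characters, not the quoted text.
only_quotes = re.findall(r'("|\').*?\1', page)

# finditer yields match objects; group(0) is always the entire match.
whole_matches = [m.group(0) for m in re.finditer(r'("|\').*?\1', page)]

print(only_quotes)
print(whole_matches)
```

Here `only_quotes` holds just the two quote characters, while `whole_matches` holds the two quoted strings including their delimiters.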

  • 2021-01-28 21:34

    You need to capture the whole match with an extra pair of parentheses.

    re.findall('(("|\').*?\\2)', page)
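
With two groups, re.findall returns one tuple per match, so the full quoted string is the first element of each tuple. A short sketch, again assuming a hypothetical `page` value:

```python
import re

# Hypothetical sample text; not taken from the original question.
page = 'Tower: "cleared to land". Captain: \'roger\'.'

# Group 1 is the outer pair (the whole quoted string),
# group 2 is the opening quote that \2 must match again at the end.
pairs = re.findall('(("|\').*?\\2)', page)

# Keep only the full quoted strings and drop the quote character.
quoted = [whole for whole, _quote_char in pairs]

print(pairs)
print(quoted)
```

Groups are numbered by the position of their opening parenthesis, which is why the backreference changes from `\1` to `\2` once the outer group is added.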
    
  • 2021-01-28 21:35

    You aren't capturing anything except for the quotes, which is what Python is returning.

    If you add another group, things work much better:

    # findall returns (quote, text) tuples when the pattern has two groups
    for quote, match in re.findall(r'("|\')(.*?)\1', page):
      print(match)
    

    I prefixed your string literal with an r to make it a raw string, which is useful when you need to use a ton of backslashes (\\1 becomes \1).
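
To illustrate the raw-string point and the inner group, here is a small sketch; the `page` value is an assumption made up for the example:

```python
import re

# The raw literal and the doubled-backslash literal are the same
# two-character string: a backslash followed by the digit 1.
raw_backref = r'\1'
escaped_backref = '\\1'

# Without either, '\1' is a single control character (U+0001),
# so the regex engine would never see a backreference.
unescaped = '\1'

# Hypothetical sample text; not taken from the original question.
page = 'First officer: "flaps thirty". Captain: \'set\'.'

# group(1) is the quote character, group(2) the text between the quotes.
inner = [m.group(2) for m in re.finditer(r'("|\')(.*?)\1', page)]

print(inner)
```

If you prefer re.finditer over re.findall, pull the groups out of each match object with `group()` as above rather than unpacking the match directly.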
