I'm writing a Python regex that looks through a text document for quoted strings (quotes from airline pilots recorded by black boxes). I started by trying to write a regex with
Read the documentation. re.findall returns the groups, if there are any. If you want the entire match, you must group it all, or use re.finditer. See this question.
You need to catch everything with an extra pair of parentheses.
re.findall('(("|\').*?\\2)', page)
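As a rough illustration of the difference, here is a minimal sketch. The sample text in page is made up for the example and is not from the original question.

```python
import re

# Hypothetical sample text, invented for illustration.
page = 'The captain said "gear down" and then \'flaps thirty\'.'

# Only one group (the quote character): findall returns just that group.
only_quotes = re.findall(r'("|\').*?\1', page)

# With an outer group, findall returns tuples of (whole quoted string, quote char).
with_outer = re.findall(r'(("|\').*?\2)', page)
full_matches = [whole for whole, _quote in with_outer]

# finditer yields match objects; group(0) is always the entire match.
via_finditer = [m.group(0) for m in re.finditer(r'("|\').*?\1', page)]

print(only_quotes)
print(full_matches)
```

Note that once the outer group is added, the backreference has to move from \1 to \2, because the quote character is now the second group.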
You aren't capturing anything except for the quotes, which is what Python is returning.
If you add another group, things work much better:
# re.findall returns (quote, match) tuples when the pattern has two groups
for quote, match in re.findall(r'("|\')(.*?)\1', page):
    print(match)
I prefixed your string literal with an r to make it a raw string, which is useful when you need to use a ton of backslashes (\\1 becomes \1).
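To see why the doubled backslash is needed without the r prefix, here is a small sketch. The text variable is a made-up sample, not from the question.

```python
import re

# In a normal string literal, '\1' is an octal escape for the character U+0001,
# so a backreference must be written with a doubled backslash.
octal_escape_is_control_char = ('\1' == '\x01')
doubled_equals_raw = ('\\1' == r'\1')

text = 'say "hi" now'  # hypothetical sample input

# Both spellings hand the same backreference to the regex engine.
doubled = re.findall('("|\')(.*?)\\1', text)
raw = re.findall(r'("|\')(.*?)\1', text)

print(doubled, raw)
```

The two pattern strings are not character-for-character identical (the raw one keeps the backslash before the single quote), but the regex engine treats \' as a literal quote, so they match the same text.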