Question
I am using the following regex to obtain all data from a website's JavaScript data source that is contained within the following character pattern:
[[]]);
The code I am using is this:
regex = r'\[\[.*?\]]);'
match2 = re.findall(regex, response.body, re.S)
print match2
This raises the following error:
raise error, v # invalid expression
sre_constants.error: unbalanced parenthesis
I think I am fairly safe in assuming that this is caused by the closing parenthesis within my regex. How can I define the regex that I want without getting this error?
Thanks
Answer 1:
You need to escape the final parenthesis as well. Closing square brackets outside a character class do not have to be escaped:
regex = r'\[\[.*?]]\);'
                    ^
If you are trying to obtain the content between the square brackets, use a capturing group here.
>>> import re
>>> s = 'foo [[bar]]); baz [[quz]]); not [[foobar]]'
>>> matches = re.findall(r'\[\[(.*?)]]\);', s, re.S)
>>> matches
['bar', 'quz']
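As an alternative to escaping each delimiter by hand, the literal parts of the pattern can be passed through re.escape(). The following is a minimal sketch written for Python 3 (the question itself uses Python 2); the names opening, closing, and pattern are illustrative only, and the sample string is the one from the example above.

```python
import re

# Sample text borrowed from the example above.
s = 'foo [[bar]]); baz [[quz]]); not [[foobar]]'

# re.escape() backslash-escapes the regex metacharacters in a literal string,
# so the delimiters never have to be escaped manually.
opening = re.escape('[[')
closing = re.escape(']]);')

# The capturing group in the middle collects the text between the delimiters.
pattern = re.compile(opening + r'(.*?)' + closing, re.S)
matches = pattern.findall(s)
print(matches)  # ['bar', 'quz']
```

This trades a little verbosity for not having to remember which characters are special outside a character class.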
Answer 2:
Escape the last ) and ]:
r'\[\[.*?\]\]\);'
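To see why the escape matters, the sketch below compiles the pattern from the question and the escaped version side by side. It is written for Python 3, where the exception is exposed as re.error; the helper name compiles is hypothetical and exists only for this illustration.

```python
import re

broken = r'\[\[.*?\]]);'    # pattern from the question: the ) is unescaped
fixed = r'\[\[.*?\]\]\);'   # the ) is escaped, so it is a literal character


def compiles(pattern):
    """Return True if the pattern compiles, False if re rejects it."""
    try:
        re.compile(pattern)
        return True
    except re.error as exc:
        # For the broken pattern this reports an unbalanced parenthesis.
        print('rejected %r: %s' % (pattern, exc))
        return False


print(compiles(broken))
print(compiles(fixed))
```

An unescaped ) is read as the end of a group, and since no group was opened, the pattern is rejected before any matching takes place.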
Answer 3:
Your regex should be:
regex = r'\[\[.*?\]\]\);'
It matches the literal [[ symbols and the following characters up to the next ]]); symbols.
Explanation:
\[\[ matches the literal [[ symbols.
.*? matches any character zero or more times. The ? after the * forces the regex engine to do the shortest (non-greedy) match.
\]\]\); matches the literal ]]); symbols.
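The effect of the non-greedy ? is easiest to see by running both variants on the same input. This is a small sketch for Python 3; the sample string is reused from Answer 1, and the variable names greedy and lazy are illustrative.

```python
import re

s = 'foo [[bar]]); baz [[quz]]); not [[foobar]]'

# Greedy: .* runs as far as it can, so the match ends at the LAST ]]); found.
greedy = re.findall(r'\[\[.*\]\]\);', s, re.S)

# Non-greedy: .*? stops at the FIRST ]]); after each [[.
lazy = re.findall(r'\[\[.*?\]\]\);', s, re.S)

print(greedy)  # ['[[bar]]); baz [[quz]]);']
print(lazy)    # ['[[bar]]);', '[[quz]]);']
```

With the greedy quantifier, both blocks are swallowed into a single match, which is usually not what is wanted when extracting several data sections from one page.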
Source: https://stackoverflow.com/questions/25108542/unbalanced-parenthesis-error-with-regex