I've been trying to match the following string:
string = "TEMPLATES = ( ('index.html', 'home'), ('base.html', 'base'))"
But unfo
If your strings already look like valid Python code, you can do this:
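import ast
# split on the first '=' only, so a '=' inside the value would not break the parse
var, s = [part.strip() for part in
          "TEMPLATES = ( ('index.html', 'home'), ('base.html', 'base'))".split('=', 1)]
# literal_eval evaluates Python literals only; it does not run arbitrary code
result = ast.literal_eval(s)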
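To see what that gives you, here is a minimal sketch that wraps the same idea in a helper. The name parse_assignment is hypothetical (it is not from the answer above), and it assumes each line has the form NAME = <literal>.

```python
import ast

def parse_assignment(line):
    """Split 'NAME = <literal>' on the first '=' and evaluate the literal safely."""
    name, _, value = line.partition('=')
    # literal_eval only accepts Python literals (tuples, strings, numbers, ...),
    # so other expressions in the input are rejected rather than executed.
    return name.strip(), ast.literal_eval(value.strip())

name, templates = parse_assignment(
    "TEMPLATES = ( ('index.html', 'home'), ('base.html', 'base'))"
)
for filename, label in templates:
    print(filename, label)
```

The result is an ordinary tuple of tuples, so you can loop over it or index into it without any further regex work.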
In case you want to validate that parentheses are balanced up to two levels deep, you can use this regular expression:
import re
string = """( ('index.html', 'home'), ('base.html', 'base'))
('index.html', 'home')
('base.html', 'base')
"""
pattern = re.compile(r"(?P<expression>\(([^()]*(?P<parenthesis>\()(?(parenthesis)[^()]*\)))*?[^()]*\))")
match = pattern.findall(string)
print(match[0][0])
print(match[1][0])
print(match[2][0])
This regular expression uses the conditional construct (?(parenthesis)[^()]*\)).
Demo: https://repl.it/@Konard/ParenthesesExample
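If the regex feels hard to maintain, a plain character scan can do the balance check too. This is a different technique from the conditional regex above, shown only as a sketch; max_paren_depth is a hypothetical helper name, and it assumes no parentheses appear inside the quoted strings.

```python
def max_paren_depth(text):
    """Return the deepest nesting level, or None if parentheses are unbalanced."""
    depth = deepest = 0
    for ch in text:
        if ch == '(':
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ')':
            depth -= 1
            if depth < 0:  # a ')' appeared before its matching '('
                return None
    # any '(' still open at the end also means the input is unbalanced
    return deepest if depth == 0 else None

print(max_paren_depth("( ('index.html', 'home'), ('base.html', 'base'))"))
print(max_paren_depth("(('a', 'b')"))
```

You can then accept the input only when the returned depth is not None and is at most 2.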
Try this:
import re
w = "TEMPLATES = ( ('index.html', 'home'), ('base.html', 'base'))"
# find outer parens (raw strings keep the backslashes intact for the regex engine)
outer = re.compile(r"\((.+)\)")
m = outer.search(w)
inner_str = m.group(1)
# find inner pairs
innerre = re.compile(r"\('([^']+)', '([^']+)'\)")
results = innerre.findall(inner_str)
for x, y in results:
    print("%s <-> %s" % (x, y))
Output:
index.html <-> home
base.html <-> base
Explanation:
outer matches the first opening parenthesis using \( and a closing one using \); because .+ is greedy, search extends the match to the last ) in the string, giving us the outermost ( ) pair. m.group(1) contains exactly what's between those outer parentheses; its content corresponds to the .+ bit of outer.
innerre matches exactly one of your ('a', 'b') pairs, again using \( and \) to match the literal parens in your input string, and using two groups inside the ' ' to capture the strings inside those single quotes.
Then, we use findall (rather than search or match) to get all matches for innerre (rather than just one). At this point results is a list of pairs, as demonstrated by the print loop.
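The reason outer grabs the outermost pair is that .+ is greedy. A small sketch comparing it with the non-greedy .+? (the variable names greedy and lazy are just for illustration):

```python
import re

w = "TEMPLATES = ( ('index.html', 'home'), ('base.html', 'base'))"

# Greedy: .+ runs to the end, then backs off until a ')' can still match,
# so the capture ends just before the LAST ')'.
greedy = re.search(r"\((.+)\)", w).group(1)

# Non-greedy: .+? stops at the FIRST ')' it can, cutting the content short.
lazy = re.search(r"\((.+?)\)", w).group(1)

print(repr(greedy))
print(repr(lazy))
```

With the non-greedy version you would lose the second pair, which is why the greedy form is the one you want for the outer match.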
Update: To match the whole thing, you could try something like this:
rx = re.compile(r"^TEMPLATES = \(.+\)")
rx.match(w)
Your sample looks for an open paren, followed by zero or more occurrences of the letter w, followed by a close paren. You probably meant \w instead of w, but that won't work in your case anyway, because you have non-word characters (a space and a quote) next to the open paren.
I think you should consider splitting the string at the commas instead. What is your final objective?
First of all, using \( isn't enough to match a parenthesis. Python normally reacts to some escape sequences in its strings, which is why it interprets \( as a simple (. You would either have to write \\( or use a raw string, e.g. r'\(' or r"\(".
Second, when you use re.match, you are anchoring the regex search to the start of the string. If you want to look for the pattern anywhere in the string, use re.search.
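To make the re.match vs. re.search difference concrete, here is a short sketch using the same input string (the names at_start and anywhere are only illustrative):

```python
import re

s = "TEMPLATES = ( ('index.html', 'home'), ('base.html', 'base'))"
pattern = re.compile(r"\([^()]*\)")

# match only tries position 0, where the text starts with 'T', so it fails.
at_start = pattern.match(s)

# search scans forward and returns the first place where the pattern fits.
anywhere = pattern.search(s)

print(at_start)
print(anywhere.group(0))
```

match returns None here, while search finds the first innermost pair.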
Like Joseph said in his answer, it's not exactly clear what you want to find. For example:
import re
string = "TEMPLATES = ( ('index.html', 'home'), ('base.html', 'base'))"
print(re.findall(r'\([^()]*\)', string))
will print
["('index.html', 'home')", "('base.html', 'base')"]
EDIT:
I stand corrected, @phooji is right: escaping is irrelevant in this specific case. But re.match vs. re.search or re.findall is still important.
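For what it's worth, here is a small sketch of when the raw string does and doesn't matter. Python has no \( escape, so the backslash survives either way (recent Python versions do warn about unrecognized escapes in ordinary strings, which is one more reason to prefer raw strings). It does matter for sequences Python interprets itself, such as \b:

```python
import re

# Both spellings produce the same two-character pattern: backslash, '('.
escaped = "\\("
raw = r"\("
print(escaped == raw, len(raw))

# "\b" in an ordinary string is a single backspace character,
# while r"\b" stays as backslash + 'b', the regex word boundary.
print(len("\b"), len(r"\b"))
print(re.findall(r"\bbase\b", "base.html base"))
```

So raw strings are a good habit for regex patterns in general, even though they change nothing for \( specifically.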
Better to use a proper parsing module such as pyparsing here.