I've crafted this regular expression:
<entry>\\\\n<(\\w+)>(.+?)</\\w+>\\\\n</entry>
to parse the foll
You shouldn't parse XML with regex. Use the Universal Feed Parser for Python instead: it will make your life easier, and it is battle-tested.
I have personally used this library many times, and it works like a charm.
Do not try to reinvent the wheel or play the smart RSS parser guy. Reuse existing modules: http://www.feedparser.org/
DON'T PARSE XML/HTML WITH REGEX!
Use one of the following:
Enjoy!
EDIT: Oh yeah it's RSS. What the other people said... I'll be here all week.
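As a rough illustration of the parser-based approach recommended above, here is a minimal sketch that uses the standard library's xml.etree.ElementTree rather than feedparser itself. The sample document, its element names, and the parse_entries helper are all hypothetical. Real feeds usually declare XML namespaces, which this sketch ignores.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; the element names are illustrative only.
SAMPLE = """<feed>
<entry>
<title>First post</title>
<link>http://example.com/1</link>
</entry>
<entry>
<title>Second post</title>
<link>http://example.com/2</link>
</entry>
</feed>"""


def parse_entries(xml_text):
    """Return one {tag: text} dict per <entry> element (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    return [
        {child.tag: child.text for child in entry}
        for entry in root.iter("entry")
    ]


entries = parse_entries(SAMPLE)
for entry in entries:
    print(entry["title"], entry["link"])
```

The parser takes care of nesting, whitespace, and entity decoding, which a hand-written regex would have to get right by itself.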
Before the regex compiler sees a string, Python has already processed the backslash escapes, so you'd have to escape each backslash twice (e.g. \\\\n for \\n). However, Python has a handy notation for exactly this sort of thing: just put an r before the string:
regex = re.compile(r"""<entry>\\n<(\w+)>(.+?)</\w+>\\n</entry>""")
By the way, I agree with the others here: do not use regexes to parse XML. Still, hopefully you will find this string notation helpful in future regular expressions.
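To make the escaping point concrete, the sketch below compiles the same pattern twice, once as an ordinary string with doubled backslashes and once as a raw string, and checks that the regex engine receives identical text. The sample input is hypothetical and assumes the data contains a literal backslash followed by n, not an actual newline character.

```python
import re

# Hypothetical sample: a literal backslash followed by "n", not a newline.
text = r"<entry>\n<title>Hello</title>\n</entry>"

# Ordinary string: every backslash meant for the regex engine is doubled.
escaped = re.compile("<entry>\\\\n<(\\w+)>(.+?)</\\w+>\\\\n</entry>")

# Raw string: the pattern is written exactly as the regex engine sees it.
raw = re.compile(r"<entry>\\n<(\w+)>(.+?)</\w+>\\n</entry>")

# Both spellings produce the same pattern text.
same_pattern = escaped.pattern == raw.pattern

match = raw.search(text)
groups = match.groups()
print(same_pattern, groups)
```

The raw-string form is usually easier to read and harder to get wrong, since there is only one layer of escaping to keep track of.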