Python re - escape coincidental parentheses in regex pattern

北海茫月 2020-12-22 09:55

I am having trouble with the regex in the following code:

import mechanize
import re

br = mechanize.Browser()
br.set_handle_robots(False)
br.addheaders = [(         


        
1 Answer
  • 2020-12-22 10:06

    You need to escape the parentheses that are part of the literal text, because an unescaped parenthesis opens or closes a capture group in a regex:

    <a href="javascript:__doPostBack\('(.*?)','(.*?)'\)">
                                 HERE^            HERE^
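
As a minimal sketch of the difference, the snippet below runs the unescaped and the escaped pattern against one sample link. The markup and the two argument values are made up for illustration and are not taken from the question.

```python
import re

# Hypothetical sample markup; the two argument values are invented for this sketch.
html = '''<a href="javascript:__doPostBack('ctl00$Main$lnkNext','Page$2')">Next</a>'''

# Unescaped: the outer ( ) are read as a group, so the literal "(" in the
# text is never matched and the search finds nothing.
unescaped = re.compile(r"""<a href="javascript:__doPostBack('(.*?)','(.*?)')">""")

# Escaped: \( and \) match literal parentheses, while the two (.*?) groups
# capture the arguments between the single quotes.
escaped = re.compile(r"""<a href="javascript:__doPostBack\('(.*?)','(.*?)'\)">""")

no_match = unescaped.search(html)
match = escaped.search(html)

print(no_match)  # None
if match:
    target, argument = match.groups()
    print(target, argument)
```

For longer literal fragments, `re.escape` can produce the escaped text so the backslashes do not have to be written by hand.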
    

    Note that, ideally, you should not parse HTML with regex, although your pattern is specific enough that it is not a serious problem here. Instead, parse the HTML with a parser such as BeautifulSoup, locate the <a> element, read its href attribute value, and then extract the desired substrings from that value with a regex.
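
The parse-first approach could look roughly like the sketch below. To keep it free of third-party dependencies, it uses the standard library's `html.parser` in place of BeautifulSoup; the steps are the same (find the `<a>` element, read `href`, then apply the regex). The class name `PostBackCollector` and the sample markup are hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample markup; the argument values are invented for this sketch.
html = '''<p><a href="javascript:__doPostBack('ctl00$Main$lnkNext','Page$2')">Next</a>
<a href="/about">About</a></p>'''

# The regex now runs on the href value only, so it no longer depends on
# attribute order or whitespace inside the tag.
POSTBACK = re.compile(r"__doPostBack\('(.*?)','(.*?)'\)")


class PostBackCollector(HTMLParser):
    """Collects (target, argument) pairs from __doPostBack links."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        href = dict(attrs).get("href") or ""
        match = POSTBACK.search(href)
        if match:
            self.calls.append(match.groups())


collector = PostBackCollector()
collector.feed(html)
print(collector.calls)
```

With BeautifulSoup installed, the same idea would be to iterate over the anchors that have an href and run `POSTBACK.search` on each value.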
