How to obtain the content between a tag and its ending in HTML using Python's Beautiful Soup?

广开言路 2021-01-24 02:07

I have an HTML line as follows:

Is this model too thin for Yves Saint Laurent? 

I would lik

2 Answers
  •  离开以前
    2021-01-24 02:21

    Instead of using regular expressions, you should use an HTML parser such as BeautifulSoup. For complicated use cases you can also use the etree library with XPath.
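    As a rough illustration of the parser approach, here is a minimal sketch that uses the standard library's html.parser rather than BeautifulSoup, so it runs without installing anything. The class TagTextExtractor, the helper text_inside, and the sample markup (an <a> tag with a placeholder href) are all assumptions made for this example; the original tag is not known.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text found inside every occurrence of one tag name."""

    def __init__(self, tag):
        super().__init__()
        self.tag = tag
        self.depth = 0    # greater than 0 while inside the target tag
        self.texts = []   # one string per top-level occurrence of the tag

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            self.depth += 1
            if self.depth == 1:
                self.texts.append("")

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Text of nested child tags is included as well.
        if self.depth > 0:
            self.texts[-1] += data


def text_inside(html, tag):
    """Return the stripped text of each `tag` element in `html`."""
    parser = TagTextExtractor(tag)
    parser.feed(html)
    parser.close()
    return [t.strip() for t in parser.texts]


# Hypothetical markup: the original tag and its attributes are not known.
sample = '<a href="#">Is this model too thin for Yves Saint Laurent? </a>'
titles = text_inside(sample, "a")
```

    If BeautifulSoup is installed, BeautifulSoup(sample, "html.parser").a.get_text() should return the same text, before whitespace stripping.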

    Still, if you want to use regex:

    Regular expressions are a domain-specific language that makes string parsing and processing much easier. Although some people may disagree, regular expressions provide much more elegant solutions than looping over a string ever could.

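    import re

    # The tags of the original string were lost; a bare <a> element is assumed here.
    html_string = '<a>Is this model too thin for Yves Saint Laurent? </a>'
    # Look-behind (?<=>) and look-ahead (?=<) capture the text between '>' and '<'.
    regex = re.compile(r'(?<=>).*(?=<)')
    result = regex.findall(html_string)[0]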
    

    In this regex, I am using the look-ahead and look-behind features of regular expressions. Learning regular expressions takes a considerable amount of time, so I recommend working through a good tutorial or book on regex.
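    To make the look-behind and look-ahead concrete: (?<=>) only requires that the previous character is '>', and (?=<) only requires that the next character is '<'; neither is part of the match. The sketch below uses hypothetical sample strings (the <a> tags are an assumption) and also shows one caveat: the greedy .* runs from the first '>' to the last '<', so with several elements a narrower character class such as [^<>]+ is one possible alternative.

```python
import re

# Hypothetical sample strings; the <a> tags are assumed for illustration.
single = '<a>Is this model too thin for Yves Saint Laurent? </a>'
double = '<a>first</a><a>second</a>'

# Pattern from the answer: greedy match between a '>' and a '<'.
greedy = re.compile(r'(?<=>).*(?=<)')
# Alternative: forbid '<' and '>' inside the match, so each element is separate.
per_element = re.compile(r'(?<=>)[^<>]+(?=<)')

single_result = greedy.findall(single)
# ['Is this model too thin for Yves Saint Laurent? ']

greedy_result = greedy.findall(double)
# ['first</a><a>second']  (spans both elements)

per_element_result = per_element.findall(double)
# ['first', 'second']
```

    Even the narrower pattern is fragile on real pages (attributes containing '>', comments, nested tags), which is why a parser is the safer choice.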
