Find specific comments in HTML code using Python

有刺的猬 2021-01-15 06:14

I can't find a specific comment in Python, for example the . My main goal is to find all the links inside two specific comments. Something like

1 answer
  • 2021-01-15 06:43

    If you want all the comments, you can use findAll with a callable:

    >>> from bs4 import BeautifulSoup, Comment
    >>> 
    >>> s = """
    ... <p>header</p>
    ... <!-- why -->
    ... www.test1.com
    ... www.test2.org
    ... <!-- why not -->
    ... <p>tail</p>
    ... """
    >>> 
    >>> soup = BeautifulSoup(s, "html.parser")  # name the parser explicitly so bs4 does not have to guess one
    >>> comments = soup.findAll(text=lambda text: isinstance(text, Comment))
    >>> 
    >>> comments
    [u' why ', u' why not ']
    

    And once you've got them, you can use the usual tricks to move around:

    >>> comments[0].next_element
    u'\nwww.test1.com\nwww.test2.org\n'
    >>> comments[0].next_element.split()
    [u'www.test1.com', u'www.test2.org']
    

    Depending on what the page actually looks like, you may have to tweak it a bit, and you'll have to choose which comments you want, but that should work to get you started.
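
For the original goal of collecting everything between two specific comments, one option is to walk forward from the opening comment until the closing one appears. The sketch below is only an illustration under a few assumptions: the helper name `text_between_comments` is made up here, the comment labels `why` and `why not` come from the sample above, and the links are assumed to be whitespace-separated bare text, as in that sample.

```python
from bs4 import BeautifulSoup, Comment, NavigableString

html = """
<p>header</p>
<!-- why -->
www.test1.com
www.test2.org
<!-- why not -->
<p>tail</p>
"""


def text_between_comments(soup, start_label, end_label):
    """Collect whitespace-separated tokens found between two comments.

    Hypothetical helper: returns [] if the opening comment is missing, and
    runs to the end of the document if the closing comment never appears.
    """
    start = soup.find(
        string=lambda t: isinstance(t, Comment) and t.strip() == start_label
    )
    if start is None:
        return []

    tokens = []
    # next_elements yields every tag and string after `start`, in document order.
    for node in start.next_elements:
        if isinstance(node, Comment):
            if node.strip() == end_label:
                break      # reached the closing comment
            continue       # skip any other comment in between
        if isinstance(node, NavigableString):
            tokens.extend(node.split())
    return tokens


soup = BeautifulSoup(html, "html.parser")
links = text_between_comments(soup, "why", "why not")
print(links)  # ['www.test1.com', 'www.test2.org']
```

If the page holds real `<a>` tags rather than bare text, the same loop could probably be adapted to read the `href` attribute of each tag it passes instead of splitting strings.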

    Edit:

    If you really want only the ones which look like some specific text, you can do something like

    >>> comments = soup.findAll(text = lambda text: isinstance(text, Comment) and text.strip() == 'why')
    >>> comments
    [u' why ']
    

    or you could filter them after the fact using a list comprehension:

    >>> [c for c in comments if c.strip().startswith("why")]
    [u' why ', u' why not ']
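
If the "looks like some specific text" test gets more involved than an equality or prefix check, a regular expression is one way to express it. This is a rough sketch, not part of the original answer: `comments_matching` is a hypothetical helper name, and the sample markup is invented for the demonstration.

```python
import re

from bs4 import BeautifulSoup, Comment


def comments_matching(soup, pattern):
    """Return the Comment nodes whose text matches the regular expression.

    Hypothetical helper for illustration only.
    """
    regex = re.compile(pattern)
    return soup.find_all(
        string=lambda t: isinstance(t, Comment) and regex.search(t) is not None
    )


# Invented sample markup with three comments and one ordinary string.
soup = BeautifulSoup(
    "<p>header</p><!-- why --><b>x</b><!-- why not --><!-- other -->",
    "html.parser",
)

exact = comments_matching(soup, r"^\s*why\s*$")   # the comment that is exactly "why"
prefix = comments_matching(soup, r"^\s*why")      # every comment that starts with "why"

print([c.strip() for c in exact])   # ['why']
print([c.strip() for c in prefix])  # ['why', 'why not']
```

The `isinstance` check stays in the filter so that ordinary text nodes that happen to match the pattern are not returned.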
    