BeautifulSoup different parsers

悲&欢浪女 2021-01-19 23:37

Could anyone elaborate on the difference between parsers like html.parser and html5lib? I've stumbled across a weird behavior where when using html.parser it ignore

1 answer
  • 2021-01-20 00:12

    You can use lxml, which is very fast, and then call find_all or select to get all the tags. Note that lxml is a separate package and has to be installed in addition to bs4.

    from bs4 import BeautifulSoup
    html = """
    <html>
    <head></head>
    <body>
    <!--[if lte IE 8]> <!-- data-module-name="test"--> <![endif]-->
     <![endif]-->
        <a href="test"></a>
        <a href="test"></a>
        <a href="test"></a>
        <a href="test"></a>
       <!--[if lte IE 8]>
      <![endif]-->
      </body>
    </html>
    """
    
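    # Parse with lxml instead of the built-in html.parser
    soup = BeautifulSoup(html, 'lxml')
    # find_all returns every <a> tag in the document
    tags = soup.find_all('a')
    print(tags)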
    

    Or, using select with a CSS selector:

    from bs4 import BeautifulSoup
    html = """
    <html>
    <head></head>
    <body>
    <!--[if lte IE 8]> <!-- data-module-name="test"--> <![endif]-->
     <![endif]-->
        <a href="test"></a>
        <a href="test"></a>
        <a href="test"></a>
        <a href="test"></a>
       <!--[if lte IE 8]>
      <![endif]-->
      </body>
    </html>
    """
    
    soup = BeautifulSoup(html, 'lxml')
    # select takes a CSS selector; 'a' matches every <a> tag
    tags = soup.select('a')
    print(tags)
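
    To see how the parsers differ on this kind of markup, it can help to run the same string through each of them and compare what they find. The following is only a rough sketch: count_links_by_parser and SAMPLE_HTML are names made up for illustration, and the sample is a shortened copy of the markup above. Parsers recover from malformed markup in different ways, so html.parser may handle the stray <![endif]--> differently from lxml or html5lib, and the counts can vary with the parser and its version.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Shortened copy of the markup above: links sitting between IE conditional comments.
SAMPLE_HTML = """
<html>
<head></head>
<body>
<!--[if lte IE 8]> <!-- data-module-name="test"--> <![endif]-->
 <![endif]-->
    <a href="test"></a>
    <a href="test"></a>
   <!--[if lte IE 8]>
  <![endif]-->
</body>
</html>
"""


def count_links_by_parser(html, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser name: number of <a> tags found}.

    Hypothetical helper for illustration. Parsers that are not
    installed are skipped instead of raising an error.
    """
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # lxml and html5lib are optional third-party packages.
            continue
        counts[name] = len(soup.find_all("a"))
    return counts


if __name__ == "__main__":
    # Print one line per available parser so the results can be compared.
    for name, count in count_links_by_parser(SAMPLE_HTML).items():
        print(f"{name}: {count} <a> tags")
```

    If one parser reports fewer links than the others, the difference most likely comes from how it treats the conditional comments, and switching parsers (as in the lxml examples above) is the simplest workaround.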
    