Extract text between link tags in Python using BeautifulSoup

一向 2021-01-16 00:30

I have HTML like this:

<h2 class="title"><a href="http://www.gurletins.com">My HomePage</a></h2>

3 Answers
  • 2021-01-16 00:34

    You can do something like this:

    from bs4 import BeautifulSoup

    html = """
    <html><head></head>
    <body>
    <h2 class='title'><a href='http://www.gurletins.com'>My HomePage</a></h2>
    <h2 class='title'><a href='http://www.gurletins.com/sections'>Sections</a></h2>
    </body>
    </html>
    """

    soup = BeautifulSoup(html, "html.parser")

    # Take the text of the first <a> inside each <h2 class="title">
    print([elm.a.text for elm in soup.find_all('h2', {'class': 'title'})])
    # Output: ['My HomePage', 'Sections']
    
  • 2021-01-16 00:40

    The following code extracts the text (link descriptions) between the 'a' tags and stores it in a list.

    >>> from bs4 import BeautifulSoup
    >>> data = """<h2 class="title"><a href="http://www.gurletins.com">My HomePage</a></h2>
    ...
    ... <h2 class="title"><a href="http://www.gurletins.com/sections">Sections</a></h2>"""
    >>> soup = BeautifulSoup(data, "html.parser")
    >>> reqTxt = soup.find_all("h2", {"class":"title"})
    >>> a = []
    >>> for i in reqTxt:
    ...     a.append(i.get_text())
    ...
    >>> a
    ['My HomePage', 'Sections']
    >>> a[0]
    'My HomePage'
    >>> a[1]
    'Sections'
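
Note that `get_text()` on the `h2` returns all text inside the heading, not only the link text. If a heading might contain text outside the link, selecting the `a` elements directly may be safer. A minimal sketch, assuming the same markup as above; the `link_texts` helper name is hypothetical:

```python
from bs4 import BeautifulSoup

data = """<h2 class="title"><a href="http://www.gurletins.com">My HomePage</a></h2>
<h2 class="title"><a href="http://www.gurletins.com/sections">Sections</a></h2>"""


def link_texts(html):
    """Return the stripped text of every <a> inside an <h2 class="title">."""
    soup = BeautifulSoup(html, "html.parser")
    # CSS selector: <a> descendants of <h2> elements with class "title"
    return [a.get_text(strip=True) for a in soup.select("h2.title a")]


print(link_texts(data))  # ['My HomePage', 'Sections']
```

The selector `h2.title a` keeps the class filter from the answer while limiting the result to link text.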
    
  • 2021-01-16 00:54

    print([a.find_all(string=True) for a in soup.find_all('a')])
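
This line assumes a `soup` object already exists, and it produces one inner list per link rather than a flat list. A self-contained sketch, assuming Python 3 with bs4 (the answer does not say which version it targets) and reusing the markup from the answers above:

```python
from bs4 import BeautifulSoup

html = """<h2 class='title'><a href='http://www.gurletins.com'>My HomePage</a></h2>
<h2 class='title'><a href='http://www.gurletins.com/sections'>Sections</a></h2>"""

soup = BeautifulSoup(html, "html.parser")

# One inner list per <a> tag, holding every text node found inside it
nested = [[str(s) for s in a.find_all(string=True)] for a in soup.find_all("a")]
print(nested)  # [['My HomePage'], ['Sections']]

# Flatten when only the link texts are wanted
flat = [text for texts in nested for text in texts]
print(flat)  # ['My HomePage', 'Sections']
```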
