How to find elements by class

有刺的猬 2020-11-22 08:33

I'm having trouble parsing HTML elements with a "class" attribute using BeautifulSoup. The code looks like this:

soup = BeautifulSoup(sdata)
mydivs = soup.fi         


        
17 Answers
  • 2020-11-22 09:11

    This works for me to access the class attribute (on Beautiful Soup 4, contrary to what the documentation says). The KeyError comes from a list being returned, not a dictionary.

    for hit in soup.findAll(name='span'):
        # print() is a function in Python 3
        print(hit.contents[1]['class'])
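
    A minimal sketch of the two behaviours that seem relevant to the KeyError: indexing a tag with ['class'] yields a list of class names, and indexing a tag that has no class attribute at all raises KeyError. The markup and variable names below are made up for illustration and are not from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this illustration.
sample_html = '<p><span class="a b">x</span><span>y</span></p>'
soup = BeautifulSoup(sample_html, "html.parser")

with_class, without_class = soup.find_all("span")

# class is a multi-valued attribute, so indexing gives a list of names.
class_list = with_class["class"]

# .get() returns None instead of raising when the attribute is absent.
missing = without_class.get("class")

# Indexing a tag that has no class attribute raises KeyError.
try:
    without_class["class"]
    raised = False
except KeyError:
    raised = True
```

    Using .get("class") is one way to avoid the exception when some tags in the loop may not carry a class.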
    
  • 2020-11-22 09:12

    From the documentation:

    As of Beautiful Soup 4.1.2, you can search by CSS class using the keyword argument class_:

    soup.find_all("a", class_="sister")
    

    Which in this case would be:

    soup.find_all("div", class_="stylelistrow")
    

    It would also work for an exact multi-class string (the class attribute must contain these names in this order):

    soup.find_all("div", class_="stylelistrowone stylelistrowtwo")
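
    To make the difference between the single-class and multi-class forms concrete, here is a small sketch. The sample markup is hypothetical (only the class name stylelistrow comes from the question), and the CSS selector call is shown as one possible order-independent alternative, not as something the quoted documentation passage prescribes.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the class name "stylelistrow" is from the question.
sample_html = """
<div class="stylelistrow">one</div>
<div class="stylelistrow button">two</div>
<div class="button stylelistrow">three</div>
<div class="other">four</div>
"""
soup = BeautifulSoup(sample_html, "html.parser")

# A single class name matches any div that carries that class.
any_match = [d.text for d in soup.find_all("div", class_="stylelistrow")]

# A multi-class string is compared against the exact attribute value.
exact_match = [d.text for d in soup.find_all("div", class_="stylelistrow button")]

# A CSS selector requires both classes, in any order.
css_match = [d.text for d in soup.select("div.stylelistrow.button")]
```

    In this sketch the single-class search finds the first three divs, the exact string finds only the second, and the selector finds the second and third.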
    
  • 2020-11-22 09:13

    Alternatively, we can use lxml, which supports XPath and is very fast.

    from lxml import html, etree

    # html_text is the raw HTML string
    attr = html.fromstring(html_text)
    # XPath expression that selects divs whose class attribute is exactly "stylelistrow"
    handles = attr.xpath('//div[@class="stylelistrow"]')

    for each in handles:
        # print each matched element serialized as HTML
        print(etree.tostring(each))
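
    One caveat worth sketching: @class="stylelistrow" compares the whole attribute value, so a div with several classes would be skipped. A common XPath 1.0 idiom pads the attribute with spaces and tests for the padded token. The markup below is hypothetical and the variable names are chosen for this example only.

```python
from lxml import html

# Hypothetical markup, invented for this illustration.
sample_html = """
<div>
  <div class="stylelistrow">one</div>
  <div class="stylelistrow button">two</div>
  <div class="stylelistrowx">three</div>
</div>
"""
tree = html.fromstring(sample_html)

# Exact comparison against the full attribute value.
exact = tree.xpath('//div[@class="stylelistrow"]')

# Token match: pad the class list with spaces, then look for " stylelistrow ".
token = tree.xpath(
    '//div[contains(concat(" ", normalize-space(@class), " "), " stylelistrow ")]'
)

exact_text = [d.text for d in exact]
token_text = [d.text for d in token]
```

    Padding with spaces keeps a class such as stylelistrowx from matching by accident, which a bare contains(@class, "stylelistrow") would allow.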
    
  • Concerning @Wernight's comment on the top answer about partial matching...

    You can partially match:

    • <div class="stylelistrow"> and
    • <div class="stylelistrow button">

    with gazpacho:

    from gazpacho import Soup

    # html is assumed to hold the raw HTML string
    soup = Soup(html)
    my_divs = soup.find("div", {"class": "stylelistrow"}, partial=True)
    

    Both will be captured and returned as a list of Soup objects.

  • 2020-11-22 09:15

    The other answers did not work for me.

    They call findAll on the soup object itself, but I needed to find by class name inside a specific element that I had already extracted with findAll.

    If you are trying to search inside nested HTML elements to get objects by class name, try the following:

    # assumed import: the answer uses `soup` as an alias for BeautifulSoup
    from bs4 import BeautifulSoup as soup

    # parse the HTML (web_page is an already-opened response object)
    page_soup = soup(web_page.read(), "html.parser")

    # select the items matching the class name
    all_songs = page_soup.findAll("li", "song_item")

    # iterate over all_songs
    for song in all_songs:

        # find by class name inside one song element taken from all_songs,
        # then print the text of the span with class 'song_name'
        print(song.find("span", "song_name").text)
    

    Points to note:

    1. I'm not explicitly naming the 'class' attribute, as in findAll("li", {"class": "song_item"}), because it is the only attribute I'm searching on. When the second argument is a plain string, the search defaults to the class attribute unless you explicitly say which attribute to match.

    2. findAll returns a bs4.element.ResultSet, which is a subclass of list, while find returns a single bs4.element.Tag (or None if nothing matches). Each item in a ResultSet is a Tag, so you can call find or findAll on it again, through any number of nested elements.

    3. My BS4 version - 4.9.1, Python version - 3.8.1
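
    A quick way to check what these calls return is to inspect the types directly. This is only a sketch: the markup is hypothetical, reusing the class names song_item and song_name from the code above.

```python
from bs4 import BeautifulSoup
from bs4.element import ResultSet, Tag

# Hypothetical markup reusing the class names from the answer.
sample_html = """
<ul>
  <li class="song_item"><span class="song_name">First</span></li>
  <li class="song_item"><span class="song_name">Second</span></li>
</ul>
"""
page_soup = BeautifulSoup(sample_html, "html.parser")

# find_all (alias: findAll) returns a ResultSet, a list of Tag objects.
all_songs = page_soup.find_all("li", "song_item")

# find returns a single Tag, searched here inside one list item.
first_name = all_songs[0].find("span", "song_name")

# The nested search is repeated on each Tag in the ResultSet.
names = [song.find("span", "song_name").text for song in all_songs]
```

    The nested find is called on each Tag taken from the ResultSet, not on the ResultSet itself.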
