Beautiful Soup: Accessing <li> elements from <ul> with no id
    遥遥无期 2021-01-15 14:32

    I am trying to scrape the people who have birthdays from this Wikipedia page.

    Here is the existing code:

    hdr = {'User-Agent': 'Mozilla/5.0'}
    site

    2 Answers
    • 2021-01-15 15:08

      The idea is to get the span with the id Births, find its parent's next sibling (which is the ul), and iterate over its li elements. Here's a complete example using requests (the choice of HTTP library doesn't matter here):

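      from bs4 import BeautifulSoup as Soup
      import requests

      response = requests.get("http://en.wikipedia.org/wiki/January_1")
      # Name the parser explicitly so the result does not depend on what is installed
      soup = Soup(response.content, "html.parser")

      # The section heading wraps <span id="Births">; the list follows the heading
      births_span = soup.find("span", {"id": "Births"})
      births_ul = births_span.parent.find_next_sibling()

      # find_all returns only Tag objects, so no isinstance check is needed
      for item in births_ul.find_all("li"):
          print(item.text)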
      

      prints:

      871 – Zwentibold, Frankish son of Arnulf of Carinthia (d. 900)
      1431 – Pope Alexander VI (d. 1503)
      1449 – Lorenzo de' Medici, Italian politician (d. 1492)
      1467 – Sigismund I the Old, Polish king (d. 1548)
      1484 – Huldrych Zwingli, Swiss pastor and theologian (d. 1531)
      1511 – Henry, Duke of Cornwall (d. 1511)
      1516 – Margaret Leijonhufvud, Swedish wife of Gustav I of Sweden (d. 1551)
      ...
      

      Hope that helps.

    • 2021-01-15 15:11

      Find the Births section:

      section = soup.find('span', id='Births').parent
      

      And then find the next unordered list:

      births = section.find_next('ul').find_all('li')
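
      The two lines above can be tried without any network access. The following is a minimal sketch, assuming a simplified stand-in for the page markup: SAMPLE_HTML and births_from_html are hypothetical names made up for illustration, and the real Wikipedia HTML may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the page markup (not the real Wikipedia HTML).
SAMPLE_HTML = """
<h2><span id="Events">Events</span></h2>
<ul><li>(an event entry)</li></ul>
<h2><span id="Births">Births</span></h2>
<ul>
  <li>1431 – Pope Alexander VI (d. 1503)</li>
  <li>1449 – Lorenzo de' Medici, Italian politician (d. 1492)</li>
</ul>
<h2><span id="Deaths">Deaths</span></h2>
<ul><li>(a death entry)</li></ul>
"""


def births_from_html(html):
    """Return the text of each <li> in the first <ul> after the Births heading."""
    soup = BeautifulSoup(html, "html.parser")
    span = soup.find("span", id="Births")
    if span is None:
        # No Births section in this document
        return []
    # span.parent is the heading; find_next searches forward in document order
    births_ul = span.parent.find_next("ul")
    if births_ul is None:
        return []
    return [li.get_text(strip=True) for li in births_ul.find_all("li")]


births = births_from_html(SAMPLE_HTML)
for line in births:
    print(line)
```

      Note that find_next('ul') looks forward through the whole document, so it still finds the list if other elements sit between the heading and the ul. find_next_sibling() with no arguments returns only the very next sibling tag, so it relies on the ul coming directly after the heading.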
      