I am trying to scrape the names of the people listed in the Births section of this Wikipedia page.
Here is the existing code:
hdr = {'User-Agent': 'Mozilla/5.0'}
site
The idea is to find the span with the Births id, take its parent's next sibling (which is the ul), and iterate over that list's li elements. Here's a complete example using requests (the choice of HTTP library isn't important here):
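from bs4 import BeautifulSoup as Soup, Tag
import requests
response = requests.get("https://en.wikipedia.org/wiki/January_1")
soup = Soup(response.content, "html.parser")  # name the parser explicitly to avoid bs4's warning
# The section heading is an <h2> that wraps <span id="Births">
births_span = soup.find("span", {"id": "Births"})
# The births list is the <ul> that follows that <h2>
births_ul = births_span.parent.find_next_sibling()
for item in births_ul.find_all('li'):
    if isinstance(item, Tag):
        print(item.text)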
prints:
871 – Zwentibold, Frankish son of Arnulf of Carinthia (d. 900)
1431 – Pope Alexander VI (d. 1503)
1449 – Lorenzo de' Medici, Italian politician (d. 1492)
1467 – Sigismund I the Old, Polish king (d. 1548)
1484 – Huldrych Zwingli, Swiss pastor and theologian (d. 1531)
1511 – Henry, Duke of Cornwall (d. 1511)
1516 – Margaret Leijonhufvud, Swedish wife of Gustav I of Sweden (d. 1551)
...
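To check the span, parent, next-sibling navigation without hitting the network, here is a minimal sketch that runs the same steps against a small hand-written HTML string. The markup is an assumption modeled on the layout the code above targets (an h2 wrapping the span, followed directly by the ul); Wikipedia's live HTML may differ, and the Deaths entry is a placeholder.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down markup: an <h2> wraps <span id="Births">,
# and the births <ul> follows that <h2> directly.
HTML = """
<h2><span class="mw-headline" id="Births">Births</span></h2>
<ul>
  <li>871 – Zwentibold, Frankish son of Arnulf of Carinthia (d. 900)</li>
  <li>1431 – Pope Alexander VI (d. 1503)</li>
</ul>
<h2><span class="mw-headline" id="Deaths">Deaths</span></h2>
<ul>
  <li>placeholder death entry</li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")
births_span = soup.find("span", id="Births")        # the heading anchor
births_ul = births_span.parent.find_next_sibling()  # <h2> -> next sibling *tag* (whitespace is skipped)
births = [li.get_text(strip=True) for li in births_ul.find_all("li")]

for line in births:
    print(line)
```

Only the two Births entries are printed; the Deaths list is never reached because the search starts from the Births heading and stops at the first following sibling tag.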
Hope that helps.
Find the Births heading (the parent of the span with id="Births"):
section = soup.find('span', id='Births').parent
And then find the next unordered list:
births = section.find_next('ul').find_all('li')
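One practical difference between the two answers: find_next_sibling() returns whatever tag comes right after the heading, while find_next('ul') walks forward in document order until it reaches a ul. The sketch below uses hypothetical markup with an extra div between the heading and the list to show this, and then splits each entry into (year, description), assuming the entries use an en dash separator as in the output shown in the first answer.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an extra element sits between the heading and the list.
HTML = """
<h2><span id="Births">Births</span></h2>
<div class="note">See also: a related list</div>
<ul>
  <li>1431 – Pope Alexander VI (d. 1503)</li>
  <li>1449 – Lorenzo de' Medici, Italian politician (d. 1492)</li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")
section = soup.find("span", id="Births").parent  # the <h2> heading

# The very next sibling tag is the <div>, not the list.
first_sibling = section.find_next_sibling()

# find_next("ul") skips ahead to the first <ul> after the heading.
births = [li.get_text(strip=True) for li in section.find_next("ul").find_all("li")]

# Assumption: year and description are separated by the first en dash.
pairs = [tuple(part.strip() for part in entry.split("–", 1)) for entry in births]

print(first_sibling.name)
for year, description in pairs:
    print(year, "|", description)
```

The flip side is that find_next('ul') will happily return a list from a later section if the Births section has no ul of its own, so it is worth checking which list you actually got.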