soup.find("tagName", { "id" : "articlebody" })
Why does this NOT return the <div id="articlebody"> ... </div> tags?
The id attribute is meant to be unique within a document, so you can search on it directly without even specifying the element. If your elements have ids, that makes the content much easier to parse.
divEle = soup.find(id = "articlebody")
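To illustrate why leaving the tag name out can help, here is a minimal sketch. The markup is made up for this example (it uses a section instead of a div), and html.parser is just the built-in parser choice, not something from the question:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the id sits on a <section>, not a <div>
html = '<body><section id="articlebody">text</section></body>'
soup = BeautifulSoup(html, "html.parser")

# Searching by id alone matches whatever element carries the id
by_id = soup.find(id="articlebody")

# Searching with the wrong tag name finds nothing and returns None
by_div = soup.find("div", id="articlebody")

print(by_id.name)
print(by_div)
```

So if find() comes back with None, it is worth checking whether the id is really on the tag type you passed in.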
Beautiful Soup 4 supports most CSS selectors through the .select() method, so you can use an id selector such as:
soup.select('#articlebody')
If you need to specify the element's type, you can add a type selector before the id selector:
soup.select('div#articlebody')
The .select() method returns a collection of elements, so it gives the same results as the following .find_all() examples:
soup.find_all('div', id="articlebody")
# or
soup.find_all(id="articlebody")
If you only want to select a single element, then you could just use the .find() method:
soup.find('div', id="articlebody")
# or
soup.find(id="articlebody")
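A quick way to see the difference between the three calls is to run them on a small document. This is only a sketch: the HTML string is invented for the example, and select_one() is included as an assumption about what you might want (it is the single-element counterpart of select()):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, made up for this illustration
html = '<html><body><div id="articlebody"><p>Hello</p></div></body></html>'
soup = BeautifulSoup(html, "html.parser")

selected = soup.select("div#articlebody")           # collection of matches
found_all = soup.find_all("div", id="articlebody")  # also a collection
found = soup.find("div", id="articlebody")          # single Tag, or None
selected_one = soup.select_one("#articlebody")      # single Tag, or None

print(len(selected), len(found_all))
print(found.get_text())

# With no match, find() returns None while select() returns an empty collection
missing = soup.find(id="no-such-id")
missing_selected = soup.select("#no-such-id")
print(missing, len(missing_selected))
```

Because find() can return None, check the result before calling methods on it.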
soup.find("tagName",attrs={ "id" : "articlebody" })
from bs4 import BeautifulSoup
from requests_html import HTMLSession
url = 'your_url'
session = HTMLSession()
resp = session.get(url)
# render() runs the page's JavaScript; only needed if the element with id "articlebody" is generated dynamically
resp.html.render()
soup = BeautifulSoup(resp.html.html, "lxml")
soup.find("div", {"id": "articlebody"})
Happened to me also while trying to scrape Google.
I ended up using pyquery.
Install:
pip install pyquery
Use:
from pyquery import PyQuery
pq = PyQuery('<html><body><div id="articlebody"> ... </div></body></html>')
tag = pq('div#articlebody')
Have you tried soup.findAll("div", {"id": "articlebody"})?
Sounds crazy, but if you're scraping stuff from the wild, you can't rule out multiple divs...
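Here is a small sketch of that situation. The markup is invented for the example, and it uses find_all, which is the current spelling of findAll in Beautiful Soup 4:

```python
from bs4 import BeautifulSoup

# Hypothetical markup with a duplicated id, which is invalid HTML
# but does show up on real pages
html = """
<div id="articlebody">first</div>
<div id="articlebody">second</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() stops at the first match in document order
first = soup.find("div", {"id": "articlebody"})

# find_all() returns every match, so you can pick the one you need
matches = soup.find_all("div", {"id": "articlebody"})
texts = [div.get_text() for div in matches]

print(first.get_text())
print(texts)
```

If the div you get back looks empty or wrong, looping over all matches like this is a cheap way to check whether an earlier duplicate is shadowing the one you want.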