Beautiful Soup and extracting a div and its contents by ID

soup.find("tagName", { "id" : "articlebody" })

Why does this NOT return the

...

13 Answers

星月不相逢

2020-11-30 20:22
An element's id is supposed to be unique within a document, so you can search for it directly without specifying the tag name. If the elements you want to parse have ids, that makes them much easier to locate.
```
divEle = soup.find(id = "articlebody")
```
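A minimal sketch of the id lookup above, assuming a small made-up HTML snippet and the built-in html.parser backend (neither comes from the original question). It also shows that find() returns None when no element has the requested id, which is worth checking before using the result.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only to illustrate the lookup.
html = """
<html><body>
  <div id="articlebody"><p>First paragraph.</p></div>
  <div id="sidebar">Links</div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Look the element up by id alone, without naming the tag.
div_ele = soup.find(id="articlebody")
found_name = div_ele.name
found_text = div_ele.get_text(strip=True)

# find() returns None when nothing matches.
missing = soup.find(id="no-such-id")
```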

礼貌的吻别

2020-11-30 20:25
Beautiful Soup 4 supports most CSS selectors through the .select() method, so you can use an id selector such as:
```
soup.select('#articlebody')
```
If you need to specify the element's type, you can add a type selector before the id selector:
```
soup.select('div#articlebody')
```
The .select() method returns a collection of elements, so it gives the same results as the following .find_all() calls:
```
soup.find_all('div', id="articlebody")
# or
soup.find_all(id="articlebody")
```
If you only want to select a single element, then you could just use the .find() method:
```
soup.find('div', id="articlebody")
# or
soup.find(id="articlebody")
```
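To compare the calls above side by side, here is a short sketch that assumes a made-up two-element document and the built-in html.parser backend. It also uses .select_one(), the single-result counterpart of .select(), which the answer does not mention.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only to compare the lookup methods.
html = '<div id="articlebody"><p>Body</p></div><span id="other">x</span>'
soup = BeautifulSoup(html, "html.parser")

# Both of these return a collection, even when only one element matches.
selected = soup.select("div#articlebody")
found_all = soup.find_all("div", id="articlebody")

# Both of these return a single element (or None if nothing matches).
single = soup.find("div", id="articlebody")
single_css = soup.select_one("#articlebody")
```

In this sketch selected and found_all each hold the same single div, and single and single_css are that same div.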

生来不讨喜

2020-11-30 20:26
```
soup.find("tagName",attrs={ "id" : "articlebody" })
```
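One possible reading, assuming "tagName" in the snippet above is a placeholder rather than a real element name: the first argument has to be the actual tag, such as "div". The sketch below uses a made-up document and the built-in html.parser backend to show the difference.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, used only for illustration.
html = '<div id="articlebody"><p>Body</p></div>'
soup = BeautifulSoup(html, "html.parser")

# Passing the literal string "tagName" searches for <tagName> elements,
# which this document does not contain.
literal = soup.find("tagName", attrs={"id": "articlebody"})

# Passing the real tag name finds the element.
actual = soup.find("div", attrs={"id": "articlebody"})
actual_text = actual.get_text()
```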

别那么骄傲

2020-11-30 20:30

```
from bs4 import BeautifulSoup
from requests_html import HTMLSession

url = 'your_url'
session = HTMLSession()
resp = session.get(url)

# Render JavaScript only if the element with id "articlebody" is created dynamically;
# otherwise this step can be skipped.
resp.html.render()

# Parse the (rendered) HTML and look up the div by id.
soup = BeautifulSoup(resp.html.html, "lxml")
soup.find("div", {"id": "articlebody"})
```

后悔当初

2020-11-30 20:33
This happened to me too while trying to scrape Google.
I ended up using pyquery instead.
Install:
```
pip install pyquery
```
Use:
```
from pyquery import PyQuery
pq = PyQuery('<html><body><div id="articlebody"> ... </div></body></html>')
tag = pq('div#articlebody')
```

梦毁少年i

2020-11-30 20:35

Have you tried soup.findAll("div", {"id": "articlebody"})?

Sounds crazy, but if you're scraping stuff from the wild, you can't rule out multiple divs with the same id...
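A small sketch of that situation, assuming a made-up document in which two divs share the same id and the built-in html.parser backend. It uses find_all, the current spelling of findAll in Beautiful Soup 4.

```python
from bs4 import BeautifulSoup

# Hypothetical, deliberately invalid markup: the same id appears twice.
html = """
<div id="articlebody">first</div>
<div id="articlebody">second</div>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all returns every match in document order.
matches = soup.find_all("div", {"id": "articlebody"})
texts = [m.get_text() for m in matches]

# find returns only the first match, so later duplicates are silently skipped.
first_only = soup.find("div", {"id": "articlebody"}).get_text()
```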
