Question
I was wondering what the difference is between bs.find('div') and bs.select_one('div'). The same goes for find_all and select.
Is there any difference performance-wise, or is one better to use than the other in specific cases?
Answer 1:
select() and select_one() give you a different way of navigating an HTML tree: CSS selectors, which have a rich and convenient syntax. CSS selector support in BeautifulSoup is limited, but it covers the most common cases.
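To make the contrast concrete, here is a minimal sketch that runs the same query both ways. The HTML snippet, ids, and class names are made up for illustration and are not from the original question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, invented for this example.
html = """
<div id="main">
  <ul class="nav">
    <li><a href="/home">Home</a></li>
    <li><a href="/about">About</a></li>
  </ul>
  <p>No links here</p>
</div>
<div id="footer"><a href="/contact">Contact</a></div>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS selector: every <a> with an href inside ul.nav under #main.
via_select = soup.select("#main ul.nav a[href]")

# The same query with find()/find_all(): walk down one step at a time.
main = soup.find("div", id="main")
nav = main.find("ul", class_="nav")
via_find = nav.find_all("a", href=True)

select_hrefs = [a["href"] for a in via_select]
find_hrefs = [a["href"] for a in via_find]
print(select_hrefs)
print(find_hrefs)
```

Both approaches return the same two links here. The selector version is shorter, while the find() chain makes each step explicit.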
Performance-wise, it really depends on the HTML tree being parsed, on which element you are looking for, how deep it sits, and what selector is used to locate it. It also matters which find() + find_all() alternative you compare select() against. In a simple case like bs.find('div') vs bs.select_one('div'), I'd say that find() should generally be faster, simply because there is a lot going on under the hood to support CSS selector syntax.
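Since the outcome depends on the document and the selector, it is probably easiest to measure on your own data. The following is a rough sketch using the standard timeit module. The synthetic document and the candidates dictionary are assumptions made for illustration, and the numbers will vary with the bs4 version, the parser, and the page.

```python
import timeit
from bs4 import BeautifulSoup

# Hypothetical synthetic document; real pages will give different numbers.
rows = "".join(f'<p class="row">item {i}</p>' for i in range(500))
html = (
    f"<html><body>{rows}"
    '<div id="target"><h2 id="DESCRIPTION">DESCRIPTION</h2></div>'
    "</body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# Sanity check: both lookups must return the same element before timing them.
assert soup.find("h2", id="DESCRIPTION") is soup.select_one("#DESCRIPTION")

candidates = {
    'find("div")': lambda: soup.find("div"),
    'select_one("div")': lambda: soup.select_one("div"),
    'find("h2", id=...)': lambda: soup.find("h2", id="DESCRIPTION"),
    'select_one("#DESCRIPTION")': lambda: soup.select_one("#DESCRIPTION"),
}

timings = {}
for label, func in candidates.items():
    # Best of 3 runs of 100 calls each, reported per call in microseconds.
    best = min(timeit.repeat(func, number=100, repeat=3))
    timings[label] = best / 100 * 1e6
    print(f"{label:30s} {timings[label]:10.1f} µs per call")
```

Treat the output as a local measurement only. Whichever method comes out ahead on this toy document may not be the winner on a different page or library version.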
Answer 2:
select_one is normally much faster than find:
In [13]: req = requests.get("https://httpbin.org/")
In [14]: soup = BeautifulSoup(req.content, "html.parser")
In [15]: soup.select_one("#DESCRIPTION")
Out[15]: <h2 id="DESCRIPTION">DESCRIPTION</h2>
In [16]: soup.find("h2", id="DESCRIPTION")
Out[16]: <h2 id="DESCRIPTION">DESCRIPTION</h2>
In [17]: timeit soup.find("h2", id="DESCRIPTION")
100 loops, best of 3: 5.27 ms per loop
In [18]: timeit soup.select_one("#DESCRIPTION")
1000 loops, best of 3: 649 µs per loop
In [19]: timeit soup.select_one("div")
10000 loops, best of 3: 61 µs per loop
In [20]: timeit soup.find("div")
1000 loops, best of 3: 446 µs per loop
find is basically the same as calling find_all with the limit set to 1, then checking whether the returned list is empty: if it is not, the first element is returned, otherwise None.
def find(self, name=None, attrs={}, recursive=True, text=None,
         **kwargs):
    """Return only the first child of this Tag matching the given
    criteria."""
    r = None
    l = self.find_all(name, attrs, recursive, text, 1, **kwargs)
    if l:
        r = l[0]
    return r
select_one does something similar using select:
def select_one(self, selector):
    """Perform a CSS selection operation on the current element."""
    value = self.select(selector, limit=1)
    if value:
        return value[0]
    return None
The cost is much lower with select, since it does not have all the keyword arguments to process.
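The relationship described above can be checked from the outside with a short sketch. It assumes a bs4 release whose select() accepts a limit argument, as in the source quoted above. The internals of select_one may differ between releases, so this only demonstrates equivalent behavior, not the exact implementation.

```python
from bs4 import BeautifulSoup

# Tiny made-up document with two matching elements.
html = '<div><h2 id="a">first</h2><h2 id="b">second</h2></div>'
soup = BeautifulSoup(html, "html.parser")

# find() behaves like find_all(..., limit=1) followed by taking the first hit.
hits = soup.find_all("h2", limit=1)
first_via_find_all = hits[0] if hits else None
assert soup.find("h2") is first_via_find_all

# select_one() behaves like select(..., limit=1) in the same way.
hits = soup.select("h2", limit=1)
first_via_select = hits[0] if hits else None
assert soup.select_one("h2") is first_via_select

# With no match, both shortcuts return None rather than raising.
missing_find = soup.find("table")
missing_select = soup.select_one("table")
print(missing_find, missing_select)
```

In practice this means the result of either shortcut should be checked for None before accessing attributes on it.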
The question "Beautifulsoup : Is there a difference between .find() and .select() - python 3.xx" covers a bit more on the differences.
Source: https://stackoverflow.com/questions/39033612/bs4-select-one-vs-find