Question
I want to scrape the description of each Google Search result using BeautifulSoup, but I am not able to select the tag that contains the description.
Ancestor:
html
body#gsr.srp.vasq.wf-b
div#main
div#cnt.big
div.mw
div#rcnt
div.col
div#center_col
div#res.med
div#search
div
div#rso
div.g
div.rc
div.IsZvec
div
span.aCOpRe
Children
em
Python Code:
from bs4 import BeautifulSoup
import requests
import re

search = input("Enter the search term:")
param = {"q": search}
# The query is passed via params, so the base URL needs no "?q=" suffix.
r = requests.get("https://google.com/search", params=param)
soup = BeautifulSoup(r.content, "lxml")

# Titles
title = soup.find_all("div", class_="BNeawe vvjwJb AP7Wnd")
for t in title:
    print(t.get_text())

# Descriptions (this is the lookup that finds nothing)
description = soup.find_all("span", class_="aCOpRe")
for d in description:
    print(d.get_text())
    print("\n")

# Result links
for link in soup.find_all("a", href=re.compile(r"(?<=/url\?q=)(htt.*://.*)")):
    print(re.split(":(?=http)", link["href"].replace("/url?q=", "")))
Answer 1:
You might want to try a CSS selector and then just pull the text out.
For example:
import requests
from bs4 import BeautifulSoup

page = requests.get("https://www.google.com/search?q=scrap").text
# Select elements that carry both classes, then print their text.
results = BeautifulSoup(page, "html.parser").select(".s3v9rd.AP7Wnd")
for item in results:
    print(item.get_text(strip=True))
Sample output for scrap:
discard or remove from service (a redundant, old, or inoperative vehicle, vessel, or machine), especially so as to convert it to scrap metal.
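To see what the compound selector is doing, here is a minimal sketch that runs it against a small hand-written snippet instead of a live page. The markup and the extract_text helper are made up for illustration; Google's real class names change often, so treat the selector as an assumption that needs re-checking.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not copied from a real results page.
SAMPLE_HTML = """
<div class="BNeawe s3v9rd AP7Wnd">First description.</div>
<div class="BNeawe vvjwJb AP7Wnd">A title, not a description.</div>
<div class="s3v9rd">Has only one of the two classes.</div>
"""


def extract_text(html, selector):
    """Return the stripped text of every element matching a CSS selector."""
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.select(selector)]


# ".s3v9rd.AP7Wnd" (no space) matches elements that carry BOTH classes.
descriptions = extract_text(SAMPLE_HTML, ".s3v9rd.AP7Wnd")
print(descriptions)
```

Only the first div matches, because the other two each lack one of the required classes.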
Answer 2:
The proper CSS selector for snippets (descriptions) of Google Search results is .aCOpRe span:not(.f).
Here's a full example:
from bs4 import BeautifulSoup
import requests
import re

param = {"q": "coffee"}
# Send a browser-like User-Agent so Google returns the regular desktop markup.
headers = {
    "User-Agent":
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.5 Safari/605.1.15"
}
r = requests.get("https://google.com/search", params=param, headers=headers)
soup = BeautifulSoup(r.content, "lxml")

# Titles
title = soup.select(".DKV0Md span")
for t in title:
    print(f"Title: {t.get_text()}\n")

# Snippets (descriptions)
snippets = soup.select(".aCOpRe span:not(.f)")
for d in snippets:
    print(f"Snippet: {d.get_text()}\n")

# Result links
for link in soup.find_all("a", href=re.compile(r"(?<=/url\?q=)(htt.*://.*)")):
    print(re.split(":(?=http)", link["href"].replace("/url?q=", "")))
Output
Title: Coffee - Wikipedia
Title: Coffee: Benefits, nutrition, and risks - Medical News Today
...
Snippet: Coffee is a brewed drink prepared from roasted coffee beans, the seeds of berries from certain Coffea species. When coffee berries turn from green to bright red in color – indicating ripeness – they are picked, processed, and dried.
Snippet: When people think of coffee, they usually think of its ability to provide an energy boost. ... This article looks at the health benefits of drinking coffee, the evidence ...
...
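As a possible alternative to the regex used for links above, the /url?q= redirect hrefs can be unpacked with the standard library's URL parser. This is only a sketch: extract_target is a hypothetical helper, and the hrefs below are invented samples in the redirect style the regex looks for.

```python
from urllib.parse import urlparse, parse_qs


def extract_target(href):
    """Return the destination URL from a '/url?q=...' href, or None."""
    parsed = urlparse(href)
    if parsed.path != "/url":
        return None
    # parse_qs also percent-decodes the value.
    targets = parse_qs(parsed.query).get("q")
    return targets[0] if targets else None


# Hypothetical hrefs for illustration; not taken from a real response.
hrefs = [
    "/url?q=https://en.wikipedia.org/wiki/Coffee&sa=U&ved=abc",
    "/search?q=coffee&tbm=isch",
    "/url?q=https%3A%2F%2Fexample.com%2Fa%3Fb%3D1&sa=U",
]
links = [extract_target(h) for h in hrefs]
print(links)
```

Compared with stripping the prefix by hand, this drops the trailing tracking parameters and leaves non-result links as None so they can be filtered out.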
Alternatively, you can extract data from Google Search via SerpApi.
curl example:
curl -s 'https://serpapi.com/search?q=coffee&location=Sweden&google_domain=google.se&gl=se&hl=sv&num=100'
Python example
from serpapi import GoogleSearch
import os

params = {
    "engine": "google",
    "q": "coffee",
    "location": "Sweden",
    "google_domain": "google.se",
    "gl": "se",
    "hl": "sv",
    "num": 100,
    "api_key": os.getenv("API_KEY")
}

client = GoogleSearch(params)
data = client.get_dict()

print("Organic results")
for result in data['organic_results']:
    print(f"""
Title: {result['title']}
Link: {result['link']}
Position: {result['position']}
Snippet: {result['snippet']}
""")
Output
Organic results
Title: Coffee - Wikipedia
Link: https://en.wikipedia.org/wiki/Coffee
Position: 1
Snippet: Coffee is a brewed drink prepared from roasted coffee beans, the seeds of berries from certain Coffea species. When coffee berries turn from green to bright red ...
Title: Drop Coffee
Link: https://www.dropcoffee.com/
Position: 2
Snippet: Drop Coffee is an award winning roastery in Stockholm, representing Sweden four times in the World Coffee Roasting Championship, placing second, third and ...
...
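The loop above indexes result['snippet'] directly, which raises a KeyError if a result has no such field. A defensive variant might look like the sketch below. It assumes that some results can lack a snippet, which is worth verifying against real responses. The data dictionary is a made-up fragment shaped like the output above, and format_result is a hypothetical helper.

```python
# Hypothetical response fragment shaped like the output above; not real API data.
data = {
    "organic_results": [
        {
            "position": 1,
            "title": "Coffee - Wikipedia",
            "link": "https://en.wikipedia.org/wiki/Coffee",
            "snippet": "Coffee is a brewed drink ...",
        },
        {
            "position": 2,
            "title": "Drop Coffee",
            "link": "https://www.dropcoffee.com/",
            # snippet deliberately omitted to exercise the fallback
        },
    ]
}


def format_result(result):
    """Format one result on a single line, tolerating a missing snippet."""
    snippet = result.get("snippet", "(no snippet)")
    return f"{result.get('position')}. {result.get('title')} | {result.get('link')} | {snippet}"


lines = [format_result(r) for r in data.get("organic_results", [])]
for line in lines:
    print(line)
```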
Disclaimer: I work at SerpApi.
Source: https://stackoverflow.com/questions/64880683/scrape-google-search-result-description-using-beautifulsoup