Question
On the linked page, I'm trying to use BeautifulSoup to extract the <a> texts at the very bottom, i.e., 'Private Life' and 'Lost Boy'.
But I'm having a hard time scraping the <iframe> content.
I've learned that it requires a different request from the browser.
So I've tried:
iframexx = soup.find_all('iframe')
for iframe in iframexx:
    try:
        response = urllib2.urlopen(iframe)
        results = BeautifulSoup(response)
        print results
but that returns None.
How do I parse the HTML below so I can fetch each a['href'].get_text()?
Answer 1:
Browsers load the iframe content in a separate request, so you'll need to fetch the URL given in the iframe's src attribute. You can use Selenium if you want, or scrape the data directly.
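The attempt in the question passes the iframe tag object itself to urlopen rather than the URL in its src attribute. As a rough sketch of that first step, the snippet below collects each iframe's src and resolves it against the page URL. It uses only the standard library so it runs without extra installs; with BeautifulSoup the equivalent would be reading tag.get('src') for each tag in soup.find_all('iframe'). The names IframeSrcCollector and iframe_urls, the page URL, and the HTML string are all made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcCollector(HTMLParser):
    """Collects the src attribute of every <iframe> start tag it sees."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names; attrs is a list of (name, value) pairs
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def iframe_urls(html, page_url):
    """Return absolute URLs for all iframe src attributes found in html."""
    collector = IframeSrcCollector()
    collector.feed(html)
    # src may be relative or scheme-relative, so resolve it against the page URL
    return [urljoin(page_url, src) for src in collector.sources]


# Hypothetical page URL and markup, for illustration only
page_url = "https://example.com/post/1"
html = (
    '<p>intro</p>'
    '<iframe src="//w.soundcloud.com/player/?url=x"></iframe>'
    '<iframe src="/embed/2"></iframe>'
)
urls = iframe_urls(html, page_url)
print(urls)
```

Each resulting URL can then be requested separately (for example with requests.get) and the response parsed on its own.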
Here is an example:
import re
import requests

# URL taken from the iframe's src attribute
url = 'https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/310079005&color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false'
response = requests.get(url)

# Pull the artist and title values out of the raw response bytes;
# the lookbehind/lookahead keep the surrounding quotes out of the match
artist = re.search(b'(?<=artist":")(.*?)(?=")', response.content).group(0).decode("utf-8")
song = re.search(b'(?<=title":")(.*?)(?=")', response.content).group(0).decode("utf-8")
print("%s - %s" % (artist, song))
Output:
Private Life - Lost Boy
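To see how the lookbehind/lookahead pattern behaves without a network call, here is a small sketch that applies the same regex to a made-up byte string. The sample payload is an assumption for illustration, not the real player response, and extract_field is a hypothetical helper. Unlike the one-liners above, it returns None instead of raising when a field is missing.

```python
import re


def extract_field(payload, field):
    """Return the first value matching <field>":"..." in payload, or None."""
    # Same pattern as in the answer, with the field name made a parameter
    name = re.escape(field.encode("utf-8"))
    pattern = b'(?<=' + name + b'":")(.*?)(?=")'
    match = re.search(pattern, payload)
    return match.group(0).decode("utf-8") if match else None


# Made-up sample bytes standing in for response.content
sample = b'{"artist":"Private Life","title":"Lost Boy"}'
artist = extract_field(sample, "artist")
title = extract_field(sample, "title")
print("%s - %s" % (artist, title))
```

If the embedded data is valid JSON, parsing it with the json module would likely be more robust than a regex, since a value containing an escaped quote would cut the regex match short.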
Source: https://stackoverflow.com/questions/42589907/extract-iframe-content-using-beautifulsoup