Extract iframe content using BeautifulSoup

半城伤御伤魂 submitted on 2021-01-25 06:50:57


On the page below --> link, I'm trying to use BeautifulSoup to extract the <a> texts at the very bottom, i.e., 'Private Life' and 'Lost Boy'.

But I'm having a hard time scraping <iframe> content.

I've learned that the browser loads the iframe with a separate request.

So I've tried:

iframexx = soup.find_all('iframe')
for iframe in iframexx:
        response = urllib2.urlopen(iframe)
        results = BeautifulSoup(response)
        print results

but that returns None.

How do I parse the HTML below so I can get the text of each <a> element, i.e. a.get_text()?


Browsers load iframe content with a separate request, so you need to fetch the URL given in the iframe's src attribute yourself. You can use Selenium if you want, or request that URL and scrape the data directly. Here is an example:

import re

import requests

# URL taken from the src attribute of the iframe on the outer page.
url = 'https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/310079005&color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false'

response = requests.get(url)

# Pull the first "artist" and "title" values out of the raw response body.
artist = re.search(b'(?<=artist":")(.*?)(?=")', response.content).group(0).decode("utf-8")
song = re.search(b'(?<=title":")(.*?)(?=")', response.content).group(0).decode("utf-8")

print("%s - %s" % (artist, song))

Private Life - Lost Boy
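
The example above hardcodes the iframe URL. If you would rather read it from the outer page, a minimal sketch could look like the following. It assumes the outer page's HTML is already available as a string; the helper name iframe_urls, the sample HTML, and the example.com base URL are made up for illustration and are not from the original page.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def iframe_urls(html, base_url):
    """Return the absolute URL of every <iframe> that has a src attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # src=True skips iframes without a src; urljoin resolves relative paths
    # against the outer page's URL and leaves absolute URLs unchanged.
    return [urljoin(base_url, tag["src"]) for tag in soup.find_all("iframe", src=True)]


# Hypothetical outer page, used only for illustration.
sample_html = """
<html><body>
  <iframe src="https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/310079005"></iframe>
  <iframe src="/embed/widget.html"></iframe>
  <iframe></iframe>
</body></html>
"""

urls = iframe_urls(sample_html, "https://example.com/page")
for u in urls:
    print(u)
```

Each URL returned this way can then be passed to requests.get() as in the answer above. Note that the value to request is the string in tag["src"], not the iframe tag object itself.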

