Extract iframe content using BeautifulSoup

半城伤御伤魂 submitted on 2021-01-25 06:50:57

Question


On the page below (link), I'm trying to use BeautifulSoup to extract the <a> texts at the very bottom, i.e., 'Private Life' and 'Lost Boy'.

But I'm having a hard time scraping <iframe> content.

I've learned that the browser loads it with a separate request.

So I've tried:

iframexx = soup.find_all('iframe')
for iframe in iframexx:
    try:
        response = urllib2.urlopen(iframe)
        results = BeautifulSoup(response)
        print results

But that returns None.

How do I parse the HTML below so I can fetch each a['href'].get_text()?


Answer 1:


Browsers load iframe content in a separate request, so you need to fetch the URL given in the iframe's src attribute yourself. You can use Selenium if you want, or request that URL and scrape the data directly. Here is an example:

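import re

import requests

# The iframe's src URL (the SoundCloud player widget), requested directly
url = 'https://w.soundcloud.com/player/?url=https%3A//api.soundcloud.com/tracks/310079005&color=ff5500&auto_play=false&hide_related=false&show_comments=true&show_user=true&show_reposts=false'

response = requests.get(url)

# Pull the values that follow artist":" and title":" in the raw response body.
# response.content is bytes, so the patterns are bytes and the matches are decoded.
artist = re.search(b'(?<=artist":")(.*?)(?=")', response.content).group(0).decode("utf-8")
song = re.search(b'(?<=title":")(.*?)(?=")', response.content).group(0).decode("utf-8")

print("%s - %s" % (artist, song))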

Private Life - Lost Boy
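
For a more general version of the same idea, here is a minimal sketch that reads each iframe's src with BeautifulSoup, resolves it against the page URL, and requests it separately. The helper names iframe_urls and link_texts_in_iframes are hypothetical and not part of any library; the sketch also assumes the iframe document contains its <a> tags in the raw HTML, which may not hold for widgets that build their markup with JavaScript (in that case a regex over the response, as above, or Selenium is likely needed).

```python
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def iframe_urls(html, base_url):
    """Return the absolute URL of every <iframe> in html that has a src."""
    soup = BeautifulSoup(html, "html.parser")
    # src=True skips iframes without a src attribute;
    # urljoin resolves relative and scheme-relative (//host/...) URLs.
    return [urljoin(base_url, tag["src"])
            for tag in soup.find_all("iframe", src=True)]


def link_texts_in_iframes(page_url):
    """Fetch page_url, then each iframe document, and collect the <a> texts."""
    page = requests.get(page_url, timeout=10)
    page.raise_for_status()
    texts = []
    for url in iframe_urls(page.text, page_url):
        frame = requests.get(url, timeout=10)
        frame.raise_for_status()
        frame_soup = BeautifulSoup(frame.text, "html.parser")
        texts.extend(a.get_text(strip=True) for a in frame_soup.find_all("a"))
    return texts
```

Note that the attempt in the question passes the iframe Tag object itself to urlopen; the URL to request is the value of the tag's src attribute.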



Source: https://stackoverflow.com/questions/42589907/extract-iframe-content-using-beautifulsoup
