Scrape yt-formatted-string elements with Beautiful Soup

寵の児 submitted on 2020-07-22 21:34:33

Question


I've tried to scrape yt-formatted-string elements with BeautifulSoup, but it always gives me an error. Here is my code:

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.youtube.com/channel/UCPyMcv4yIDfETZXoJms1XFA')
soup = BeautifulSoup(r.text, "html.parser")

def onoroff():
    # attrs must be a dict ({'id': ...}); {'id', 'subscriber-count'} would be a set
    count = soup.find('yt-formatted-string', {'id': 'subscriber-count'}).text
    return count

print("Subscribers:  " + onoroff().strip())

This is the error I get:

AttributeError: 'NoneType' object has no attribute 'text'
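The error means soup.find() found no matching element and returned None, so .text was called on None. Note also that the attrs argument should be a dict such as {'id': 'subscriber-count'}; {'id', 'subscriber-count'} is a Python set. The sketch below runs offline on two hypothetical HTML strings (stand-ins for a downloaded page, not real YouTube output) and checks for None before reading the text; subscriber_text is an illustrative helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the page requests downloads: the subscriber-count
# element is absent because YouTube inserts it later with JavaScript.
STATIC_HTML = "<html><body><div id='content'>Loading...</div></body></html>"

def subscriber_text(html):
    """Return the subscriber-count text, or None if the element is missing."""
    soup = BeautifulSoup(html, "html.parser")
    # attrs is passed as a dict, matching on the id attribute
    tag = soup.find("yt-formatted-string", {"id": "subscriber-count"})
    if tag is None:
        # Calling tag.text here is what raises the AttributeError above
        return None
    return tag.get_text(strip=True)

print(subscriber_text(STATIC_HTML))
print(subscriber_text(
    "<yt-formatted-string id='subscriber-count'> 110 subscribers </yt-formatted-string>"
))
```

The guard only avoids the crash; it cannot make the element appear, because BeautifulSoup never runs the page's JavaScript.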

Is there another way to scrape yt-formatted-string elements?


Answer 1:


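Most of YouTube's content is generated by JavaScript, which BeautifulSoup cannot execute, so the yt-formatted-string element is not in the HTML that requests downloads and soup.find() returns None. You may have better luck scraping the JSON objects embedded in the page source instead of the HTML elements, e.g.: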

import requests, json, re

h = {
    'Host': 'www.youtube.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0',
    'Accept': '*/*',
    'Accept-Language': 'en-US,pt;q=0.7,en;q=0.3',
    'Referer': 'https://www.youtube.com/channel/UCPyMcv4yIDfETZXoJms1XFA',
}
u = "https://www.youtube.com/channel/UCPyMcv4yIDfETZXoJms1XFA"
html = requests.get(u, headers=h).text

# Extract the JSON object that holds the page data from the source code and convert it into a Python dict
# (this pattern matched YouTube's page layout when this answer was written; the layout changes often)
matches = re.findall(r'window\["ytInitialData"\] = (.*\}\]\}\}\});', html, re.IGNORECASE | re.DOTALL)
if matches:
    j = json.loads(matches[0])
    # Browse the JSON object to find the info you need: https://jsoneditoronline.org/#left=cloud.123ad9bb8bbe498c95f291c32962aad2
    # Now we can get the number of subscribers (among other info):
    subscribers = j['header']['c4TabbedHeaderRenderer']['subscriberCountText']['runs'][0]["text"]
    print(subscribers)
    # 110 subscribers (output when this answer was written)
else:
    print("ytInitialData not found in the page source")
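The same idea (find the embedded JSON with a regex, parse it with json.loads, then walk the nested keys) can be tried offline. The sketch below is a variation, not the answer's exact pattern: it uses a non-greedy match anchored at the closing script tag, and it also accepts a "var ytInitialData =" assignment, which is an assumption about newer pages that you should check against the actual page source. SAMPLE_HTML is a hypothetical, heavily trimmed page, and extract_initial_data and dig are illustrative helper names. dig returns None instead of raising KeyError when the structure has changed.

```python
import json
import re

# Hypothetical, heavily trimmed page source; real pages are far larger.
SAMPLE_HTML = """
<script>
var ytInitialData = {"header": {"c4TabbedHeaderRenderer": {"subscriberCountText": {"runs": [{"text": "110 subscribers"}]}}}};
</script>
"""

def extract_initial_data(html):
    """Return the embedded ytInitialData object as a dict, or None."""
    # Accept the window["ytInitialData"] form from the answer and an assumed
    # "var ytInitialData =" form; stop at the first "};" before </script>.
    match = re.search(
        r'(?:window\["ytInitialData"\]|var ytInitialData)\s*=\s*(\{.*?\});\s*</script>',
        html,
        re.DOTALL,
    )
    if match is None:
        return None
    return json.loads(match.group(1))

def dig(obj, *path):
    """Follow dict keys / list indexes, returning None if any step is missing."""
    for key in path:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return None
    return obj

data = extract_initial_data(SAMPLE_HTML)
print(dig(data, "header", "c4TabbedHeaderRenderer",
          "subscriberCountText", "runs", 0, "text"))
```

Anchoring on </script> keeps the non-greedy match from stopping early at a "};" inside the object only when the object is the last statement in its script tag; if that does not hold for the real page, the pattern needs adjusting.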




Source: https://stackoverflow.com/questions/61427391/scrape-yt-formatted-strings-with-beautiful-soup
