Scraping YouTube links from a webpage

Submitted by 耗尽温柔 on 2020-08-20 07:46:33

Question


I've been trying to scrape YouTube links from a webpage, but nothing has worked. This is a picture of what I've been trying to scrape:

This is the code I tried most recently:

youtube_link = soup.find("a", class_="ytp-title-link yt-uix-sessionlink")

And this is the link to the website the YouTube link is in: https://www.electronic-festivals.com/event/i-am-hardstyle-germany


Answer 1:


Most of the YouTube links are inside an iframe, and JavaScript has to run before they appear, so a plain HTML parse will not find them. Try using Selenium. The following extracts any src or href containing "youtube". I only switch into the one iframe that hosts the YouTube clip; you could instead loop over all iframes and check each one.

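from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# Matches any element whose href or src contains "youtube".
YOUTUBE_SELECTOR = "[href*=youtube], [src*=youtube]"

def addItems(links, final):
    # Prefer src (iframes, scripts); fall back to href (anchors, link tags).
    for link in links:
        ref = link.get_attribute('src') or link.get_attribute('href')
        if ref:
            final.append(ref)
    return final

url = "https://www.electronic-festivals.com/event/i-am-hardstyle-germany"
driver = webdriver.Chrome()
driver.get(url)
final = []

# Switch into the iframe that hosts the YouTube player.
driver.switch_to.frame(driver.find_element(By.CSS_SELECTOR, '.media-youtube-player'))

try:
    # Wait up to 10 seconds for matching elements to appear inside the iframe.
    links = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, YOUTUBE_SELECTOR)))
    addItems(links, final)
except TimeoutException:
    pass  # nothing matched inside the iframe within the timeout
finally:
    # Always return to the top-level document.
    driver.switch_to.default_content()

# Collect matches from the top-level document as well.
links = driver.find_elements(By.CSS_SELECTOR, YOUTUBE_SELECTOR)
addItems(links, final)

# Print each distinct link once.
for link in set(final):
    print(link)

driver.quit()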
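
The answer mentions that you could loop over all iframes rather than entering only one. A rough sketch of that idea follows, assuming Selenium 4's find_elements(by, value) signature. The helper names youtube_refs_in_context and youtube_refs_all_frames are hypothetical, not part of Selenium. The locator strings are written out literally (they are the values of By.CSS_SELECTOR and By.TAG_NAME) so the helpers only need a driver object passed in. The sketch descends one level only, so iframes nested inside other iframes are not visited.

```python
# Assumed selector, same as in the answer: any href or src containing "youtube".
YOUTUBE_SELECTOR = "[href*=youtube], [src*=youtube]"

CSS = "css selector"  # value of selenium's By.CSS_SELECTOR
TAG = "tag name"      # value of selenium's By.TAG_NAME


def youtube_refs_in_context(driver):
    """Return matching src/href values in the driver's current browsing context."""
    refs = []
    for element in driver.find_elements(CSS, YOUTUBE_SELECTOR):
        ref = element.get_attribute("src") or element.get_attribute("href")
        if ref:
            refs.append(ref)
    return refs


def youtube_refs_all_frames(driver):
    """Collect matches from the top-level page and from each top-level iframe."""
    refs = set(youtube_refs_in_context(driver))
    frame_count = len(driver.find_elements(TAG, "iframe"))
    for index in range(frame_count):
        # Look the iframes up again on every pass so the reference is fresh.
        frames = driver.find_elements(TAG, "iframe")
        if index >= len(frames):
            break
        try:
            driver.switch_to.frame(frames[index])
            refs.update(youtube_refs_in_context(driver))
        finally:
            # Return to the top-level document before the next iframe.
            driver.switch_to.default_content()
    return refs
```

With a live driver that has already loaded the page, calling youtube_refs_all_frames(driver) in place of the single switch_to.frame step would return a set of the distinct links. Content that the page loads late may still need an explicit wait, as in the answer's code.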



Answer 2:


If by scraping you mean downloading the videos, try

pip install youtube-dl

in your shell.
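
Once the package is installed, it can also be driven from Python. The sketch below is only one possible way to do it, assuming the youtube_dl module's YoutubeDL class with its outtmpl, noplaylist and quiet options. The helper names build_options and download_videos and the "downloads" directory are hypothetical choices for illustration.

```python
import os


def build_options(output_dir="downloads"):
    """Build an options dict for youtube_dl.YoutubeDL (directory name is an assumption)."""
    return {
        # Save each file as <output_dir>/<video title>.<extension>.
        "outtmpl": os.path.join(output_dir, "%(title)s.%(ext)s"),
        # If a URL points at a video inside a playlist, fetch only the video.
        "noplaylist": True,
        # Suppress progress output on stdout.
        "quiet": True,
    }


def download_videos(urls, output_dir="downloads"):
    """Download every URL in `urls` using youtube-dl."""
    # Imported here so the module loads even when youtube-dl is not installed.
    import youtube_dl

    with youtube_dl.YoutubeDL(build_options(output_dir)) as ydl:
        ydl.download(list(urls))
```

For example, download_videos(links) could be called with the links collected by a scraper, provided they point at videos rather than at channel or script URLs.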



Source: https://stackoverflow.com/questions/54973419/scraping-youtube-links-from-a-webpage
