Question
I want to crawl IMDB and download the trailers of movies (either from YouTube or IMDB) that fit some criteria (e.g., released this year, with a rating above 2).
I want to do this in Python. I saw that there are packages for crawling IMDB and for downloading YouTube videos. My current plan is to crawl IMDB, search YouTube for '$movie_name' + 'trailer', hope that the top result is the trailer, and then download it.
Still, this seems a bit convoluted and I was wondering if there was perhaps an easier way.
Any help would be appreciated.
Answer 1:
There is no easier way. I doubt IMDB allows people to scrape their website freely, so your IP will probably get blacklisted, and to counter that you'll need proxies. Good luck and scrape respectfully.
Answer 2:
The IMDbPY API (https://imdbpy.github.io/) will get you started; it's very straightforward.
import imdb  # pip install IMDbPY

ia = imdb.IMDb()
list_of_movies = ia.search_movie("string text")

for m in list_of_movies[:1]:
    # Fetch additional details (rating, votes, ...) for the search hit.
    ia.update(m, info=['main', 'vote details'])
    # Movie objects store the title under "title" ("name" is used for people).
    yt_search_term = m.get("title") + " trailer"
    # connect to YT API to start that part of the search.
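The criteria from the question (released this year, rating above 2) could be applied before any trailer search. The following is a minimal sketch, not part of the original answer: `matches_criteria` is a hypothetical helper, and it assumes each movie behaves like a dict with `year` and `rating` keys, which is how IMDbPY movie objects are usually read after their main info has been fetched. Plain dicts stand in for real results here.

```python
from datetime import date


def matches_criteria(movie, min_rating=2.0, year=None):
    """Return True if a dict-like movie record meets the year and rating criteria.

    Hypothetical helper: assumes "year" (int) and "rating" (float) keys.
    """
    if year is None:
        year = date.today().year
    rating = movie.get("rating")
    release_year = movie.get("year")
    # Skip titles with missing data instead of guessing.
    if rating is None or release_year is None:
        return False
    return release_year == year and rating > min_rating


# Made-up sample records standing in for crawled results.
sample = [
    {"title": "Example A", "year": 2018, "rating": 7.1},
    {"title": "Example B", "year": 2018, "rating": 1.5},
    {"title": "Example C", "year": 2017, "rating": 8.0},
    {"title": "Example D", "year": 2018},  # no rating yet
]
selected = [m["title"] for m in sample if matches_criteria(m, year=2018)]
print(selected)
```

Filtering first keeps the number of YouTube searches, and therefore API quota usage, as small as possible.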
Then look up how to connect to the YouTube Data API v3 with appropriate authentication, and download the corresponding Google API client library (the original answer linked to sample code).
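As a rough illustration of the YouTube part, the sketch below builds a request URL for the YouTube Data API v3 `search` endpoint and pulls video IDs out of a response body, using only the standard library. It is an assumption-laden sketch rather than the answer's own code: `build_search_url` and `extract_videos` are hypothetical helper names, `"YOUR_API_KEY"` is a placeholder, and no network request is made. The canned response only mimics the shape of a search result.

```python
import json
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"


def build_search_url(query, api_key, max_results=5):
    """Build a search URL for videos matching the query (hypothetical helper)."""
    params = {
        "part": "snippet",
        "q": query,
        "type": "video",
        "maxResults": max_results,
        "key": api_key,  # an API key from your own Google Cloud project
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


def extract_videos(response_text):
    """Return (video_id, title) pairs from a search response body."""
    data = json.loads(response_text)
    return [
        (item["id"]["videoId"], item["snippet"]["title"])
        for item in data.get("items", [])
    ]


url = build_search_url("Example Movie trailer", "YOUR_API_KEY")

# Canned response with made-up values, for illustration only.
canned = json.dumps({
    "items": [
        {"id": {"videoId": "abc123"}, "snippet": {"title": "Example Movie Trailer"}}
    ]
})
videos = extract_videos(canned)
print(url)
print(videos)
```

In a real script the URL would be fetched with an HTTP client, or the request would be made through the Google API client library mentioned above.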
Issues: One challenge is that movie titles are not unique, so searching YouTube by name + " trailer" will not necessarily return your intended trailer. You'll need to account for that somehow. For new Hollywood blockbusters (and similar), you may be successful.
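One possible way to reduce the ambiguity, offered as a sketch rather than a reliable solution, is to add the release year to the query and to rank the returned video titles by how closely they resemble the movie title. The helpers `build_query`, `score`, and `best_candidate` below are hypothetical, the 0.5 bonus for titles containing "trailer" is an arbitrary choice, and the video titles are made up.

```python
from difflib import SequenceMatcher


def build_query(title, year=None):
    """Compose a search term; the year helps separate same-named movies."""
    parts = [title]
    if year:
        parts.append(str(year))
    parts.append("official trailer")
    return " ".join(parts)


def score(title, video_title):
    """Heuristic score: string similarity plus a bonus if 'trailer' appears."""
    wanted = title.lower()
    found = video_title.lower()
    similarity = SequenceMatcher(None, wanted, found).ratio()
    bonus = 0.5 if "trailer" in found else 0.0  # arbitrary weight
    return similarity + bonus


def best_candidate(title, video_titles):
    """Return the highest-scoring video title, or None if there are none."""
    return max(video_titles, key=lambda v: score(title, v), default=None)


candidates = [
    "Example Movie - Official Trailer",
    "Example Movie full soundtrack",
    "Cooking show episode 3",
]
query = build_query("Example Movie", 2018)
choice = best_candidate("Example Movie", candidates)
print(query)
print(choice)
```

A heuristic like this can still pick the wrong video, for example for remakes released in the same year, so a manual spot check of the results remains advisable.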
Legal: As indicated by the other answer, do verify that your use case complies with the terms and conditions and licenses of the technologies and information services you are using. If in doubt, seek approval from those parties first or seek professional legal advice.
Source: https://stackoverflow.com/questions/49957297/crawling-imdb-for-movie-trailers