Crawling IMDB for movie trailers?

Submitted by 前提是你 on 2020-02-06 08:24:45

Question


I want to crawl IMDB and download the trailers of movies (either from YouTube or IMDB) that fit some criteria (e.g. released this year, with a rating above 2).

I want to do this in Python. I saw that there are packages for crawling IMDB and for downloading YouTube videos. My current plan is to crawl IMDB, search YouTube for '$movie_name' + 'trailer', hope that the top result is the trailer, and then download it.

Still, this seems a bit convoluted and I was wondering if there was perhaps an easier way.

Any help would be appreciated.


Answer 1:


There is no easier way. I doubt IMDB allows people to scrape their website freely, so your IP is probably going to get blacklisted, and to counter that you'll need proxies. Good luck and scrape respectfully.




Answer 2:


The IMDbPY package (https://imdbpy.github.io/) will get you started; it's very straightforward.

  import imdb  # pip install IMDbPY

  ia = imdb.IMDb()
  list_of_movies = ia.search_movie("string text")
  for m in list_of_movies[:1]:
    # Search results are sparse; fetch the main and vote details for this title.
    ia.update(m, info=['main', 'vote details'])
    # Movie objects expose their name under the "title" key.
    yt_search_term = m.get("title") + " trailer"
    # connect to YT API to start that part of the search.

Then look up how to connect to the YouTube Data API v3 (YTv3) with appropriate authentication, and download the corresponding Google API client library - Sample code here
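
As a rough illustration of the YouTube step, here is a minimal sketch using the google-api-python-client package. It assumes you have created an API key in the Google Cloud console and that a plain API key is enough for a public search; the function names search_youtube and extract_videos are made up for this example and are not part of any library.

```python
def search_youtube(query, api_key, max_results=5):
    """Return (video_id, title) pairs for a YouTube Data API v3 search."""
    # Imported lazily so the helper below works without the client installed.
    # pip install google-api-python-client
    from googleapiclient.discovery import build

    youtube = build("youtube", "v3", developerKey=api_key)
    response = youtube.search().list(
        q=query, part="snippet", type="video", maxResults=max_results
    ).execute()
    return extract_videos(response)


def extract_videos(response):
    """Pull (video_id, title) pairs out of a search.list response dict."""
    return [
        (item["id"]["videoId"], item["snippet"]["title"])
        for item in response.get("items", [])
        # Skip channels and playlists; only video results carry a videoId.
        if item.get("id", {}).get("kind") == "youtube#video"
    ]
```

You would call it as search_youtube("Some Movie trailer", api_key="YOUR_KEY"), where the key is a placeholder. Searches count against your daily API quota, so cache results if you are processing many titles.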

Issues: One challenge is that movie titles are not unique, so searching YT by name + " trailer" will not necessarily return your intended trailer, and you'll need to account for that somehow. For new Hollywood blockbusters (and similar), you may be successful.
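
One possible way to reduce mismatches is to put the release year in the query and then filter the returned video titles instead of blindly taking the top hit. The sketch below is only a heuristic under that assumption: build_query and pick_trailer are hypothetical helpers, the candidates are assumed to be (video_id, video_title) pairs from whatever search you run, and the scoring weights are arbitrary.

```python
import re


def build_query(title, year=None):
    """Build a search string such as 'The Thing 2011 official trailer'."""
    parts = [title]
    if year:
        parts.append(str(year))
    parts.append("official trailer")
    return " ".join(parts)


def _normalize(text):
    """Lowercase and collapse punctuation so titles compare loosely."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def pick_trailer(candidates, title, year=None):
    """Return the video_id of the best-looking trailer, or None."""
    wanted = f" {_normalize(title)} "
    best_id, best_score = None, 0
    for video_id, video_title in candidates:
        norm = f" {_normalize(video_title)} "
        # Require the whole movie title and the word "trailer".
        if wanted not in norm or "trailer" not in norm:
            continue
        score = 1
        if year and str(year) in norm:
            score += 1  # the release year appears in the video title
        if "official" in norm:
            score += 1  # prefer uploads labelled as official
        if score > best_score:  # ties keep the earlier search result
            best_id, best_score = video_id, score
    return best_id
```

With IMDbPY, the year could come from m.get("year") on the movie object. Returning None when nothing matches lets you skip a title rather than download the wrong video.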

Legal: As indicated by the other answer, do verify that your use case complies with the terms, conditions, and licenses of the technologies and information services you are using. If in doubt, seek approval from those parties first or get professional legal advice.



Source: https://stackoverflow.com/questions/49957297/crawling-imdb-for-movie-trailers
