How do some sites download YouTube captions?

Submitted by 早过忘川 on 2020-01-10 09:04:12

Question


This is somewhat of a duplicate of "Does YouTube API forbid to download video captions if you are not it's owner?" and "Get YouTube captions", which basically say it's not possible to download captions via the YouTube API unless you are the owner or third-party contributions are enabled. My question, however, is: how do sites like http://downsub.com/ or http://www.lilsubs.com/ have access to all captions?

In other words, when I access the YouTube API myself (even with the youtubepartner and youtube.force-ssl scopes), I can only download the captions of some videos. The others fail with 403: The permissions associated with the request are not sufficient to download the caption track. The request might not be properly authorized, or the video order might not have enabled third-party contributions for this caption. Yet when I try those same videos on these other sites, it works fine. I'm assuming they are using the YouTube API to access the captions, but what special sauce are they using? Some special partner key? A different API version? Are they just scraping from the videos themselves or something?


Answer 1:


Send a GET request to:

http://video.google.com/timedtext?lang={LANG}&v={VIDEOID}

Example for the video in your comment: http://video.google.com/timedtext?lang=ko&v=0db1_qWZjRA
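As a minimal sketch of the request above, the URL can be built and fetched with Python's standard library. The function names `build_caption_url` and `fetch_captions` are hypothetical, and whether the endpoint still answers for a given video is not guaranteed; only the URL pattern comes from this answer.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint quoted in the answer; its continued availability is an assumption.
TIMEDTEXT_URL = "http://video.google.com/timedtext"


def build_caption_url(video_id: str, lang: str) -> str:
    """Return the timedtext URL for one video and one language code."""
    return f"{TIMEDTEXT_URL}?{urlencode({'lang': lang, 'v': video_id})}"


def fetch_captions(video_id: str, lang: str, timeout: float = 10.0) -> str:
    """Send the GET request and return the raw response body as text."""
    with urlopen(build_caption_url(video_id, lang), timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the URL needs no network access.
print(build_caption_url("0db1_qWZjRA", "ko"))
```

Calling `fetch_captions("0db1_qWZjRA", "ko")` would perform the actual request.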

Let's look at another of your examples, https://www.youtube.com/watch?v=7068mw-6lmI (and I agree with the differentiation part of your comment).

There are multiple subtitle tracks available for the video:

  • English
  • Korean
  • Spanish
  • Korean (auto-generated), also called asr (automatic speech recognition)

These are the values of the subtitle name parameter (e.g., name=English).

lang is the language code, optionally with a region. In your example: https://www.youtube.com/api/timedtext?lang=es-MX&v=7068mw-6lmI&name=Spanish

If a subtitle track is available, it is possible to request a translation from it using the tlang parameter:

https://www.youtube.com/api/timedtext?lang=en&v=7068mw-6lmI&name=English&tlang=lv
https://www.youtube.com/api/timedtext?lang=ko&v=7068mw-6lmI&name=Korean&tlang=lv
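The two URLs above differ only in their query parameters, so they can be assembled by one helper. This is a sketch only: `build_timedtext_url` is a hypothetical name, and the parameters (`lang`, `v`, `name`, `tlang`) are the ones described in this answer, not a documented API.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://www.youtube.com/api/timedtext"


def build_timedtext_url(
    video_id: str,
    lang: str,
    name: Optional[str] = None,
    tlang: Optional[str] = None,
) -> str:
    """Build a timedtext URL; name and tlang are added only when given."""
    params = {"lang": lang, "v": video_id}
    if name is not None:
        params["name"] = name    # subtitle track name, e.g. "English"
    if tlang is not None:
        params["tlang"] = tlang  # target language for the translation
    return f"{BASE_URL}?{urlencode(params)}"


print(build_timedtext_url("7068mw-6lmI", "en", name="English", tlang="lv"))
print(build_timedtext_url("7068mw-6lmI", "ko", name="Korean", tlang="lv"))
```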

My guess is that this is what these sites use: translation of an available subtitle track (you can confirm this by giving one of those sites a video that has no subtitle track).

As for asr tracks, a signature always seems to be needed, but as long as one of the subtitle tracks is available, you can use it for translation. For instance, with the example from the comment on your question:

https://www.youtube.com/api/timedtext?lang=en&v=vx6NCUyg1NE&tlang=lv

This last example appears to be special in that both of its subtitle tracks are asr (checked with Chrome -> Inspect -> Network), so you need to omit the subtitle name parameter. Unfortunately, this difference is not visible in the settings wheel of the YouTube video.
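Since it is not visible up front whether a video needs the name parameter, one option is to try the URL with each known track name first and then without any name. The sketch below assumes that an unavailable track produces an empty response body, which is an assumption and not something this answer confirms. `candidate_urls`, `first_available`, and the injected `fetch` callable are hypothetical names.

```python
from typing import Callable, Iterator, Optional, Sequence, Tuple
from urllib.parse import urlencode

BASE_URL = "https://www.youtube.com/api/timedtext"


def candidate_urls(
    video_id: str,
    lang: str,
    names: Sequence[str],
    tlang: Optional[str] = None,
) -> Iterator[str]:
    """Yield one URL per track name, then a final URL with no name."""
    for name in [*names, None]:
        params = {"lang": lang, "v": video_id}
        if name is not None:
            params["name"] = name
        if tlang is not None:
            params["tlang"] = tlang
        yield f"{BASE_URL}?{urlencode(params)}"


def first_available(
    urls: Sequence[str], fetch: Callable[[str], str]
) -> Optional[Tuple[str, str]]:
    """Return (url, body) for the first URL whose body is not empty."""
    for url in urls:
        body = fetch(url)
        if body.strip():  # assumption: a missing track yields an empty body
            return url, body
    return None
```

Here `fetch` can be any function that takes a URL and returns the response text, which keeps the retry logic separate from the HTTP client.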




Answer 2:


There is this unofficial API used by YouTube:

https://www.youtube.com/api/timedtext?lang={LANG}&v={VIDEO_ID}

LANG here is the ISO 639-1 two-letter language code. For your example it would be:

https://www.youtube.com/api/timedtext?lang=ko&v=0db1_qWZjRA

You can check it in the browser's network tab while toggling the closed caption button.
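Once the response is downloaded, it still has to be turned into caption lines. The sketch below assumes the body is XML with a `transcript` root and `text` elements carrying `start` and `dur` attributes, and that entities inside the text may be escaped a second time. None of this is stated in the answer, so inspect the real response in the network tab before relying on it. `parse_timedtext` and `SAMPLE` are hypothetical.

```python
import html
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hand-written sample in the ASSUMED response format, not real YouTube output.
SAMPLE = (
    '<transcript>'
    '<text start="0.5" dur="2.0">Hello &amp;#39;world&amp;#39;</text>'
    '<text start="2.5" dur="1.5">Bye</text>'
    '</transcript>'
)


def parse_timedtext(xml_text: str) -> List[Tuple[float, float, str]]:
    """Return (start_seconds, duration_seconds, text) for each caption cue."""
    root = ET.fromstring(xml_text)
    cues = []
    for node in root.iter("text"):
        start = float(node.get("start", "0"))
        duration = float(node.get("dur", "0"))
        # The XML parser decodes one level of entities; unescape the second.
        text = html.unescape(node.text or "")
        cues.append((start, duration, text))
    return cues


for start, duration, text in parse_timedtext(SAMPLE):
    print(f"{start:.1f}s (+{duration:.1f}s): {text}")
```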



Source: https://stackoverflow.com/questions/46864428/how-do-some-sites-download-youtube-captions
