I have started using v3 of the YouTube APIs on an Android device, using the Java client library. Some videos that I am interested in have transcripts that I can access on th
With API v3 you can first list the available transcripts with this request:
https://www.googleapis.com/youtube/v3/captions?videoId=U1e2VNtEqm4&part=snippet&key=(my_api_key)
which returns:
{
  "kind": "youtube#captionListResponse",
  "etag": "\"DsOZ7qVJA4mxdTxZeNzis6uE6ck/aGHflncRxq1Uz6m1akhrOLUWUqU\"",
  "items": [
    {
      "kind": "youtube#caption",
      "etag": "\"DsOZ7qVJA4mxdTxZeNzis6uE6ck/IC7rNKkn3SQNdovFwR6fEabUYnY\"",
      "id": "TqXDnlamg84o4bX0q2oaHz4nfWZdyiZMOrcuWsSLyPc=",
      "snippet": {
        "videoId": "U1e2VNtEqm4",
        "lastUpdated": "2016-01-25T21:50:27.142Z",
        "trackKind": "standard",
        "language": "en-GB",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    },
    {
      "kind": "youtube#caption",
      "etag": "\"DsOZ7qVJA4mxdTxZeNzis6uE6ck/5UP1qPkmq6mzTUaEVnFC8WqjFgU\"",
      "id": "TqXDnlamg84o4bX0q2oaHw_Y53ilUWv6vMFbk0RL3XY=",
      "snippet": {
        "videoId": "U1e2VNtEqm4",
        "lastUpdated": "2016-01-25T21:55:07.481Z",
        "trackKind": "standard",
        "language": "en-US",
        "name": "",
        "audioTrackType": "unknown",
        "isCC": false,
        "isLarge": false,
        "isEasyReader": false,
        "isDraft": false,
        "isAutoSynced": false,
        "status": "serving"
      }
    }
  ]
}
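To make the "pick the transcript you want" step concrete, here is a minimal Python sketch that reads a response shaped like the one above and returns the id of the track for a given language. The function name pick_caption_id and the rule of skipping drafts and non-serving tracks are my own assumptions, not something the API prescribes, and the sample is trimmed down to the fields the function looks at.

```python
import json


def pick_caption_id(response_text, language):
    """Return the id of the first serving, non-draft track in `language`, or None.

    Hypothetical helper: the filtering rule is an assumption for illustration.
    """
    data = json.loads(response_text)
    for item in data.get("items", []):
        snippet = item.get("snippet", {})
        if (snippet.get("language") == language
                and snippet.get("status") == "serving"
                and not snippet.get("isDraft", False)):
            return item["id"]
    return None


# Trimmed copy of the captions list response shown above.
sample = """{"items": [
  {"id": "TqXDnlamg84o4bX0q2oaHz4nfWZdyiZMOrcuWsSLyPc=",
   "snippet": {"language": "en-GB", "status": "serving", "isDraft": false}},
  {"id": "TqXDnlamg84o4bX0q2oaHw_Y53ilUWv6vMFbk0RL3XY=",
   "snippet": {"language": "en-US", "status": "serving", "isDraft": false}}
]}"""

print(pick_caption_id(sample, "en-US"))
```

If no track matches the requested language the function returns None, so the caller can fall back to another language.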
And then pick the transcript you want:
https://www.googleapis.com/youtube/v3/captions/id?id=TqXDnlamg84o4bX0q2oaHz4nfWZdyiZMOrcuWsSLyPc=
or
https://www.googleapis.com/youtube/v3/captions/TqXDnlamg84o4bX0q2oaHz4nfWZdyiZMOrcuWsSLyPc=
at which point you need to provide an authorization token (OAuth 2.0). Apparently a simple API key isn't enough, possibly because:
Quota impact: A call to this method has a quota cost of approximately 200 units.
Note the slight difference in the URLs (/captions/ versus /captions?).
All the lovely documentation is here: https://developers.google.com/youtube/v3/docs/captions
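As a rough illustration of the two requests in this answer, the Python sketch below builds (but does not send) the list request, which takes the API key as a query parameter, and the download request, which puts the caption id in the path and carries an OAuth 2.0 bearer token. The helper names and the placeholder key and token are assumptions of mine; obtaining the token is out of scope here.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE = "https://www.googleapis.com/youtube/v3/captions"


def list_request(video_id, api_key):
    """List caption tracks: /captions?videoId=...&part=snippet&key=..."""
    query = urlencode({"videoId": video_id, "part": "snippet", "key": api_key})
    return Request(f"{BASE}?{query}")


def download_request(caption_id, access_token):
    """Download one track: /captions/<id>, authorized with a bearer token."""
    # Keep "=" unescaped so the URL matches the form shown in the answer.
    url = f"{BASE}/{quote(caption_id, safe='=')}"
    return Request(url, headers={"Authorization": f"Bearer {access_token}"})


# MY_API_KEY and MY_OAUTH_TOKEN are placeholders, not real credentials.
listing = list_request("U1e2VNtEqm4", "MY_API_KEY")
download = download_request(
    "TqXDnlamg84o4bX0q2oaHz4nfWZdyiZMOrcuWsSLyPc=", "MY_OAUTH_TOKEN")
print(listing.full_url)
print(download.full_url)
```

Passing either object to urllib.request.urlopen would actually send it; the download call should fail with an authorization error unless the token is valid.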
I may be wrong, but I don't think there is yet a documented way to get the caption track via v3 of the API. If you're authenticating with OAuth 2.0, however, your authentication will also be good for v2 of the API, so you could make a quick call to this feed:
http://gdata.youtube.com/feeds/api/videos/[VIDEOID]/captiondata/[CAPTION TRACKID]
to get the data you want. To retrieve a list of possible caption track IDs with v2 of the API, you access this feed:
https://gdata.youtube.com/feeds/api/videos/[VIDEOID]/captions
That feed request also accepts some optional parameters, including language, max-results, etc. For more details, along with a sample that shows the returned format of the caption track list, see the documentation at https://developers.google.com/youtube/2.0/developers_guide_protocol_captions#Retrieve_Caption_Set
I had the same problem with this... and spent like a week looking for a solution until I hit this:
https://stackoverflow.com/questions/10036796/how-to-extract-subtitles-from-youtube-videos
Just do a GET request on: http://video.google.com/timedtext?lang={LANG}&v={VIDEOID}
You don't need any API key, OAuth, etc. to access this.
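For what it's worth, here is a small Python sketch of how that could look: one helper fills in the URL template, another parses the result into cues. The XML layout in the sample (a transcript element containing text elements with start and dur attributes, and entities escaped twice) is an assumption based on what I have seen this endpoint return, so check it against a real response. The function names are made up.

```python
import html
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def timedtext_url(video_id, lang):
    """Fill in the lang and v parameters of the timedtext URL."""
    return "http://video.google.com/timedtext?" + urlencode(
        {"lang": lang, "v": video_id})


def parse_timedtext(xml_text):
    """Return a list of (start_seconds, duration_seconds, text) tuples."""
    cues = []
    for node in ET.fromstring(xml_text).iter("text"):
        start = float(node.get("start", "0"))
        dur = float(node.get("dur", "0"))
        # Assumed: entities are escaped twice, so unescape once more
        # after the XML parser has decoded the first layer.
        cues.append((start, dur, html.unescape(node.text or "")))
    return cues


# Invented sample in the assumed response format.
sample = ('<transcript>'
          '<text start="0.5" dur="2.1">Hello &amp;amp; welcome</text>'
          '<text start="2.6" dur="1.4">It&amp;#39;s a test</text>'
          '</transcript>')

print(timedtext_url("U1e2VNtEqm4", "en"))
print(parse_timedtext(sample))
```

To run it against a live video you would fetch timedtext_url(...) with urllib.request.urlopen and pass the decoded body to parse_timedtext.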
Here's some code I wrote which grabs all the caption tracks from any YouTube video without having to use the API. Just plug the video URL into the $video_url variable.
// Get the video ID from the URL
$video_url = 'https://www.youtube.com/watch?v=kYX87kkyubk';
if (!preg_match("#(?<=v=)[a-zA-Z0-9_-]+(?=&)|(?<=v\/)[^&\n]+(?=\?)|(?<=v=)[^&\n]+|(?<=youtu\.be/)[^&\n]+#", $video_url, $matches)) {
    exit('Could not find a video ID in the URL');
}
$video_id = $matches[0];

// Get the video info from the ID
$video_info = file_get_contents('http://www.youtube.com/get_video_info?video_id=' . urlencode($video_id));
if ($video_info === false) {
    exit('Could not fetch the video info');
}
parse_str($video_info, $video_info_array);

if (isset($video_info_array['caption_tracks'])) {
    // caption_tracks is a comma-separated list of query-string-encoded records
    $tracks = explode(',', $video_info_array['caption_tracks']);
    // Print info for each track (including the URL of the track content)
    foreach ($tracks as $track) {
        parse_str($track, $output);
        print_r($output);
    }
}
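The parsing step in that script (split caption_tracks on commas, then decode each piece as a query string) can be tried offline. Below is a rough Python equivalent of just that step; the sample value and its field names (u, lc) are invented for illustration and are not taken from a real get_video_info response.

```python
from urllib.parse import parse_qs


def parse_caption_tracks(raw):
    """Split a caption_tracks value into one dict per track."""
    tracks = []
    for entry in raw.split(","):
        if not entry:
            continue  # skip empty pieces, e.g. from a trailing comma
        fields = parse_qs(entry)
        # Like PHP's parse_str, keep the last value when a key repeats.
        tracks.append({key: values[-1] for key, values in fields.items()})
    return tracks


# Invented sample: two tracks, each a URL-encoded key=value record.
sample = ("u=https%3A%2F%2Fexample.com%2Ftrack1&lc=en,"
          "u=https%3A%2F%2Fexample.com%2Ftrack2&lc=fr")

for track in parse_caption_tracks(sample):
    print(track)
```

Splitting on commas is only safe because any comma inside a field value arrives percent-encoded, which is what the PHP version relies on as well.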