I a trying to retrieve all the comments of a video via Python iteration/paging. I am logged correctly with a developer key
import googleapiclient.discovery
Issue is, we can't really retrieve all the comments of every video.
https://issuetracker.google.com/issues/134912604
We currently don't support paging through the whole stream. So there's no way to retrieve all the 1000+ commentThreads that you have for that video
According to Google's Python Client Library sample code and to Google's Youtube API sample code, you should have been coding your pagination loop as shown below:
request = yt.commentThreads().list(...)
while request:
response = request.execute()
# your processing code goes here ...
request = yt.commentThreads().list_next(request, response)
This is not a solution to your problem. It just shows that querying the endpoint via a GET request method succeeds obtaining from the API the needed page response.
# comments-wget [-d] VIDEO_ID [PAGE_TOKEN]
$ comments-wget() {
local x='eval'
[ "$1" == '-d' ] && {
x='echo'
shift
}
local v="$1"
quote2 -i v
local p="$2"
quote2 -i p
local O="/tmp/$v-comments%d.json"
local o
local k=0
while :; do
printf -v o "$O" "$k"
[ ! -f "$o" ] && break
(( k++ ))
done
quote o
k="$APP_KEY"
quote2 -i k
local a="$AGENT"
quote2 a
local c="\
wget \
--debug \
--verbose \
--no-check-certif \
--output-document=$o \
--user-agent=$a \
'https://www.googleapis.com/youtube/v3/commentThreads?key=$k&videoId=$v&part=replies,snippet&order=relevance&maxResults=100&textFormat=plainText&alt=json${p:+&pageToken=$p}'"
$x "$c"
}
$ PAGE_TOKEN=...
$ AGENT=... APP_KEY=... comments-wget CJ_GCPaKywg "$PAGE_TOKEN"
Setting --verbose (verbose) to 1
Setting --check-certificate (checkcertificate) to 0
Setting --output-document (outputdocument) to /tmp/CJ_GCPaKywg-comments0.json
Setting --user-agent (useragent) to ...
DEBUG output created by Wget 1.14 on linux-gnu.
--2019-06-10 17:41:11-- https://www.googleapis.com/youtube/v3/commentThreads?...
Resolving www.googleapis.com... 172.217.19.106, 216.58.214.202, 216.58.214.234, ...
Caching www.googleapis.com => 172.217.19.106 216.58.214.202 216.58.214.234 172.217.16.106 172.217.20.10 2a00:1450:400d:808::200a
Connecting to www.googleapis.com|172.217.19.106|:443... connected.
Created socket 5.
Releasing 0x0000000000ae57c0 (new refcount 1).
---request begin---
GET /youtube/v3/commentThreads?.../1.1
User-Agent: ...
Accept: */*
Host: www.googleapis.com
Connection: Keep-Alive
---request end---
HTTP request sent, awaiting response...
---response begin---
HTTP/1.1 200 OK
Expires: Mon, 10 Jun 2019 14:43:39 GMT
Date: Mon, 10 Jun 2019 14:43:39 GMT
Cache-Control: private, max-age=0, must-revalidate, no-transform
ETag: "XpPGQXPnxQJhLgs6enD_n8JR4Qk/OUAqOrEpA9YYqmVx0wqn9en_OrE"
Vary: Origin
Vary: X-Origin
Content-Type: application/json; charset=UTF-8
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
Content-Length: 205965
Server: GSE
Alt-Svc: quic=":443"; ma=2592000; v="46,44,43,39"
---response end---
200 OK
Registered socket 5 for persistent reuse.
Length: 205965 (201K) [application/json]
Saving to: ‘/tmp/CJ_GCPaKywg-comments0.json’
100%[==========================================>] 205,965 580KB/s in 0.3s
2019-06-10 17:41:18 (580 KB/s) - ‘/tmp/CJ_GCPaKywg-comments0.json’ saved [205965/205965]
Note that the shell functions quote
and quote2
above are those from youtube-data.sh (they are not really needed). $PAGE_TOKEN
is extracted from the body
string of the JSON request object posted above.
The next question is: why your python code uses a POST request method? Could it be that this is the cause of your problem?