Question
I'm new to web scraping. I set up a loop to scrape 37900 records. Because of the way the URL/server is set up, each URL displays at most 200 records. Each URL ends with 'skip=200', or a multiple of 200, to move to the next page, where the next 200 records are displayed. Eventually I want to loop through all the URLs and append the results into one table.
I created the two loops shown below: one builds the URLs with skip= increasing by 200 records each time, and the other gets the response from each of these URLs and appends it to a single dataframe.
However, I run into an error on my last URL and am unable to append these JSON responses into a single dataframe.
"The query specified in the URI is not valid. Invalid value 'i37800' for $skip query option found. The $skip query option requires a non-negative integer value."
Edited: after removing the i after 'skip=' in my URL, the second loop threw this error:
TypeError: 'list' object is not callable
When I open this URL, https://~/Projects?&$skip=37800, the records are displayed properly, so I'm not sure why Python threw this error. Please see my code below. I would appreciate any suggestions to fix this error and the loops!
Thanks!
import pandas as pd
import requests
import json

records = range(37900)
skip = records[0::200]

Page = []
for i in skip:
    endpoint = "https://~/Projects?&$skip=i{}".format(i)
    Page.append(endpoint)

tbls = []
for j in Page():
    response = session.get(j)  # session here refers to the requests.Session() I had to set up to authenticate my access to these urls
    responsejs = response.json()
    responsepd = pd.DataFrame(responsejs['value'])  # I only want to extract the header called 'value' in each json
    tbls.append(responsepd)
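For reference, here is a minimal sketch of what the two loops are meant to do, written so it runs without network access. The names build_page_urls, collect_rows and fake_fetch are hypothetical helpers made up for illustration, and fake_fetch only imitates the shape of the real JSON (a dict with a 'value' list). The real responses may look different.

```python
from typing import Callable, Dict, List


def build_page_urls(base_url: str, total_records: int, page_size: int = 200) -> List[str]:
    """Return one URL per page, with $skip set to 0, 200, 400, ..."""
    # range() with a step yields the offsets directly. There is no stray "i"
    # in front of the placeholder, so $skip receives a plain integer.
    return ["{}?&$skip={}".format(base_url, offset)
            for offset in range(0, total_records, page_size)]


def collect_rows(urls: List[str], fetch_json: Callable[[str], Dict]) -> List[dict]:
    """Fetch every page and gather the records stored under the 'value' key."""
    rows: List[dict] = []
    for url in urls:  # iterate over the list itself; calling urls() raises TypeError
        payload = fetch_json(url)
        rows.extend(payload["value"])
    return rows


def fake_fetch(url: str) -> Dict:
    """Hypothetical stand-in for session.get(url).json(); returns two dummy records."""
    offset = int(url.rsplit("=", 1)[1])
    return {"value": [{"id": offset + k} for k in range(2)]}


urls = build_page_urls("https://~/Projects", 37900)
rows = collect_rows(urls[:3], fake_fetch)  # only the first three pages, for brevity

print(len(urls), urls[-1])
print(rows)
```

With the authenticated session, fetch_json would presumably be something like lambda url: session.get(url).json(), and pd.DataFrame(rows) would turn the collected records into one table. If the per-page dataframes are kept in tbls as in my code, pd.concat(tbls, ignore_index=True) should combine them.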
Source: https://stackoverflow.com/questions/58494208/unable-to-loop-the-last-url-with-paging-limits