问题
I want to iterate and extract table from the link here, then save as excel file.
How can I do that? Thank you.
My code so far:
import pandas as pd
import requests
from bs4 import BeautifulSoup
from tabulate import tabulate
url = 'http://zjj.sz.gov.cn/ztfw/gcjs/xmxx/jgysba/'
res = requests.get(url)
soup = BeautifulSoup(res.content,'lxml')
print(soup)
New update:
from requests import post
import json
import pandas as pd
import numpy as np
headers = {
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36",
"Referer": "http://zjj.sz.gov.cn/projreg/public/jgys/jgysList.jsp"}
dfs = []
#dfs = pd.DataFrame()
for page in range(0, 10):
data = {"limit": 100, "offset": page * 100, "pageNumber": page + 1}
json_arr = requests.post("http://zjj.sz.gov.cn/projreg/public/jgys/webService/getJgysLogList.json", headers = headers, data = data).text
d = json.loads(json_arr)
df = pd.read_json(json.dumps(d['rows']) , orient='list')
dfs.append(df)
print(dfs)
dfs = pd.concat(dfs)
#https://stackoverflow.com/questions/57842073/pandas-how-to-drop-rows-when-all-float-columns-are-nan
dfs = dfs.loc[:, ~dfs.replace(0, np.nan).isna().all()]
dfs.to_excel('test.xlsx', index = False)
It generates 10
pages and 1000
rows, but some columns values are misplaced, someone knows where did I do wrong? Thank you.
回答1:
So, using the JSON
API from XHR you make a simple python post
request via requests
and you have your data.
In the params you have two of them which you can change to get different volumes of data, limit
is the nos of objects you get in a request. pageNumber
is the paginated page counter.
from requests import post
import json
url = 'http://zjj.sz.gov.cn/projreg/public/jgys/webService/getJgysLogList.json'
data = { 'limit' : '100', 'pageNumber' : '1'}
response = post(url, data=d)
response.text
Further you can use pandas
to create a data frame or create a excel as you want.
来源:https://stackoverflow.com/questions/59835094/iterate-and-extract-tables-from-web-saving-as-excel-file-in-python