PYTHON: How do I use BeautifulSoup to parse a table into a pandas dataframe

烈酒焚心 submitted on 2021-02-05 08:00:47

Question


I am trying to scrape the CDC website for the COVID-19 cases reported in the last 7 days: https://covid.cdc.gov/covid-data-tracker/#cases_casesinlast7days. I've tried to find the table by name, id, and class, but the lookup always returns None. When I print the scraped HTML, I can't locate the table in it manually either. I'm not sure what I'm doing wrong. Once the data is imported, I need to populate a pandas DataFrame for later graphing and export the table as a CSV.


Answer 1:


You might as well request the data from the API directly (watch the Network tab in your browser's developer tools while refreshing the page to find the endpoint):

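import requests
import pandas as pd


# JSON endpoint the page calls in the background (found via the Network tab)
endpoint = "https://covid.cdc.gov/covid-data-tracker/COVIDData/getAjaxData"
response = requests.get(endpoint, params={"id": "US_MAP_DATA"})
response.raise_for_status()  # stop early on an HTTP error
data = response.json()
# the records sit under the same key as the requested id
df = pd.DataFrame(data["US_MAP_DATA"])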
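
The question also asks for a CSV export, which the snippet above stops short of. Here is a minimal sketch of that last step, assuming the response is a dict holding a list of records under "US_MAP_DATA". The sample payload, its field names, and the helper payload_to_frame are made up for illustration, so check the live response before relying on any of them.

```python
import pandas as pd

# Made-up stand-in for the JSON the endpoint returns; the real field names
# and values may differ, so inspect the live response first.
sample_payload = {
    "US_MAP_DATA": [
        {"state": "XX", "cases_last_7_days": 10},
        {"state": "YY", "cases_last_7_days": 25},
    ]
}


def payload_to_frame(payload, key="US_MAP_DATA"):
    """Turn the list of records stored under `key` into a DataFrame."""
    records = payload.get(key)
    if not isinstance(records, list):
        raise ValueError(f"expected a list of records under {key!r}")
    return pd.DataFrame(records)


df = payload_to_frame(sample_payload)

# With no path argument, to_csv returns the CSV as a string;
# pass a file name (for example "cases.csv") to write it to disk instead.
csv_text = df.to_csv(index=False)
```

Passing index=False keeps the pandas row index out of the file, which is usually what you want when the CSV is meant for graphing tools.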


EDIT: Trying to make this answer more general and useful.

How did you discern that this was how to parse the data?

First, open the browser's developer tools (Ctrl + Shift + I) and go to the Network tab.

Second, refresh the page so that the network activity gets recorded.

Where to look?

Select the XHR filter to cut down the number of recorded requests;

Click through the remaining requests and check the Preview tab of each response to see whether it contains the data you need.
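
Once a response body has been copied out of the Network tab, the same check can be done in code. The sketch below is only an illustration: find_record_lists is a hypothetical helper, and the JSON body is invented. It reports which top-level keys hold a non-empty list of dicts, since that is the shape pd.DataFrame accepts directly.

```python
import json


def find_record_lists(payload):
    """Return {key: row_count} for top-level keys holding a non-empty list of dicts."""
    found = {}
    if not isinstance(payload, dict):
        return found
    for key, value in payload.items():
        is_records = (
            isinstance(value, list)
            and len(value) > 0
            and all(isinstance(item, dict) for item in value)
        )
        if is_records:
            found[key] = len(value)
    return found


# Invented response body standing in for one copied from the Network tab.
body = '{"note": "example", "US_MAP_DATA": [{"state": "XX"}, {"state": "YY"}]}'
candidates = find_record_lists(json.loads(body))
print(candidates)  # {'US_MAP_DATA': 2}
```

Any key it reports is a candidate to pass to pd.DataFrame. Nested payloads would need a recursive variant, which is left out here.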


This doesn't always work, but when it does, pulling the data straight from the API is much easier than writing a scraper with requests / bs4 / selenium, so it should be the first thing you try.



Source: https://stackoverflow.com/questions/64406533/python-how-do-i-use-beautifulsoup-to-parse-a-table-into-a-pandas-dataframe
