Question
I'm trying to scrape the table on this page: https://www.investing.com/commodities/real-time-futures
But for some reason, when I try to get the data, I keep getting an empty list.
This is what I'm doing to fetch the data and parse it:
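import axios from 'axios';
import * as cheerio from 'cheerio';

// Inside the React component:
componentDidMount() {
  axios.get('https://www.investing.com/commodities/real-time-futures')
    .then(response => {
      if (response.status === 200) {
        const html = response.data;
        const $ = cheerio.load(html);
        const data = [];
        // Visit every row of the table with id "cross_rate_1"
        $('#cross_rate_1 tr').each((i, elem) => {
          data.push({
            // Assumes the cell is marked up as class="left noWrap".
            // The original 'td#left noWrap' looks for a <noWrap> element
            // inside a td with id="left", which matches nothing.
            Month: $(elem).find('td.left.noWrap').text()
          });
        });
        console.log(data);
      }
    }, (error) => console.log('err', error));
}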
This is a screenshot of the particular part of the source code I'm trying to scrape.
Any help is much appreciated.
Answer 1:
As already mentioned, the table in question is updated continuously via a websocket connection. You can try getting the data by either 1) connecting to the websocket or 2) scraping the dynamically generated HTML.
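Before choosing between the two, it may help to confirm what the server's raw response actually contains. The following is a minimal, dependency-free sketch of such a check. `inspectStaticTable` is a hypothetical helper (not part of axios or cheerio), and its regex matching is only a rough approximation that assumes the table contains no nested tables.

```javascript
// Hypothetical diagnostic helper: reports whether raw HTML already contains
// rows and non-empty cells for the table with the given id.
// Assumption: tableId contains no regex metacharacters and the table is not nested.
function inspectStaticTable(html, tableId) {
  // Find the opening <table ...> tag that carries the requested id.
  const open = new RegExp(`<table[^>]*\\bid=["']${tableId}["'][^>]*>`, 'i').exec(html);
  if (!open) {
    return { found: false, rows: 0, filledCells: 0 };
  }

  // Take everything up to the first closing </table> tag (or the end of the input).
  const rest = html.slice(open.index + open[0].length);
  const close = rest.search(/<\/table>/i);
  const body = close === -1 ? rest : rest.slice(0, close);

  // Count row openings.
  const rows = (body.match(/<tr\b/gi) || []).length;

  // Count cells that still have text once inner tags are stripped.
  const cells = body.match(/<td\b[^>]*>[\s\S]*?<\/td>/gi) || [];
  const filledCells = cells.filter(
    (cell) => cell.replace(/<[^>]*>/g, '').trim() !== ''
  ).length;

  return { found: true, rows, filledCells };
}
```

Calling `inspectStaticTable(response.data, 'cross_rate_1')` inside the `.then` handler would show whether the table is present in the response. If `found` is false or `filledCells` is 0, the values are probably filled in client-side, and one of the two approaches above is needed. If the cells are populated, the problem more likely lies in the selector.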
If you only need a snapshot of the data rather than a continuous time series, you can use a browser scraping extension. That way you don't need to deal with the websocket implementation.
I've identified the price data CSS selectors for you and created a scraping configuration to be used with the open source browser extension https://github.com/get-set-fetch/extension.
"eLtI4gnapZTLDsIgEEV/hejGLrC+F25N3OrCpUlD6FhIWmiY0f6+1Hd9EJsuSEguGRg4h8fSlS0Km/r3ZesjHR0g2zrtKzL2IYg1wOqLZ2hEicrSwxhFVOIyjquqGmpzAiRtsqG0RSxv5TVg7EDkvC7AD9etmqJlQBz9ONRW8HvgJ06UwD2HpCV/gtpFylFnC39A/s51A3qphMlg94ruBbtNCe5iMr5/EP/S3ICZf4H5myP/0tv3rSIm/oiQjBmlS0OKS6XzdDCJ9iYQT8PxLBzPw/Ei6rWwpZ0dZ2cMF5M="
Inside the extension do: new project > config hash > paste the above hash (without the quotes) > save, scrape, view results > export as csv.
Disclaimer: I'm the extension author.
Source: https://stackoverflow.com/questions/62905427/cheerio-axios-reactjs-to-web-scrape-a-table-off-a-webpage-returning-empty-list