Question
I'm trying to get the full HTML for this page. It has a spreadsheet that loads slowly. The spreadsheet shows up when I take a screenshot of the page, but I can't get its HTML: document.body.outerHTML excludes the HTML for the spreadsheet. It's as if Puppeteer is still seeing the page as it was before the spreadsheet loaded.
How do I get the fully loaded HTML, including the HTML for the spreadsheet?
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("http://www.electproject.org/2016g", {
    timeout: 11000,
    waitUntil: "networkidle0",
  });
  await page.setViewport({
    width: 640,
    height: 880,
    deviceScaleFactor: 1,
  });
  await page.screenshot({ path: "buddy-screenshot.png", format: "A4" }); // this screenshot displays the spreadsheet
  let html = await page.evaluate(() => document.body.outerHTML); // this returns the html excluding the spreadsheet
  await browser.close();
})();
Answer 1:
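The spreadsheet is in an iframe. page.evaluate runs in the page's main frame, so document.body.outerHTML there contains only the iframe element itself, not the document loaded inside it. You need to get the iframe's frame first and evaluate inside it: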
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto("http://www.electproject.org/2016g", {
    timeout: 11000,
    waitUntil: "networkidle0",
  });
  await page.setViewport({
    width: 640,
    height: 880,
    deviceScaleFactor: 1,
  });

  // Find the child frame that hosts the embedded Google Sheets document.
  const spreadsheetFrame = page.frames().find(
    frame => frame.url().startsWith('https://docs.google.com/spreadsheets/')
  );

  // Evaluate inside the iframe's document rather than the main page.
  let spreadsheetHead = await spreadsheetFrame.evaluate(
    () => document.body.querySelector('#top-bar').innerText
  );
  console.log(spreadsheetHead); // 2016 November General Election : Turnout Rates

  await browser.close();
})();
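Since the question asks for the HTML rather than the title text, the same frame lookup could be followed by a call to frame.content(), which returns the frame's full HTML. The following is only a minimal sketch under that assumption: findFrameByUrlPrefix and getFrameHtml are hypothetical helper names, not part of Puppeteer, while page.frames(), frame.url() and frame.content() are Puppeteer methods. The helpers take an already-opened page, so nothing is launched here.

```javascript
// Hypothetical helper: return the first frame whose URL starts with the
// given prefix, or undefined if no frame matches.
function findFrameByUrlPrefix(frames, prefix) {
  return frames.find((frame) => frame.url().startsWith(prefix));
}

// Hypothetical helper: return the full HTML of the matching frame.
// `page` is assumed to be a Puppeteer Page that has finished navigating.
async function getFrameHtml(page, prefix) {
  const frame = findFrameByUrlPrefix(page.frames(), prefix);
  if (!frame) {
    // The iframe may not be attached yet, or its URL may differ.
    throw new Error(`No frame found with URL starting with ${prefix}`);
  }
  // frame.content() serializes the frame's own document.
  return frame.content();
}
```

Inside the async function from the answer, this would be used after page.goto resolves, for example: const html = await getFrameHtml(page, 'https://docs.google.com/spreadsheets/');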
Source: https://stackoverflow.com/questions/64297728/cant-get-the-fully-loaded-html-for-a-page-using-puppeteer