Is it possible to open a local HTML file with headless Chrome using Puppeteer (without a web server)? I could only get it to work against a local server.
I found
I just ran a test locally (on Windows, as the path shows), and Puppeteer happily opened my local HTML file using page.goto with a full file URL and saved it as a PDF:
'use strict';
const puppeteer = require('puppeteer');
(async() => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('file://C:/Users/compoundeye/test.html');
  await page.pdf({
    path: 'test.pdf',
    format: 'A4',
    margin: {
      top: "20px",
      left: "20px",
      right: "20px",
      bottom: "20px"
    }
  });
  await browser.close();
})();
If you need to use a relative path, you might want to look at this question about relative file paths: File Uri Scheme and Relative Files
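For the relative-path case, here is a minimal sketch that resolves the path first and then converts it to a file URL with Node's built-in url.pathToFileURL. The helper name toFileUrl and its baseDir parameter are my own for illustration and are not part of Puppeteer:

```javascript
const path = require('path');
const { pathToFileURL } = require('url');

// Hypothetical helper (not a Puppeteer API): resolve a possibly relative
// path against baseDir and return an absolute file:// URL string.
function toFileUrl(relativePath, baseDir = process.cwd()) {
  const absolutePath = path.resolve(baseDir, relativePath);
  // pathToFileURL percent-encodes special characters such as spaces and
  // converts Windows drive letters and backslashes to URL form.
  return pathToFileURL(absolutePath).href;
}
```

You could then call something like await page.goto(toFileUrl('pages/test.html', __dirname)); inside your async function.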
You can use file-url to prepare the URL to pass to page.goto:
const fileUrl = require('file-url');
const puppeteer = require('puppeteer');
(async () => {
  // await is only valid inside an async function in a CommonJS script
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto(fileUrl('file.html'));
  await browser.close();
})();
I opened the file I wanted to load in the browser and copied the URL from the address bar to make sure all the slashes were correct.
await page.goto(`file:///C:/pup_scrapper/testpage/TM.html`);
const puppeteer = require('puppeteer');
(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  // __dirname is available in CommonJS modules and holds the absolute
  // path of the folder containing the currently executing file
  await page.goto(`file://${__dirname}/pages/test.html`);
  const element = await page.$('.myElement');
  if (element) {
    // Screenshot only the matched element, not the whole page
    await element.screenshot({
      path: `./out/screenshot.png`,
      omitBackground: true,
    });
  }
  await browser.close();
})();
If the file is on the local disk, loading it with setContent can be faster than goto:
const fs = require('fs');
const contentHtml = fs.readFileSync('C:/Users/compoundeye/test.html', 'utf8');
await page.setContent(contentHtml);
You can compare the performance of setContent and goto here
Why not open the HTML file, read its content, and then call setContent?
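The read-then-setContent idea can be wrapped in a small helper, sketched below. The name loadLocalHtml is hypothetical, and page is assumed to be a Puppeteer Page obtained from browser.newPage(). One thing to keep in mind: as far as I know, setContent does not navigate the page, so relative links to images, stylesheets, or scripts inside the file will not resolve against the file's folder the way they do with goto and a file URL.

```javascript
const fs = require('fs');

// Hypothetical helper: read a local HTML file and hand its markup to
// page.setContent. `page` is assumed to be a Puppeteer Page object.
async function loadLocalHtml(page, filePath) {
  const html = fs.readFileSync(filePath, 'utf8');
  // setContent replaces the document's markup without loading a URL,
  // so relative resource paths in the HTML are not resolved against filePath.
  await page.setContent(html);
  return html;
}
```

Inside your async function you would call it as await loadLocalHtml(page, 'C:/Users/compoundeye/test.html'); in place of page.goto.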