Puppeteer

In puppeteer how to access exports from a script injected using 'page.addScriptTag'?

Deadly, submitted on 2021-02-10 05:22:30
Question: In a Puppeteer script I want to inject some utility functions into a web page via page.addScriptTag and use some of the exports defined in that file.

    await page.addScriptTag({path: './utils/browser.js', type: 'module'})
    await page.evaluate(() => {
      // how do I do `import { foo } from './utils/browser.js'` here?
    })

The injected file looks like this:

    export function foo() {}

I'm using "type": "module" in my package.json, if that is relevant.

Answer 1: You can try dynamic import() with 'data:' URLs: await page
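The answer above is cut off, so the following is only a minimal sketch of how the dynamic import() idea could look, not the answerer's actual code. It assumes the module source is read on the Node side and handed to the page as a base64 data: URL. The names toModuleDataUrl and importFooInPage are hypothetical, and a page with a strict Content-Security-Policy may refuse data: imports.

```javascript
// Hypothetical helper: wrap ES module source text in a base64 data: URL.
function toModuleDataUrl(source) {
  const encoded = Buffer.from(source, 'utf8').toString('base64');
  return `data:text/javascript;base64,${encoded}`;
}

// Hypothetical helper: `page` is a Puppeteer Page and `source` is the text
// of the utility module (for example, the contents of ./utils/browser.js).
async function importFooInPage(page, source) {
  const moduleUrl = toModuleDataUrl(source);
  return page.evaluate(async (url) => {
    // This callback runs inside the browser, where import() accepts a data: URL.
    const { foo } = await import(url);
    return typeof foo;
  }, moduleUrl);
}
```

Passing the URL as an argument to page.evaluate matters because the callback is serialized and cannot see variables from the Node scope.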

Headless Google Chrome: How to prevent sites to know whether their window is focused or not

橙三吉。, submitted on 2021-02-07 20:31:39
Question: Is there a way to prevent sites from knowing whether they are visible or not? Perhaps a command-line flag? I checked https://peter.sh/experiments/chromium-command-line-switches/ but could not find anything suitable. I think they use the Page Visibility API: https://developer.mozilla.org/en-US/docs/Web/API/Page_Visibility_API

Answer 1: If your goal is to fool the visibility API, then inject this piece of script into the relevant page or frame: await page.evaluate(` Object.defineProperty(window.document,
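Since the answer's script is truncated, here is a hedged sketch of the general technique it points at: overriding the document's visibility properties with Object.defineProperty. Which properties the original answer overrides is not visible in the excerpt; visibilityState and hidden are an assumption based on the Page Visibility API, and forceVisible and installVisibilityOverride are hypothetical names.

```javascript
// Hypothetical helper: make a document-like object always report "visible".
function forceVisible(doc) {
  Object.defineProperty(doc, 'visibilityState', {
    get: () => 'visible',
    configurable: true,
  });
  Object.defineProperty(doc, 'hidden', {
    get: () => false,
    configurable: true,
  });
}

// Hypothetical helper: `page` is a Puppeteer Page. The function source is
// inlined as a string so it runs against `document` before page scripts do.
async function installVisibilityOverride(page) {
  await page.evaluateOnNewDocument(`(${forceVisible.toString()})(document)`);
}
```

This sketch only covers property reads; a site that listens for visibilitychange or blur events would need those handled separately.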

Puppeteer headless mode: a summary of anti-anti-scraping settings

流过昼夜, submitted on 2021-02-07 16:28:31
Launch settings

    const browser = await puppeteer.launch({
      headless: true,
      args: [
        '--no-sandbox',
        '--disable-setuid-sandbox',
        '--disable-blink-features=AutomationControlled',
      ],
      dumpio: false,
    });

webdriver

    // webdriver
    await page.evaluateOnNewDocument(() => {
      const newProto = navigator.__proto__;
      delete newProto.webdriver; // remove the navigator.webdriver field
      navigator.__proto__ = newProto;
    });

window.chrome

    // Add a window.chrome field and fill it with some values
    await page.evaluateOnNewDocument(() => {
      window.chrome = {};
      window.chrome.app = {
        InstallState: 'hehe',
        RunningState: 'haha',
        getDetails: 'xixi',

Puppeteer in NodeJS reports 'Error: Node is either not visible or not an HTMLElement'

♀尐吖头ヾ, submitted on 2021-02-07 04:52:40
Question: I'm using 'puppeteer' for NodeJS to test a specific website. It seems to work fine in most cases, but in some places it reports:

    Error: Node is either not visible or not an HTMLElement

The following code picks a link that in both cases is off the screen. The first link works fine, while the second link fails. What is the difference? Both links are off the screen. Any help appreciated. Cheers :)

Example code

    const puppeteer = require('puppeteer');
    const initialPage = 'https://statsregnskapet.dfo

Puppeteer in NodeJS reports 'Error: Node is either not visible or not an HTMLElement'

廉价感情., submitted on 2021-02-07 04:52:13
Question: I'm using 'puppeteer' for NodeJS to test a specific website. It seems to work fine in most cases, but in some places it reports:

    Error: Node is either not visible or not an HTMLElement

The following code picks a link that in both cases is off the screen. The first link works fine, while the second link fails. What is the difference? Both links are off the screen. Any help appreciated. Cheers :)

Example code

    const puppeteer = require('puppeteer');
    const initialPage = 'https://statsregnskapet.dfo

Get the Value of HTML Attributes Using Puppeteer

喜你入骨, submitted on 2021-02-07 02:42:34
Question: Using Puppeteer, I've selected some HTML elements using:

    await page.$$( 'span.styleNumber' );

I can get the element's text using:

    console.log( await ( await styleNumber.getProperty( 'innerText' ) ).jsonValue() );

How can I get the value of the element's data-Color attribute? Here is my script:

HTML

    <span class="styleNumber" data-Color="Blue">SG1000</span>
    <span class="styleNumber" data-Color="Green">SG2000</span>
    <span class="styleNumber" data-Color="Red">SG3000</span>

Puppeteer

    const puppeteer =
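No answer is included in this excerpt. One possible approach, sketched here under that caveat, is to read the attribute inside the page with page.$$eval and Element.getAttribute. The helper names collectAttribute and getStyleColors are hypothetical; the selector and attribute name come from the question.

```javascript
// Hypothetical helper: runs inside the page, where `elements` are DOM nodes.
function collectAttribute(elements, name) {
  return elements.map((el) => el.getAttribute(name));
}

// Hypothetical helper: `page` is a Puppeteer Page. $$eval passes every
// element matching the selector to the callback, followed by extra arguments.
async function getStyleColors(page) {
  return page.$$eval('span.styleNumber', collectAttribute, 'data-Color');
}
```

Doing the lookup in one $$eval call returns plain strings to Node, which avoids resolving a separate handle for each element.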

Get the Value of HTML Attributes Using Puppeteer

那年仲夏, submitted on 2021-02-07 02:41:18
Question: Using Puppeteer, I've selected some HTML elements using:

    await page.$$( 'span.styleNumber' );

I can get the element's text using:

    console.log( await ( await styleNumber.getProperty( 'innerText' ) ).jsonValue() );

How can I get the value of the element's data-Color attribute? Here is my script:

HTML

    <span class="styleNumber" data-Color="Blue">SG1000</span>
    <span class="styleNumber" data-Color="Green">SG2000</span>
    <span class="styleNumber" data-Color="Red">SG3000</span>

Puppeteer

    const puppeteer =

Puppeteer: how to download entire web page for offline use

断了今生、忘了曾经, submitted on 2021-02-06 09:07:04
Question: How would I scrape an entire website, with all of its CSS/JavaScript/media intact (and not just its HTML), with Google's Puppeteer? After successfully trying it out on other scraping jobs, I would imagine it should be able to. However, looking through the many excellent examples online, there is no obvious method for doing so. The closest I have been able to find is calling html_contents = await page.content() and saving the results, but that saves a copy without any non-HTML elements. Is

Puppeteer: how to download entire web page for offline use

大城市里の小女人, submitted on 2021-02-06 09:01:46
Question: How would I scrape an entire website, with all of its CSS/JavaScript/media intact (and not just its HTML), with Google's Puppeteer? After successfully trying it out on other scraping jobs, I would imagine it should be able to. However, looking through the many excellent examples online, there is no obvious method for doing so. The closest I have been able to find is calling html_contents = await page.content() and saving the results, but that saves a copy without any non-HTML elements. Is

Crawling multiple URL in a loop using puppeteer

对着背影说爱祢, submitted on 2021-02-05 21:34:44
Question: I have urls = ['url','url','url'...] and this is what I'm doing:

    urls.map(async (url)=>{
      await page.goto(url);
      await page.waitForNavigation({ waitUntil: 'networkidle' });
    })

This does not seem to wait for the page to load and visits all the URLs quite rapidly (I even tried using page.waitFor). I just wanted to know whether I am doing something fundamentally wrong, or whether this type of functionality is not advised or supported.

Answer 1: map, forEach, reduce, etc. do not wait for the asynchronous operations within them before they
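The answer is cut off, but the point it makes suggests a sequential loop. The following is a minimal sketch under that reading, not the answerer's code: a for...of loop with await visits one URL at a time. The name visitSequentially is hypothetical, and the waitUntil value 'networkidle2' is an assumption standing in for the question's 'networkidle'.

```javascript
// Hypothetical helper: `page` is a Puppeteer Page, `urls` an array of strings.
async function visitSequentially(page, urls) {
  const visited = [];
  for (const url of urls) {
    // await inside for...of pauses the loop until this navigation settles;
    // goto itself waits for the load condition, so no waitForNavigation call.
    await page.goto(url, { waitUntil: 'networkidle2' });
    visited.push(url);
  }
  return visited;
}
```

With urls.map(async ...), every callback starts immediately and the same page receives competing goto calls, which matches the rapid visiting described in the question.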