Puppeteer

Puppeteer returning empty object

﹥>﹥吖頭↗ submitted on 2021-02-05 07:26:48
Question: When I run the following code in the console of the page I'm trying to scrape, I get a picture.

    document.querySelector('#sb-site > div.sticky_footer > div:nth-child(9)')

However, when I run this in my program and log the result, it returns '{}':

    const inputContent = await page.evaluate(() => {
      return document.querySelector('#sb-site > div.sticky_footer > div:nth-child(9)');
    });

Answer 1: Puppeteer can transfer two types of data between Node.js and the browser context: serializable data (i.e. data that is
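The answer excerpt points at serialization: a DOM element is not plain data, so only an empty object reaches Node.js. A minimal sketch of one workaround, assuming the goal is the element's markup or text rather than the live element. The helper name `describeElement` is hypothetical, and `page` is assumed to be a Puppeteer Page.

```javascript
// Hypothetical helper: return plain, serializable data about an element
// instead of the element itself. `page` is assumed to be a Puppeteer Page.
async function describeElement(page, selector) {
  // page.$eval runs the callback in the browser and sends back only its
  // return value, so the callback returns strings in a plain object.
  return page.$eval(selector, (el) => ({
    tag: el.tagName,
    text: el.textContent,
    html: el.outerHTML,
  }));
}
```

It could be called as `await describeElement(page, '#sb-site > div.sticky_footer > div:nth-child(9)')`. If the element itself is needed for further interaction from Node.js, `page.$(selector)` returns an ElementHandle instead of serialized data.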

How do I print the console output of the page in Puppeteer as it would appear in the browser?

蹲街弑〆低调 submitted on 2021-02-04 07:12:30
Question: I keep seeing this WRONG CODE:

    page.on('console', msg => console.log(msg.text()));

That FAILS. For example,

    console.log('Hello %s', 'World');

produces

    Hello World      // browser
    Hello %s World   // puppeteer

OK, so I thought maybe I could do this:

    page.on('console', msg => console.log(...msg.args()));

NOPE: that dumps out some giant JSHandle thing. OK, so maybe:

    page.on('console', msg => console.log(...msg.args().map(a => a.toString())));

NOPE: that prints

    JSHandle: Hello %s JSHandle: World

I suppose I can hack it by
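One possible approach, sketched here under assumptions rather than taken from the original thread: resolve each argument handle to a plain value with `jsonValue()` and let Node's `util.format` apply the `%s`-style substitutions. The helper name `formatConsoleMessage` is hypothetical. Node's formatting is close to the browser's but not guaranteed to be identical, and `jsonValue()` may fail for arguments that cannot be serialized, such as DOM nodes.

```javascript
const util = require('util');

// Hypothetical helper: turn a Puppeteer ConsoleMessage into one string,
// applying printf-style substitutions the way console.log would.
async function formatConsoleMessage(msg) {
  // Resolve each JSHandle to a plain value. This can reject for values
  // that cannot be serialized (for example DOM nodes).
  const values = await Promise.all(msg.args().map((arg) => arg.jsonValue()));
  // util.format handles placeholders such as %s and %d in the first argument.
  return util.format(...values);
}

// Assumed wiring, given a Puppeteer `page`:
//   page.on('console', async (msg) => console.log(await formatConsoleMessage(msg)));
```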

Puppeteer evaluate function

隐身守侯 submitted on 2021-02-02 08:38:35
Question: I'm new to Puppeteer and I'm trying to understand how it actually works through some examples. In this example, I'm trying to extract the number of views of a YouTube video. I wrote a line of JS in the Chrome console that extracts this information:

    document.querySelector('#count > yt-view-count-renderer > span.view-count.style-scope.yt-view-count-renderer').innerText

That worked well. However, when I did the same in my Puppeteer code, it doesn't recognize
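The excerpt is cut off before it says what failed, so the actual cause is unknown. One common difference between the DevTools console and a script is timing: the script may query the element before the page has rendered it. A minimal sketch under that assumption, with a hypothetical helper name `readText` and `page` assumed to be a Puppeteer Page:

```javascript
// Hypothetical helper: wait until the element exists, then read its text.
async function readText(page, selector, timeoutMs = 10000) {
  // Rejects if the selector does not appear within timeoutMs.
  await page.waitForSelector(selector, { timeout: timeoutMs });
  // The callback runs in the browser; only the string comes back to Node.js.
  return page.$eval(selector, (el) => el.innerText);
}
```

With the selector from the question, this would be `await readText(page, '#count > yt-view-count-renderer > span.view-count.style-scope.yt-view-count-renderer')`.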

Puppeteer press enter on print dialog screen has no effect

放肆的年华 submitted on 2021-01-29 20:50:56
Question: I have a Puppeteer script that loops through a list of pages to print as PDFs (not via Puppeteer's page.pdf, but through the print preview dialog). By the time it reaches the dialog, I have changed the title of the page so that the file will be named accordingly. When headless mode is false, I see it stop at the print preview dialog, where I can just press Enter, and then Enter again for the location to save the page, and that's fine. So in code I use 'await page.keyboard.press('Enter');' twice, but the key presses have no effect.

Puppeteer can't access HTTPS site with proxy server

女生的网名这么多〃 submitted on 2021-01-29 15:42:59
Question: Here's my Node.js code, in which I'm trying to access an HTTPS site through an HTTPS proxy. It doesn't seem to work, whereas the HTTP proxy works fine. I have researched this, but nothing I found worked.

    const puppeteer = require("puppeteer-extra");
    const useProxy = require("puppeteer-page-proxy");
    const StealthPlugin = require("puppeteer-extra-plugin-stealth");
    const AdblockerPlugin = require("puppeteer-extra-plugin-adblocker");
    puppeteer.use(StealthPlugin());
    puppeteer.use(AdblockerPlugin({ blockTrackers:
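The code excerpt is truncated, so the cause of the failure is not visible here. As an alternative to a per-page proxy library, a browser-wide proxy can be set through Chromium's `--proxy-server` launch flag, with credentials supplied separately via `page.authenticate`. The sketch below only prepares those values; the helper name `proxyLaunchConfig` and the example proxy URL are hypothetical, and whether this resolves the HTTPS proxy problem in the question is an assumption.

```javascript
// Hypothetical helper: split a proxy URL into a Chromium launch flag and
// optional credentials. The flag itself must not contain user:password.
function proxyLaunchConfig(proxyUrl) {
  const url = new URL(proxyUrl);
  const config = { args: [`--proxy-server=${url.protocol}//${url.host}`] };
  if (url.username) {
    // URL keeps credentials percent-encoded, so decode them before use.
    config.credentials = {
      username: decodeURIComponent(url.username),
      password: decodeURIComponent(url.password),
    };
  }
  return config;
}

// Assumed wiring with Puppeteer:
//   const { args, credentials } = proxyLaunchConfig('https://user:secret@proxy.example.com:8443');
//   const browser = await puppeteer.launch({ args });
//   const page = await browser.newPage();
//   if (credentials) await page.authenticate(credentials);
```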

Can't get the fully loaded HTML for a page using Puppeteer

可紊 submitted on 2021-01-29 11:38:43
Question: I'm trying to get the full HTML for this page. It has a spreadsheet that loads slowly. The spreadsheet is included when I take a screenshot of the page, but I can't get its HTML: document.body.outerHTML excludes the HTML for the spreadsheet. It's as if Puppeteer is still seeing the page as it was before the spreadsheet loaded. How do I get the fully loaded HTML, including the HTML for the spreadsheet?

    (async () => {
      const browser = await puppeteer.launch();
      const page
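A screenshot that shows the spreadsheet while `document.body.outerHTML` omits it is consistent with the spreadsheet living in an iframe, although the excerpt does not confirm this. A minimal sketch under that assumption, which collects the HTML of every frame instead of only the main document. The helper name `collectFrameHtml` is hypothetical.

```javascript
// Hypothetical helper: collect the serialized HTML of every frame on the
// page. `page` is assumed to be a Puppeteer Page.
async function collectFrameHtml(page) {
  const frames = page.frames(); // the main frame plus any iframes
  return Promise.all(
    frames.map(async (frame) => ({
      url: frame.url(),
      html: await frame.content(), // full HTML of that frame's document
    }))
  );
}
```

If the spreadsheet is instead injected late into the main document, waiting on an element inside it with `page.waitForSelector` before calling `page.content()` is the usual alternative.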

How to preserve @page margin but with a sidebar

允我心安 submitted on 2021-01-29 10:35:19
Question: I've been trying to generate a dynamic invoice using Handlebars and Puppeteer. The problem is that I need to display a 'sidebar', so to speak, in my invoice template. However, I also need the @page margin so that overflowing content wraps onto a new page with enough margin. See image below for clarification: I set up a GitHub repo with example code you can check out here (instructions to run the project are in read.me). How can I preserve the margin added with @page, but also have a sidebar that

Pagination when there is no “next page” button but bunch of “page numbers” pages

别等时光非礼了梦想. submitted on 2021-01-29 09:24:18
Question: I was happy doing my scraping with R but found its limits. Trying to scrape the summaries of cases of Argentina's Supreme Court, I found a problem for which I cannot find an answer. It is likely the outcome of learning by doing, so please do point out where my code works but follows a rather bad practice. Anyway, I managed to:

- Access the search page.
- Enter a relevant taxonomy term (e.g. 'DECRETO DE NECESIDAD Y URGENCIA') in #voces, click search and scrape the .datosSumarios, where