google-chrome-headless | 易学教程

Selenium Chrome Driver Limitations Web Scraping at Scale

Question: I'm planning to use Selenium ChromeDriver for a project that scrapes multiple public websites (something like Kayak or Skyscanner). There will be a REST GET endpoint where my backend launches headless Chrome to scrape multiple websites and eventually returns a manipulated JSON response. I want to know how scalable ChromeDriver is, since it sounds like a headless Chrome instance needs to be launched whenever a request comes in. Updated: Question using Google Chrome
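One common way to avoid the per-request launch cost described in the question is to start the browser once and open a fresh page per request. The following is only a minimal sketch of that idea, not something taken from the original thread: `createSharedBrowser` and `withPage` are hypothetical names, and `launch` is assumed to be any async function that resolves to a browser-like object with `newPage()` (for example Puppeteer's `puppeteer.launch`). It is not specific to Selenium.

```javascript
// Sketch: launch the browser lazily once and reuse it across requests.
// `launch` is assumed to resolve to a browser-like object exposing newPage();
// `createSharedBrowser` is a hypothetical helper name.
function createSharedBrowser(launch) {
  let browserPromise = null;

  // Start the browser on first use; later callers share the same promise.
  function getBrowser() {
    if (!browserPromise) {
      browserPromise = launch().catch((err) => {
        browserPromise = null; // allow a retry after a failed launch
        throw err;
      });
    }
    return browserPromise;
  }

  // Open a fresh page for one request and always close it afterwards.
  async function withPage(task) {
    const browser = await getBrowser();
    const page = await browser.newPage();
    try {
      return await task(page);
    } finally {
      await page.close();
    }
  }

  return { getBrowser, withPage };
}
```

A request handler would then call something like `shared.withPage(page => scrape(page))`, where `scrape` is whatever scraping routine the endpoint needs. Concurrency limits and crash recovery are deliberately left out of this sketch.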

Capybara Selenium Chrome opens About Google Chrome

Question: I have an issue with testing in Chrome. When I run the test, it opens the chrome://settings/help page in a new tab. This causes my tests to fail because they can't find the buttons they should click on. I was debugging in Chrome in normal mode when I saw what happened. Can I prevent this from happening? Or can I keep the tab I'm testing in focused in some way? Answer 1: I ran into the same issue this morning. All our tests were failing because chrome://settings/help was automatically opened when we

how to hide margins in headless chrome generated pdf?

Question: I'm using headless Chrome to generate a long PDF document with Python/Django. Is there a way to remove the header with the date and the footer with the URL and page count from the pages? I tried @page{ margin: 0; size: auto; } but with this CSS there are no margins, which I need. I also tried wrapping the page content in div.wrapper and styling it with .wrapper{ margin: 15mm 10mm 15mm 15mm; } but with this solution the top and bottom margins appear only on the first and last pages. The pages in between have no vertical margins and
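One possible approach, sketched here under assumptions rather than taken from the thread, is to keep the margins in the `@page` rule (for example `@page { margin: 15mm 10mm 15mm 15mm; }`, which applies to every page) and switch off the header and footer with a command-line flag. `buildPdfArgs` is a hypothetical helper, and which flag is available depends on the Chrome version in use.

```javascript
// Sketch: build headless Chrome arguments that print a PDF without the
// default header (date) and footer (URL, page count).
// `buildPdfArgs` is a hypothetical helper; flag availability is assumed
// to depend on the installed Chrome version.
function buildPdfArgs(url, outFile) {
  return [
    '--headless',
    '--disable-gpu',
    '--no-pdf-header-footer', // older builds used --print-to-pdf-no-header
    `--print-to-pdf=${outFile}`,
    url,
  ];
}
```

The resulting array can be passed to whatever process launcher the project uses; from Python/Django the same flags would go into the `subprocess` argument list.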

How to dump more than <body> on chrome / chromium headless?

阅读更多关于 How to dump more than on chrome / chromium headless?

Question: Chrome's documentation states: The --dump-dom flag prints document.body.innerHTML to stdout. As per the title, how can more of the DOM (ideally all of it) be dumped with headless Chromium? I can manually save the entire DOM via the developer tools, but I want a programmatic solution. Answer 1: Update 2019-04-23: Google has been very active on the headless front and many updates have happened. The answer below is valid for v62; the current version is v73, and it's updating all the time. https://www.chromestatus.com
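If a scripted solution is acceptable, one option is to drive headless Chrome through Puppeteer and serialize the whole document. This is a minimal sketch, not the approach from the truncated answer: `dumpFullDom` is a hypothetical name and `page` is assumed to be a Puppeteer `Page`.

```javascript
// Sketch: return the whole serialized document rather than only <body>.
// `page` is assumed to be a Puppeteer Page; `dumpFullDom` is hypothetical.
async function dumpFullDom(page, url) {
  // Wait until the network is idle so late-loaded markup is included.
  await page.goto(url, { waitUntil: 'networkidle0' });
  // page.content() returns the full HTML of the page, including the doctype.
  return page.content();
}
```

Typical usage would be `const html = await dumpFullDom(await browser.newPage(), url)`, followed by writing `html` to stdout or a file.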

how do POST request in puppeteer?

Question: (async() => { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto('https://www.example.com/search'); const data = await page.content(); browser.close(); res.send(data); })(); I use this code to send a GET request. I don't understand how I should send a POST request. Answer 1: Getting the "order" right can be a bit of a challenge. The documentation doesn't have that many examples... there are some juicy items in the repository in the example folder that you
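The "order" the answer refers to might look like the sketch below: enable request interception, register a handler that rewrites the navigation request into a POST, and only then call `page.goto`. This is an illustration under assumptions, not the answer's own code. `postNavigate` is a hypothetical name, `page` is assumed to be a Puppeteer `Page`, and the form-encoded content type is just an example.

```javascript
// Sketch: turn the first navigation request into a POST via interception.
// `page` is assumed to be a Puppeteer Page; `postNavigate` is hypothetical.
async function postNavigate(page, url, postData) {
  // 1. Interception must be enabled before the navigation starts.
  await page.setRequestInterception(true);

  // 2. Override only the first navigation request; let the rest through.
  let overridden = false;
  page.on('request', (request) => {
    if (!overridden && request.isNavigationRequest()) {
      overridden = true;
      request.continue({
        method: 'POST',
        postData,
        headers: {
          ...request.headers(),
          'Content-Type': 'application/x-www-form-urlencoded', // example only
        },
      });
    } else {
      request.continue();
    }
  });

  // 3. Navigate; the handler above rewrites the outgoing request.
  const response = await page.goto(url);
  return response.text();
}
```

For example, `await postNavigate(page, 'https://www.example.com/search', 'q=puppeteer')` would return the response body of the POST.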

Node JS Puppeteer Infinite scroll loop

Question: I am learning Puppeteer and trying to scrape a website that has infinite scroll implemented. I am able to get all the prices from the list by scrolling down after a delay of 1 second. Here is the URL. What I want to do is open an item from the list, get the product name, go back to the list, select the second product, and do this for all products. const fs = require('fs'); const puppeteer = require('puppeteer'); function extractItems() { const extractedElements = document.querySelectorAll('
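Going back to an infinite-scroll list usually resets the scroll position, so one alternative is to collect the product links first and then visit each URL directly. The sketch below assumes that the links have already been extracted from the list. `scrapeProductNames` and `nameSelector` are hypothetical names, and `page` is assumed to be a Puppeteer `Page`.

```javascript
// Sketch: visit each collected product URL directly instead of clicking
// an item and navigating back to the scrolled list.
// `page` is assumed to be a Puppeteer Page; `scrapeProductNames` and
// `nameSelector` are hypothetical names.
async function scrapeProductNames(page, productUrls, nameSelector) {
  const products = [];
  for (const url of productUrls) {
    await page.goto(url, { waitUntil: 'domcontentloaded' });
    // $eval runs the callback in the page against the first matching element.
    const name = await page.$eval(nameSelector, (el) => el.textContent.trim());
    products.push({ url, name });
  }
  return products;
}
```

The selector passed as `nameSelector` depends entirely on the target site's markup, which is not shown in the question.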

Headless Chrome to print pdf

Question: I am trying to use Chrome's headless feature to convert an HTML page to PDF. However, I am not getting any output at all, and the console doesn't show any error either. I am running the command below on my Windows machine: chrome --headless --disable-gpu --print-to-pdf I tried all the various options, but nothing is generated. I have Chrome version 60. Answer 1: This works: chrome --headless --disable-gpu --print-to-pdf=file1.pdf https://www.google.co.in/ It creates the file in the folder: C:\Program Files (x86)
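Because a relative file name ends up wherever Chrome resolves it, it can help to pass an absolute output path. The following Node.js sketch wraps the command from the answer. `printToPdfArgs` and `printToPdf` are hypothetical helper names, and `chromePath` is assumed to point at the local Chrome executable.

```javascript
const path = require('path');
const { execFile } = require('child_process');

// Build the argument list from the answer's command. Resolving outFile to an
// absolute path makes the output location independent of Chrome's own
// working directory. `printToPdfArgs` is a hypothetical helper name.
function printToPdfArgs(url, outFile) {
  return [
    '--headless',
    '--disable-gpu',
    `--print-to-pdf=${path.resolve(outFile)}`,
    url,
  ];
}

// Run Chrome with those arguments; `chromePath` is an assumption about
// where the executable lives on the local machine.
function printToPdf(chromePath, url, outFile, callback) {
  return execFile(chromePath, printToPdfArgs(url, outFile), callback);
}
```

Calling `printToPdf(chromePath, 'https://www.google.co.in/', 'file1.pdf', cb)` would then write `file1.pdf` into the directory the Node process was started from.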

Headless chrome web driver too slow and unable to download file

Question: I am trying to download a file using the Python headless Chrome web driver. My code runs too slowly, and no file is downloaded. I am not getting any error. Any help would be appreciated. Here is my code: # Getting All User Credentials for x in range(2,st.max_row + 1): Users.append([st.cell(x, 1).value,st.cell(x, 2).value, st.cell(x, 3).value]) # Looping through Users for item in Users: try: chrome_options = Options() chrome_options.add_argument("--headless") prefs = {"download.default

“Uncaught [object Object]” when running karma tests on Angular

Question: I am fighting with this strange error when running unit tests for my application. zone.js:260 Uncaught [object Object] thrown Zone.runTask @ zone.js:260 ZoneTask.invoke @ zone.js:423 I don't know which test is failing because the console only shows that error. It does not happen on my local machine, where the tests run without any problem. Before that error I was getting the "Script error" error, but I solved it with the --disable-web-security flag for ChromeHeadless. I don't know if it has something
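For context, a flag such as --disable-web-security is usually passed to ChromeHeadless through a custom launcher in karma.conf.js. The sketch below shows that wiring only. The launcher name `ChromeHeadlessNoSecurity` is hypothetical, and the rest of the project's Karma settings (frameworks, files, plugins) are omitted.

```javascript
// karma.conf.js sketch: a custom launcher that extends ChromeHeadless with
// an extra flag. The launcher name `ChromeHeadlessNoSecurity` is hypothetical.
function karmaConfig(config) {
  config.set({
    browsers: ['ChromeHeadlessNoSecurity'],
    customLaunchers: {
      ChromeHeadlessNoSecurity: {
        base: 'ChromeHeadless',
        flags: ['--disable-web-security'],
      },
    },
  });
}

module.exports = karmaConfig;
```

Karma calls the exported function with its config object, so the same launcher name can also be selected from the command line with `--browsers`.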

How can I disable webRTC local IP leak with puppeteer?

I tried: const browser = await puppeteer.launch({args: ['--enable-webrtc-stun-origin=false', '--enforce-webrtc-ip-permission-check=false']}); But this is not working. Next I tried: const targets = await browser.targets(); const backgroundPageTarget = targets.find(target => target.type() === 'background_page'); const backgroundPage = await backgroundPageTarget.page(); await backgroundPage.evaluateOnNewDocument(() => { chrome.privacy.network.webRTCIPHandlingPolicy.set({ value: "default_public_interface_only" }); }); But I got: TypeError: Cannot read property 'page' of undefined EDIT: Need