google-chrome-headless

Selenium Chrome Driver Limitations Web Scraping at Scale

谁说胖子不能爱 posted on 2019-12-10 16:16:15
Question: I'm planning to use Selenium ChromeDriver in a project that scrapes multiple public websites (something like Kayak or Skyscanner). There will be a REST GET endpoint where my backend launches headless Chrome to scrape those websites and eventually returns a manipulated JSON response. I want to know how scalable ChromeDriver is, since it sounds like a headless Chrome instance needs to be launched whenever a request comes in. Updated: Question using Google Chrome
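
One possible way to avoid launching a browser on every request, offered here as a sketch and not taken from the post, is to start a small fixed number of drivers once and lend them to request handlers. The names `DriverPool` and `make_headless_chrome` are hypothetical, and the factory assumes Selenium's Python bindings are installed.

```python
import queue
from contextlib import contextmanager


class DriverPool:
    """Bounded pool of reusable browser drivers (hypothetical helper)."""

    def __init__(self, make_driver, size=2):
        # Start all drivers up front so requests never pay the launch cost.
        self._drivers = queue.Queue(maxsize=size)
        for _ in range(size):
            self._drivers.put(make_driver())

    @contextmanager
    def borrow(self, timeout=30):
        # Blocks until a driver is free; raises queue.Empty on timeout.
        driver = self._drivers.get(timeout=timeout)
        try:
            yield driver
        finally:
            # Always hand the driver back, even if scraping failed.
            self._drivers.put(driver)

    def close(self):
        # Quit every driver that is currently idle in the pool.
        while not self._drivers.empty():
            self._drivers.get_nowait().quit()


def make_headless_chrome():
    # Assumption: Selenium is installed; imported lazily so the pool
    # itself has no hard dependency on it.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    return webdriver.Chrome(options=options)
```

A handler would then run `with pool.borrow() as driver: driver.get(url)`. The pool size caps how many Chrome processes exist at once, so throughput is bounded by the pool and not by how fast Chrome can start.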

Capybara Selenium Chrome opens About Google Chrome

半世苍凉 posted on 2019-12-10 13:14:11
Question: I have an issue with testing in Chrome. When I run a test, it opens the chrome://settings/help page in a new tab. This causes my tests to fail because they can't find the buttons they should click. I saw what happened while debugging in Chrome in normal mode. Can I prevent this from happening? Or can I keep the tab I'm testing in focused in some way? Answer 1: I ran into the same issue this morning. All our tests were failing because chrome://settings/help was automatically opened when we

how to hide margins in headless chrome generated pdf?

被刻印的时光 ゝ posted on 2019-12-10 10:04:49
Question: I'm using headless Chrome to generate a long PDF document with Python/Django. Is there a way to remove the header with the date and the footer with the URL and page count from the pages? I tried @page{ margin: 0; size: auto; } but with this CSS there are no margins at all, and I need margins. I also tried wrapping the page content in div.wrapper styled with .wrapper{ margin: 15mm 10mm 15mm 15mm; } but with this solution there are top and bottom margins only on the first and last pages. The pages in between are without vertical margins and
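
One way this might be approached, as a sketch and not the answer from the original thread, is to ask Chrome for the PDF through the DevTools `Page.printToPDF` command, which takes `displayHeaderFooter` and per-side margins in inches. The helper names `build_pdf_params` and `save_pdf` are hypothetical, and `save_pdf` assumes a Selenium Chrome driver (which exposes `execute_cdp_cmd`). Whether these margins win over the page's own @page rules may depend on the Chrome version.

```python
import base64

MM_PER_INCH = 25.4


def build_pdf_params(top_mm, right_mm, bottom_mm, left_mm):
    """Build Page.printToPDF parameters; the protocol expects inches."""
    return {
        # Drops the date header and the URL/page-count footer.
        "displayHeaderFooter": False,
        "printBackground": True,
        "marginTop": top_mm / MM_PER_INCH,
        "marginRight": right_mm / MM_PER_INCH,
        "marginBottom": bottom_mm / MM_PER_INCH,
        "marginLeft": left_mm / MM_PER_INCH,
    }


def save_pdf(driver, path, params):
    # Assumption: `driver` is a Selenium Chrome driver with the page
    # already loaded; the PDF comes back base64-encoded under "data".
    result = driver.execute_cdp_cmd("Page.printToPDF", params)
    with open(path, "wb") as fh:
        fh.write(base64.b64decode(result["data"]))
    return path
```

With the margins from the question this would be called as `save_pdf(driver, "out.pdf", build_pdf_params(15, 10, 15, 15))`. Because the margins are applied by the printer and not by a wrapper element, they should repeat on every page.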

How to dump more than <body> on chrome / chromium headless?

放肆的年华 posted on 2019-12-10 04:17:00
Question: Chrome's documentation states: The --dump-dom flag prints document.body.innerHTML to stdout: As per the title, how can more of the DOM (ideally all of it) be dumped with headless Chromium? I can manually save the entire DOM via the developer tools, but I want a programmatic solution. Answer 1: Update 2019-04-23: Google has been very active on the headless front and many updates have happened. The answer below is valid for v62; the current version is v73, and it is updating all the time. https://www.chromestatus.com
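
A programmatic alternative to the command-line flag, sketched here under the assumption that Selenium's Python bindings are acceptable (the excerpt above does not mention them), is to serialize `document.documentElement` from inside the page. The function names are hypothetical.

```python
def dump_full_dom(driver):
    """Return the serialized <html> element, <head> included.

    Note: outerHTML does not include the <!DOCTYPE> declaration.
    """
    return driver.execute_script(
        "return document.documentElement.outerHTML;"
    )


def fetch_full_dom(url):
    # Assumption: Selenium and a matching ChromeDriver are installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return dump_full_dom(driver)
    finally:
        # Always shut the browser down, even if loading failed.
        driver.quit()
```

Because the script runs after the page has loaded, the result reflects the live DOM, including nodes added by JavaScript.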

How to do a POST request in Puppeteer?

白昼怎懂夜的黑 posted on 2019-12-09 09:51:37
Question: (async() => { const browser = await puppeteer.launch(); const page = await browser.newPage(); await page.goto('https://www.example.com/search'); const data = await page.content(); browser.close(); res.send(data); })(); I use this code to send a GET request. I don't understand how I should send a POST request. Answer 1: Getting the "order" right can be a bit of a challenge. The documentation doesn't have that many examples... there are some juicy items in the repository in the example folder that you

Node JS Puppeteer Infinite scroll loop

旧城冷巷雨未停 posted on 2019-12-08 12:00:20
Question: I am learning Puppeteer and trying to scrape a website that implements infinite scroll. I am able to get all the prices from the list by scrolling down after a delay of 1 second. Here is the URL. What I want to do is open an item from the list, get the product name, go back to the list, select the second product, and do this for all products. const fs = require('fs'); const puppeteer = require('puppeteer'); function extractItems() { const extractedElements = document.querySelectorAll('

Headless Chrome to print pdf

谁说胖子不能爱 posted on 2019-12-08 02:53:42
Question: I am trying to use Chrome's headless feature to convert an HTML page to PDF. However, I am not getting any output at all, and the console doesn't show any error either. I am running the command below on my Windows machine: chrome --headless --disable-gpu --print-to-pdf I tried all the various options, but nothing is generated. I have Chrome version 60. Answer 1: This works: chrome --headless --disable-gpu --print-to-pdf=file1.pdf https://www.google.co.in/ It creates the file in the folder: C:\Program Files (x86)
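
The working command from the answer can also be driven from a script. The sketch below is an illustration with hypothetical function names. It passes the same three flags plus the URL, and resolves the output file to an absolute path so that where the PDF lands does not depend on the working directory. The path to the Chrome executable is an assumption the caller must supply.

```python
import subprocess
from pathlib import Path


def build_print_command(chrome_path, url, output_pdf):
    """Build the argument list for a headless print-to-PDF run."""
    output = Path(output_pdf).resolve()  # absolute path for the PDF
    return [
        str(chrome_path),
        "--headless",
        "--disable-gpu",
        f"--print-to-pdf={output}",
        url,  # the page to print goes last
    ]


def print_to_pdf(chrome_path, url, output_pdf, timeout=60):
    # Raises CalledProcessError if Chrome exits with a non-zero status
    # and TimeoutExpired if it does not finish within `timeout` seconds.
    subprocess.run(
        build_print_command(chrome_path, url, output_pdf),
        check=True,
        timeout=timeout,
    )
    return Path(output_pdf).resolve()
```

The command in the question has no URL argument, unlike the one in the answer, so the builder makes the URL a required parameter.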

Headless chrome web driver too slow and unable to download file

浪子不回头ぞ posted on 2019-12-07 16:46:03
Question: I am trying to download a file using the Python headless Chrome web driver. My code runs too slowly, no file is downloaded, and I get no error. Any help would be appreciated. Here is my code: # Getting all user credentials for x in range(2, st.max_row + 1): Users.append([st.cell(x, 1).value, st.cell(x, 2).value, st.cell(x, 3).value]) # Looping through users for item in Users: try: chrome_options = Options() chrome_options.add_argument("--headless") prefs = {"download.default
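
A commonly cited workaround for headless Chrome silently skipping downloads, shown here as a sketch and not as the answer from the original thread, is to allow downloads explicitly through the DevTools `Page.setDownloadBehavior` command before clicking the link. The helper names are hypothetical, `driver` is assumed to be a Selenium Chrome driver, and newer Chrome versions may prefer `Browser.setDownloadBehavior`.

```python
import os
import time


def enable_headless_downloads(driver, download_dir):
    """Tell headless Chrome to save downloads into download_dir."""
    download_dir = os.path.abspath(download_dir)
    os.makedirs(download_dir, exist_ok=True)
    driver.execute_cdp_cmd(
        "Page.setDownloadBehavior",
        {"behavior": "allow", "downloadPath": download_dir},
    )
    return download_dir


def wait_for_download(download_dir, timeout=30.0, poll=0.5):
    """Wait until the directory holds files and none is still partial."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        names = os.listdir(download_dir)
        # Chrome keeps a .crdownload suffix while a file is in progress.
        if names and not any(n.endswith(".crdownload") for n in names):
            return sorted(names)
        time.sleep(poll)
    raise TimeoutError(f"no finished download in {download_dir}")
```

Polling the directory is a simple substitute for fixed sleeps, which may also help with the slowness described in the question.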

“Uncaught [object Object]” when running karma tests on Angular

社会主义新天地 posted on 2019-12-07 15:39:23
Question: I am fighting with this strange error when running unit tests for my application. zone.js:260 Uncaught [object Object] thrown Zone.runTask @ zone.js:260 ZoneTask.invoke @ zone.js:423 I don't know which test is failing because the console only shows that error. It does not happen on my local machine, where the tests run without any problem. Before that error I was getting the "Script error" error, but I solved it with the --disable-web-security flag for ChromeHeadless. I don't know if it has something

How can I disable webRTC local IP leak with puppeteer?

流过昼夜 posted on 2019-12-07 08:20:30
I tried: const browser = await puppeteer.launch({args: ['--enable-webrtc-stun-origin=false', '--enforce-webrtc-ip-permission-check=false']}); But this is not working. Next I tried: const targets = await browser.targets(); const backgroundPageTarget = targets.find(target => target.type() === 'background_page'); const backgroundPage = await backgroundPageTarget.page(); await backgroundPage.evaluateOnNewDocument(() => { chrome.privacy.network.webRTCIPHandlingPolicy.set({ value: "default_public_interface_only" }); }); But got: TypeError: Cannot read property 'page' of undefined EDIT: Need