headless-browser | 易学教程

How to scrape javascript injected image src and alt with phantom.js?

Question: I'm using the following script to scrape images using PhantomJS: var page = require('webpage').create(); var url = 'https://www.everlane.com/collections/mens-luxury-tees/products/mens-crew-antique'; page.open(url, function(status) { if (status !== 'success') { console.log('error'); phantom.exit(); return; } var a = page.evaluate(function() { return document.getElementsByTagName('img'); }); var SrcAlt = []; for (var i = 0; i < a.length; i++) { var src = a[i].getAttribute('src'); var alt = a[i].getAttribute
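
A note on the script above: PhantomJS's page.evaluate can only hand back simple, JSON-serializable values, so the collection of DOM nodes returned here does not reach the outer script intact. One possible workaround, sketched below, is to read src and alt inside the page context and return plain objects instead. The helper name collectSrcAlt and the stand-in document are hypothetical and exist only so the sketch runs outside a browser.

```javascript
// Hypothetical helper: reads src/alt from every <img> of a document-like
// object and returns plain objects, which can be serialized safely.
function collectSrcAlt(doc) {
  var imgs = doc.getElementsByTagName('img');
  var result = [];
  for (var i = 0; i < imgs.length; i++) {
    result.push({
      src: imgs[i].getAttribute('src'),
      alt: imgs[i].getAttribute('alt')
    });
  }
  return result;
}

// Stand-in document (illustration only, not a real DOM).
var fakeDoc = {
  getElementsByTagName: function () {
    return [{
      getAttribute: function (name) {
        return { src: 'a.png', alt: 'A' }[name];
      }
    }];
  }
};

console.log(JSON.stringify(collectSrcAlt(fakeDoc)));
```

In PhantomJS the loop would have to live inside the function passed to page.evaluate and use the page's own document, because that function is serialized and cannot see helpers defined in the outer script.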

CasperJS cannot set window.navigator object

Question: I'm trying to scrape a web page with CasperJS. The page checks whether the browser is IE 6/7. Passing a userAgent with CasperJS doesn't seem to satisfy its condition. User agent passed: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1). The following is the check the page makes to determine the browser: agt = navigator.userAgent.toLowerCase(); browserType = navigator.appName; if( ((browserType.indexOf("xplorer") != -1) && (agt.indexOf("msie 6.") != -1)) || ((browserType.indexOf("xplorer")
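
The visible part of the check reads navigator.appName as well as navigator.userAgent, and setting the user agent alone leaves appName unchanged. The sketch below reimplements only the first clause shown above (the rest is cut off) as a plain function, under the hypothetical name passesFirstClause, to show how the two values interact. The appName value 'Netscape' is an assumption about what a WebKit-based headless browser reports.

```javascript
// Mirrors only the first clause of the page's check; the remaining clauses
// are truncated in the question and are deliberately not reconstructed.
function passesFirstClause(nav) {
  var agt = nav.userAgent.toLowerCase();
  var browserType = nav.appName;
  return browserType.indexOf('xplorer') !== -1 && agt.indexOf('msie 6.') !== -1;
}

var ua = 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)';

// Assumed values: only the user agent is spoofed in the first case.
var spoofedUaOnly = { userAgent: ua, appName: 'Netscape' };
var spoofedBoth = { userAgent: ua, appName: 'Microsoft Internet Explorer' };

console.log(passesFirstClause(spoofedUaOnly)); // false
console.log(passesFirstClause(spoofedBoth));   // true
```

If this reading is right, the page would only be satisfied once appName is overridden too, which fits the title's complaint that the window.navigator object cannot be set.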

How to upload file in headless browser using robot class in selenium java

Question: How do I upload a file in a headless browser using the Robot class in Selenium Java? The sendKeys() method is not working in my case. I am using Firefox and Selenium WebDriver (Java) for my script. Answer 1: There is no need to use the Robot class to upload a file with Selenium Java. First, put your files in the /tmp folder on Linux or the temp folder on Windows, then use the code below to upload them: String path = FILE_UPLOAD_PATH; /* full path with file name from the /tmp folder */ driver.findElement

Need headless browser for Armv7 linux processor

Question: I need a headless browser for web scraping. I recently tried three different headless browsers (PhantomJS, Firefox, Chrome). PhantomJS gives an error, i.e.: Armv7 processor needs GUI. Firefox with geckodriver then showed path errors and "connection refused". So I moved to headless Chrome with chromedriver, but it shows the same errors as Firefox. So I need a working headless browser for an Armv7 processor. Can anyone suggest a solution for that or any

How can I pause and wait for user input with Puppeteer?

Question: I need to make Puppeteer pause and wait for user input of username and password before continuing. It is a Node.js 8.12.0 app. (async () => { const browser = await puppeteer.launch({headless: false}); const page = await browser.newPage(); await page.goto('https://www.myweb.com/login/'); /* code to wait for user to enter username and password, and click `login` */ const first_page = await page.content(); /* do something */ await browser.close(); })(); Basically the program is halted and waits until the
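
One way the placeholder comment in the script above might be filled in, assuming that clicking `login` triggers a full page navigation, is to wait for that navigation with Puppeteer's timeout disabled. The sketch below uses a hypothetical helper name, waitForManualLogin, and a stand-in page object so it runs without a browser; it is not taken from the question or its answers.

```javascript
// Hypothetical helper: waits until the user-triggered navigation finishes,
// then returns the HTML of the page that was loaded.
// Assumption: submitting the login form causes a navigation.
async function waitForManualLogin(page) {
  // timeout: 0 disables the default navigation timeout, so the script
  // keeps waiting for as long as the user needs.
  await page.waitForNavigation({ timeout: 0 });
  return page.content();
}

// Stand-in page (illustration only) that records how it was called.
const calls = [];
const fakePage = {
  waitForNavigation: (opts) => { calls.push(opts); return Promise.resolve(); },
  content: () => Promise.resolve('<html></html>')
};

waitForManualLogin(fakePage).then((html) => console.log(html.length));
```

With real Puppeteer, the page returned by browser.newPage() would be passed in after page.goto(...). If the site logs in without a navigation, waiting for a post-login element would be the analogous approach.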

Running chrome headless on linux without xorg

Question: Is it possible to install and run headless Chrome on a headless Linux box without installing the audio and Xorg dependencies? If not, is there a special headless build of Chrome/Chromium that doesn't pull in the Xorg and audio libraries? Answer 1: This troubleshooting doc for Puppeteer should be of some help (https://github.com/GoogleChrome/puppeteer/blob/master/docs/troubleshooting.md); it outlines all the packages necessary for running Chrome on a Linux machine (more specifically for web servers).

How can I get Firebug to match HtmlUnitDriver's pageSource report?

Question: I'm using Java with the Selenium library to scrape a web page. When I use Firebug on the page in Firefox, I can see that the page's source contains the following HTML structure: <div> <div> <table> <caption /> <thead /> <tbody /> </table> </div> </div> However, when I programmatically download the page's source using HtmlUnitDriver and then call driver.getPageSource(), I see that the corresponding HTML structure has changed to: <div> <table> <caption /> <tbody /> </table> </div> Why does the

Limit chrome headless CPU and memory usage

Question: I am using Selenium to run headless Chrome with the following command: system "LC_ALL=C google-chrome --headless --enable-logging --hide-scrollbars --remote-debugging-port=#{debug_port} --remote-debugging-address=0.0.0.0 --disable-gpu --no-sandbox --ignore-certificate-errors &" However, headless Chrome appears to consume too much memory and CPU. Does anyone know how to limit the CPU/memory usage of headless Chrome, or whether there is a workaround? Thanks in advance. Answer 1: There had been a

Developing scraping script on docker image - how to overcome lack of visual browser?

Question: I want to scrape info from the web, and a previous attempt has taught me that Docker would have been useful for running my script: I develop the script on Mac OS X and then run it on a VM, often Ubuntu, where it often won't run because the dependencies don't exist on Ubuntu and have proven difficult to build. Docker overcomes the dependency issue, but this leads me to a different problem: I need to develop the script in non-headless mode on the Docker image to see what it's doing (or at

Watir-Webdriver Frame Attributes Not Congruent with Other Sources

Question: I have an issue where, if I return some attributes of a frame, they do not match those shown in Firebug, for example. The reason I ask is that I am looking for a way to identify the purpose of a frame. For example, www.cnet.com loads 19 frames in total, and some of these are HTML with JavaScript. I want to inspect some of the frames but not all. Using Firebug I see some interesting attributes of the frame, and I want to filter the frames based on some of these attributes. I have the following