Question
I'm using CasperJS to scrape text from a website, and so far it works fine. However, the page I'm scraping has hundreds of products on it, and some of these products have an orange button next to them.
The orange button has a class of button small orange. If you click on this orange button, it brings up a lightbox with a description of the product.
How can I have CasperJS click the orange button if it's there, scrape the description, close the lightbox, and then keep iterating through the hundreds of products?
Answer 1:
You would need to determine the elements that are involved in each step. You can do that with the developer tools in Firefox or Chrome.
You can find the number of elements like this:
var buttonNumber = casper.getElementsInfo(".button.small.orange").length;
You then iterate over the buttons, using that count as the upper bound:
var x = require('casper').selectXPath;
for (var i = 0; i < buttonNumber; i++) {
    // XPath positions are 1-based, hence i + 1
    casper.thenClick(x("(//*[contains(@class,'button') and contains(@class,'small') and contains(@class,'orange')])[" + (i + 1) + "]"));
    scheduleScrapeAndClose();
}
The //*[contains(@class,'button') and ...] part of the XPath expression is basically the equivalent of the .button.small.orange CSS selector. It returns a node list, and the index after it picks the button for the current iteration, for example (//*[...])[1].
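One caveat: contains(@class,'button') is a plain substring test, so it would also match a class such as buttons. The following is a minimal sketch of a stricter variant, assuming you want whole-word class matching. The helper name xpathForClasses is made up for illustration and is not part of CasperJS.

```javascript
// Hypothetical helper (not a CasperJS API): builds an XPath expression that
// selects the element at the given 1-based position among all elements
// carrying every one of the given class names.
function xpathForClasses(classNames, position) {
    var conditions = classNames.map(function (name) {
        // Pad the class attribute with spaces so that 'button' does not
        // also match a longer class name such as 'buttons'.
        return "contains(concat(' ', normalize-space(@class), ' '), ' " + name + " ')";
    });
    return "(//*[" + conditions.join(" and ") + "])[" + position + "]";
}

var firstButton = xpathForClasses(["button", "small", "orange"], 1);
console.log(firstButton);
```

Inside the loop, the result could be passed to the selector helper, as in casper.thenClick(x(xpathForClasses(["button", "small", "orange"], i + 1))).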
The only thing left to do is to define the scheduleScrapeAndClose function. It will probably look something like this:
function scheduleScrapeAndClose() {
    // wait for the light box to appear after the click
    casper.waitUntilVisible("your light box selector");
    casper.then(function() {
        // scrape the description (store or print descr as needed)
        var descr = this.fetchText("your description selector");
        this.click("your light box close selector");
    });
    // wait for the light box to disappear before the next click
    casper.waitWhileVisible("again, your light box selector");
}
I assume that each button click opens only one lightbox.
Putting it all together, it would look like this:
var x = require('casper').selectXPath,
    casper = require('casper').create();

function scheduleScrapeAndClose() {
    // stuff from above
}

casper.start(url);

casper.then(function() {
    // count the buttons only after the page has loaded
    var buttonNumber = casper.getElementsInfo(".button.small.orange").length;
    for (var i = 0; i < buttonNumber; i++) {
        casper.thenClick(x("(//*[contains(@class,'button') and contains(@class,'small') and contains(@class,'orange')])[" + (i + 1) + "]"));
        scheduleScrapeAndClose();
    }
});

casper.run(function() { this.exit(); });
Source: https://stackoverflow.com/questions/24498550/scraping-text-from-lightbox-using-casperjs