Scraping text from lightbox using casperjs

Posted by 折月煮酒 on 2019-12-23 05:26:17

Question


I'm using CasperJS to scrape text from a website, and so far it works fine. However, the page I'm scraping lists hundreds of products, and some of them have an orange button next to them.

The orange button has the classes button small orange. Clicking it opens a lightbox with a description of the product.

How can I have Casper click the orange button if it is present, scrape the description, close the lightbox, and then keep iterating through the hundreds of products?


Answer 1:


First, determine which elements are involved in each step: the button, the lightbox, the description and the control that closes the lightbox. You can do that with the developer tools in Firefox or Chrome.

You can count the matching buttons like this:

var buttonNumber = casper.getElementsInfo(".button.small.orange").length;

You then iterate over the buttons, using that count as the upper bound:

var x = require('casper').selectXPath;
for(var i = 0; i < buttonNumber; i++) {
    casper.thenClick(x("(//*[contains(@class,'button') and contains(@class,'small') and contains(@class,'orange')])["+(i+1)+"]"));
    scheduleScrapeAndClose();
}

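The //*[contains(@class,'button') and ...] part of the XPath expression is roughly the equivalent of the .button.small.orange CSS selector. It is not exact, because contains() matches substrings, so it would also match a class such as buttons. Wrapped in parentheses, the expression yields a node list, and the 1-based index after it picks the button for the current iteration, for example (//*[...])[1].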
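If the substring matching of contains() is a concern, the XPath string can be built by a small helper that matches whole class names instead. The following is only a sketch: nthWithClasses is a hypothetical name, not part of CasperJS, and it assumes the class names contain no quotes.

```javascript
// Hypothetical helper (not part of CasperJS): builds an XPath expression
// selecting the n-th element (1-based) that carries all of the given classes.
function nthWithClasses(classes, n) {
    var tests = classes.map(function (name) {
        // Padding both sides with spaces turns the substring test into a
        // whole-token test, which is closer to the CSS selector .name
        return "contains(concat(' ', normalize-space(@class), ' '), ' " + name + " ')";
    });
    return "(//*[" + tests.join(" and ") + "])[" + n + "]";
}

// Print the expression for the first orange button
console.log(nthWithClasses(["button", "small", "orange"], 1));
```

The returned string could be passed to x(...) in place of the hand-written expression, for example x(nthWithClasses(["button", "small", "orange"], i + 1)).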

The only thing left to do is to define the scheduleScrapeAndClose function. It will probably look something like this:

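function scheduleScrapeAndClose(){
    // wait for the lightbox to open
    casper.waitUntilVisible("your light box selector");
    casper.then(function(){
        // scrape the description
        var descr = this.fetchText("your description selector");
        this.echo(descr); // print it, or store it somewhere for later
        // close the lightbox
        this.click("your light box close selector");
    });
    // wait until the lightbox is gone before the next button is clicked
    casper.waitWhileVisible("again, your light box selector");
}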

This assumes that each button click opens exactly one lightbox.

Putting it all together, it would look like this:

var x = require('casper').selectXPath,
    casper = require('casper').create();

function scheduleScrapeAndClose(){
    // stuff from above
}

casper.start(url); // url is the address of the product page
casper.then(function(){
    var selector = ".button.small.orange";
    // guard for a page that has no orange button at all
    var buttonNumber = this.exists(selector) ? this.getElementsInfo(selector).length : 0;
    for(var i = 0; i < buttonNumber; i++) {
        // XPath positions are 1-based, hence i+1
        casper.thenClick(x("(//*[contains(@class,'button') and contains(@class,'small') and contains(@class,'orange')])["+(i+1)+"]"));
        scheduleScrapeAndClose();
    }
});
casper.run(function(){this.exit();});
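The loop above does not click anything itself. Each thenClick and each call inside scheduleScrapeAndClose only schedules a step, and the steps run one after another once the current step has finished. The toy model below illustrates that ordering in plain JavaScript. It is a simplified sketch, not CasperJS code: Runner and everything in it are made up for illustration, and the real step handling in CasperJS is more involved.

```javascript
// Toy model of a step queue; an illustrative sketch, not CasperJS code.
function Runner() {
    this.steps = [];      // scheduled step functions
    this.index = 0;       // position of the next step to run
    this.insertAt = null; // where steps scheduled during a run are inserted
}

// Schedule a step: append it before the run has started, or insert it
// right after the currently running step (keeping call order) during the run.
Runner.prototype.then = function (fn) {
    if (this.insertAt === null) {
        this.steps.push(fn);
    } else {
        this.steps.splice(this.insertAt++, 0, fn);
    }
    return this;
};

// Run all steps in order, including steps added while running.
Runner.prototype.run = function () {
    while (this.index < this.steps.length) {
        var step = this.steps[this.index++];
        this.insertAt = this.index;
        step.call(this);
    }
    this.insertAt = null;
};

var log = [];
var runner = new Runner();
runner.then(function () {
    // Mirrors the for loop in the script: schedule two steps per button
    for (var i = 1; i <= 2; i++) {
        (function (n) {
            runner.then(function () { log.push("click " + n); });
            runner.then(function () { log.push("scrape " + n); });
        })(i);
    }
    log.push("scheduled");
});
runner.then(function () { log.push("done"); });
runner.run();

console.log(log.join(", "));
// prints "scheduled, click 1, scrape 1, click 2, scrape 2, done"
```

In this model the scheduling step finishes first, and only then do the click and scrape steps run, strictly in the order in which they were scheduled.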


Source: https://stackoverflow.com/questions/24498550/scraping-text-from-lightbox-using-casperjs
