Question
I want to scrape some web data using CasperJS. The data is in a table, and each row contains a link leading to a page with more detail. The script loops over all table rows. For each row I want Casper to click the link, collect the data on the sub-page, and go one step back in history to process the next row. The problem is that click() doesn't work and I don't know why. Is there any way to fix this? (Note: the link's href invokes a JavaScript function, viewContact.)
Here is the code:
var employee = {
    last_name: "",
    first_name: "",
    position: "",
    department: "",
    location: "",
    email: "",
    phone: "",
    twitter: ""
};
var employees = [];
var result_number = 50;
var start_url = 'https://www.jigsaw.com/SearchContact.xhtml?companyId=489781&orderby=0&order=0&opCode=paging&mode=0&estimatedCount=126&dead=false&rpage=1&rowsPerPage=200';
var casper = require('casper').create({
    javascriptEnabled: true
});
casper.start(start_url, function() {
    var js = this.evaluate(function() {
        return document;
    });
    for (var i = 1; i <= result_number; i++)
    {
        // j stands for three neighbour td columns containing:
        // position, name+link, location
        employee.position = this.getHTML('#sortableTable tr:nth-child(' + i + ') td:nth-child(3) span');
        // click link and get other data
        this.click('#sortableTable tr:nth-child(' + i + ') td:nth-child(4) span a');
        employee.first_name = this.getHTML('#sortableTable tr:nth-child(' + i + ') td:nth-child(4) span a');
        //collect data
        this.waitForSelector('#firstname', function() {
            employee.first_name = this.getHTML('#firstname');
        });
        this.waitForSelector('#lastname', function() {
            employee.last_name = this.getHTML('#lastname');
        });
        this.waitForSelector('#state', function() {
            employee.department = this.getHTML('#state');
        });
        this.waitForSelector('#email', function() {
            employee.email = this.getHTML('#email');
        });
        this.waitForSelector('#phone', function() {
            employee.phone = this.getHTML('#phone');
        });
        //get back to previous page
        this.back();
        employee.location = this.getHTML('#sortableTable tr:nth-child(' + i + ') td:nth-child(5) span');
        this.echo('\n\n Employee number: ' + i + " :\n");
        this.echo('first name : ' + employee.first_name);
        this.echo('last name : ' + employee.last_name);
        this.echo('position : ' + employee.position);
        this.echo('department : ' + employee.department);
        this.echo('location : ' + employee.location);
        this.echo('email : ' + employee.email);
        this.echo('phone : ' + employee.phone);
    }
});
casper.run();
Answer 1:
I see two things here that need to be corrected. First, the for loop in your code doesn't appear to be in the scope of any CasperJS method. This line:
for (var i = 1; i <= result_number; i++)
should be inside a casper.then method. I notice you do have the closing brackets, so perhaps the code was mangled when you copied and pasted it.
Second, and most importantly, the tr:nth-child(' + i + ') selector you'd like to interact with won't work this way. I don't know why, but it isn't as straightforward as it looks; I've tried to do the same thing. My solution was, first of all, to convert i to a string instead of a number, like so:
pageturn = pageturn + 1;
// Collect <td> contents on each page.
var pageturnString = pageturn.toString();
var linknum = 'a.SomeLinkClass:nth-child('+pageturnString+')';
In my case I'm using this to click through to the next page. Either way, you must build the CSS selector inside a this.then() step nested in the outer method, and then let a second nested this.then() step do the rest of the for loop's work.
Example:
casper.each(pagecount, function() {
    this.then(function() {
        pageturn = pageturn + 1;
        // Collect <td> contents on each page.
        var pageturnString = pageturn.toString();
        var linknum = 'a.SomeLinkClass:nth-child('+pageturnString+')';
    });
    this.then(function() {
        //Now run for loop here.
    });
});
If you don't build the CSS selector inside a this.then() step before it's used in the next step, it won't work. I don't know why, but that's the deal. In my code, pagecount could possibly be used in place of your for loop, but I'll leave that up to you.
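One likely reason the wrapping matters is that CasperJS queues its steps: then() and waitForSelector() only register callbacks, which run later, after the surrounding function has returned. The sketch below imitates that with a plain array so it can run under Node without CasperJS. The names steps, schedule, runAll and scheduleRow are hypothetical stand-ins, not CasperJS APIs. It shows why deferred callbacks created in a bare for loop with var i all see the final index, and how giving each iteration its own function avoids that.

```javascript
// Hypothetical stand-in for Casper's step queue; not a CasperJS API.
var steps = [];
function schedule(fn) { steps.push(fn); }   // plays the role of then()
function runAll() {                         // plays the role of run()
    while (steps.length) { steps.shift()(); }
}

var resultNumber = 3;

// Naive version: every deferred callback shares the same `i`,
// and the loop has finished before any callback runs.
var naive = [];
for (var i = 1; i <= resultNumber; i++) {
    schedule(function () {
        naive.push('tr:nth-child(' + i + ')');
    });
}
runAll();

// Fixed version: each iteration passes its index to a function,
// so every callback closes over its own copy.
var fixed = [];
function scheduleRow(row) {
    schedule(function () {
        fixed.push('tr:nth-child(' + row + ')');
    });
}
for (var k = 1; k <= resultNumber; k++) {
    scheduleRow(k);
}
runAll();

console.log(naive); // every entry uses index 4
console.log(fixed); // indices 1, 2 and 3
```

In a real script the body of scheduleRow would be something like casper.then(function () { this.click(selector); }), followed by further steps that wait for and read the detail page. Note also that '...' + i + '...' already converts a number to a string, so the explicit toString() call is probably not what made the difference.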
Answer 2:
I've got a page where I'm seeing this in Casper:
[debug] [phantom] Mouse event 'mousedown' on selector: tr:nth-child(2) a
CasperError: Cannot dispatch mousedown event on nonexistent selector: tr:nth-child(2) a
As this error is caused by a failed exists() check, which relies on querySelectorAll, I've played around with that and found that the following sets x2 to null (although x1 isn't null):
this.evaluate(function() {
    var x1 = document.querySelector('tr:nth-child(2) a');
    var x2 = document.querySelector('tr:nth-child(2) a');
    alert(x1 + ', ' + x2);
});
It seems to depend on there being a row that doesn't contain an <a>, as you'd find in a header row. Here's a test page:
http://jsfiddle.net/GKb2g/4/
I'll hopefully post back here once I've found the cause, but in the meantime, you're best off using a selectXPath selector.
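A minimal sketch of the XPath approach, assuming the element with id sortableTable wraps the rows and the link sits in the fourth cell, as in the question. rowLinkXPath is a hypothetical helper introduced here for illustration; selectXPath is the helper exported by the casper module. The CasperJS calls are left in comments so that the string-building part runs on its own.

```javascript
// Hypothetical helper: builds an XPath for the link in a given data row.
// The [td] predicate keeps only rows that contain <td> cells, so a header
// row made of <th> cells is not counted. That the header uses <th> is an
// assumption about the page's markup.
function rowLinkXPath(containerId, rowIndex, cellIndex) {
    return '//*[@id="' + containerId + '"]' +
           '//tr[td][' + rowIndex + ']' +
           '/td[' + cellIndex + ']//a';
}

var xpath = rowLinkXPath('sortableTable', 2, 4);
console.log(xpath);

// Inside a CasperJS script this could be used roughly as follows:
//   var x = require('casper').selectXPath;
//   casper.then(function () {
//       this.click(x(rowLinkXPath('sortableTable', 2, 4)));
//   });
```

Building the expression in one place keeps the row and cell indices easy to change if the table layout turns out to differ from the assumptions above.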
Source: https://stackoverflow.com/questions/16160707/casperjs-how-to-click-multiple-links-in-a-table-while-collecting-data-from-the-w