Node x-ray: crawling data from a collection of URLs

眉间皱痕, submitted on 2019-12-11 06:26:16

Question


I'm trying to scrape a list on a site whose items link to other pages that all share the same formatting.

I was able to create a collection of all the a tags, but when I try to visit that collection of pages, the key I create for it doesn't get added to the returned object.

Here's an example of what I'm trying to do, using Stack Overflow:

var Xray = require('x-ray');
var x = Xray();
x('http://stackoverflow.com/', {
    title: x(['a@href'], 'title'),
}) (function(err, obj) {
    console.log(obj);
});

I expect obj.title to be a list of the titles of all the pages the a href values point to; instead I just get an empty object.

However, if I use just the first a href, I get the title with no problem.

var Xray = require('x-ray');
var x = Xray();
x('http://stackoverflow.com/', {
    title: x('a@href', 'title'),
}) (function(err, obj) {
    console.log(obj);
});

Has anyone run into this problem before?


Answer 1:


I ran into that problem before and my solution goes like this:

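var Xray = require('x-ray');
var x = Xray();
x('http://stackoverflow.com/', {
    // collect the href of every a tag as an array of {link: ...} objects
    title: x('a', [{link: '@href'}])
})(function(err, obj) {
    if (err) return console.error(err);
    obj.title.forEach(function(item) {
        if (!item.link) return; // skip a tags without an href
        // visit each collected link and read that page's title
        x(item.link, 'title')(function(err, data) {
            console.log(data); // should print the title
        });
    });
});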

Let me know if you run into any problems.
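The approach in this answer prints each title as it arrives. To end up with the single list of titles the question asks for, the per-link results have to be gathered before use. Below is a minimal sketch of one way to do that. `collectTitles` and `fetchTitle` are hypothetical names, not part of x-ray, and the x-ray wiring shown in the trailing comment is an untested assumption.

```javascript
// Hypothetical helper (not part of x-ray): calls fetchTitle for every link
// and hands back the titles in the same order as the links.
// fetchTitle(url, cb) can be any function that calls cb(err, title) once.
function collectTitles(links, fetchTitle, done) {
  var titles = new Array(links.length);
  var pending = links.length;

  // Nothing to visit: report an empty list right away.
  if (pending === 0) return done(null, titles);

  links.forEach(function (link, i) {
    fetchTitle(link, function (err, title) {
      // A page that fails is recorded as null instead of aborting the crawl.
      titles[i] = err ? null : title;
      if (--pending === 0) done(null, titles);
    });
  });
}

// Possible wiring with x-ray (assumption, untested), given an array `links` of URLs:
//   collectTitles(links, function (url, cb) {
//     x(url, 'title')(cb);
//   }, function (err, titles) {
//     console.log(titles);
//   });
```

Passing the fetch function in as a parameter keeps the gathering logic independent of x-ray, so it can be tried with a stub before pointing it at a real site.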




Answer 2:


You could use x-ray's support for crawling to another site:

var Xray = require('x-ray');
var x = Xray();

x("http://stackoverflow.com/", {
  main: 'title',
  image: x('#gbar a@href', 'title'), // follow link to google images 
})(function(err, obj) {
  console.log(obj);
});


Source: https://stackoverflow.com/questions/39609440/node-x-ray-crawling-data-from-collection-of-url
