How do I get the absolute path for '<img src=''>' in node from the a response.body

别说谁变了你拦得住时间么 提交于 2019-12-05 12:05:16

To get an array of image links in your scenario, you can use url.resolve to resolve relative src attributes of img tags with the request URL, resulting in an absolute URL. The array is passed to the final then; you can do other things with the array other than console.log if so desired.

var rp = require('request-promise'),
    cheerio = require('cheerio'),
    url = require('url'),
    base = 'http://www.google.com';

var options = {
    uri: base,
    method: 'GET',
    resolveWithFullResponse: true
};

rp(options)
    .then (function (response) {
        var $ = cheerio.load(response.body);

        return $('img').map(function () {
            return url.resolve(base, $(this).attr('src'));
        }).toArray();
    })
    .then(console.log);

This url.resolve will work for absolute or relative URLs (it resolves and returns the combined absolute URL when resolving from your request URL to a relative path, but when resolving from your request URL to an absolute URL it just returns the absolute URL). For example, with img tags on google with /logos/cat.gif and https://test.com/dog.gif as the src attributes, this would output:

[ 
    'http://www.google.com/logos/cat.gif',
    'https://test.com/dog.gif'
]

Store your page URL as a variable use url.resolve to join the pieces together. In the Node REPL this works for both relative and absolute paths (hence the "resolving"):

$:~/Projects/test$ node
> var base = "https://www.google.com";
undefined
> var imageSrc = "/logos/doodles/2016/phoebe-snetsingers-85th-birthday-5179281716019200-hp.gif";
undefined
> var url = require('url');
undefined
> url.resolve(base, imageSrc);
'https://www.google.com/logos/doodles/2016/phoebe-snetsingers-85th-birthday-5179281716019200-hp.gif'
> imageSrc = base + imageSrc;
'https://www.google.com/logos/doodles/2016/phoebe-snetsingers-85th-birthday-5179281716019200-hp.gif'
> url.resolve(base, imageSrc);
'https://www.google.com/logos/doodles/2016/phoebe-snetsingers-85th-birthday-5179281716019200-hp.gif'

Your code would change to something like:

var rp = require('request-promise'),
    cheerio = require('cheerio'),
    url = require('url'),
    base = 'http://www.google.com';

var options = {
    uri: base,
    method: 'GET',
    resolveWithFullResponse: true
};

rp(options)
  .then (function (response) {
    $ = cheerio.load(response.body);
    var relativeLinks = $("img");
    relativeLinks.each( function() {
        var link = $(this).attr('src');
        var fullImagePath = url.resolve(base, link); // should be absolute 
        console.log(link);
        if (link.startsWith('http')){
            console.log('abs');
        }
        else {
            console.log('rel');
        }
   });
});
skalpin

It looks like you're using jQuery, so you could

$('img').each(function(i, e) {
    console.log(e.src)
});

If you use src it will expand relative paths to absolute ones.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!