cheerio

jQuery to access DOM in a site

删除回忆录丶 submitted on 2019-12-02 14:58:29
Question: I am trying to scrape various elements in a table from this site to teach myself scraping using node.js, cheerio and request. I have trouble getting the items in the table; essentially I want to get 'rank', 'company' and '3-year growth' from the table. How do I do this? Based on an online tutorial, I have developed my scraping.js script to look like this:

    var request = require('request'),
        cheerio = require('cheerio');

    request('http://www.inc.com/inc5000/index.html', function (error, response,

How do I get an element name in cheerio with node.js

こ雲淡風輕ζ submitted on 2019-12-02 08:09:25
Question: How do I get an element's name in cheerio? The jQuery equivalent would be .attr('name') but that returns undefined in cheerio.

Answer 1: There's only one case, I suppose, when $someElement.attr('name') returns undefined: if there's NO name attribute on that element. For example...

    var cheerio = require('cheerio'),
        $ = cheerio.load('<input id="one" type="input" /><input id="two" name="some_name" />');

    console.log( $('#one').attr('name') ); // undefined
    console.log( $('#two').attr('name') ); //

How do I get an element name in cheerio with node.js

有些话、适合烂在心里 submitted on 2019-12-02 07:23:17
How do I get an element's name in cheerio? The jQuery equivalent would be .attr('name') but that returns undefined in cheerio.

There's only one case, I suppose, when $someElement.attr('name') returns undefined: if there's NO name attribute on that element. For example...

    var cheerio = require('cheerio'),
        $ = cheerio.load('<input id="one" type="input" /><input id="two" name="some_name" />');

    console.log( $('#one').attr('name') ); // undefined
    console.log( $('#two').attr('name') ); // some_name

Note that the name attribute is only applicable to the following set of elements (MDN): <a>, <applet>,

Weird characters when using console.print cheerio + nodejs

做~自己de王妃 submitted on 2019-12-02 03:07:01
Question: I'm new to node.js and writing my first script to scrape some data. Does anyone know why I'm seeing weird characters with question marks inside them when using this code?

    var express = require('express');
    var fs = require('fs');
    var request = require('request');
    var cheerio = require('cheerio');
    var app = express();

    var url = 'http://www.ebay.co.uk/csc/all-you-ever-want/m.html?LH_Complete=1&_ipg=50&_since=15&_sop=13&LH_FS=1&=&rt=nc&LH_ItemCondition=3';

    request(url, function (error, response,

Return results from Request.js request method?

安稳与你 submitted on 2019-12-01 20:39:57
Question:

    var request = require('request');
    var cheerio = require('cheerio');

    request(url, function (error, response, html) {
      if (!error && response.statusCode == 200) {
        var $ = cheerio.load(html);
        var link = $('.barbar li a');
        var Url = link.attr('href');
        var Title = link.find('span').first().text();
        var results = [Url, Title];
        return results;
      }
    });

    console.log(results);

results is undefined... I want to use the results to add a hyperlink to an HTML page, but I don't know how to access the results/

Return results from Request.js request method?

こ雲淡風輕ζ submitted on 2019-12-01 19:50:20

    var request = require('request');
    var cheerio = require('cheerio');

    request(url, function (error, response, html) {
      if (!error && response.statusCode == 200) {
        var $ = cheerio.load(html);
        var link = $('.barbar li a');
        var Url = link.attr('href');
        var Title = link.find('span').first().text();
        var results = [Url, Title];
        return results;
      }
    });

    console.log(results);

results is undefined... I want to use the results to add a hyperlink to an HTML page, but I don't know how to access the results/ return them outside of the callback. I've seen other posts but they all use other libraries and usually
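In the snippet above, `results` exists only inside the callback, and the value the callback returns is discarded, so the final `console.log(results)` cannot see it. One common way around this is to wrap the callback in a Promise and consume the value once it resolves. The sketch below is only an illustration: `fakeRequest` is a hypothetical stand-in for `request` so the code runs without network access, and the returned values are made up.

```javascript
// Hypothetical stand-in for request(url, callback): calls back asynchronously.
function fakeRequest(url, callback) {
  setTimeout(function () {
    callback(null, { statusCode: 200 }, '<html>placeholder body</html>');
  }, 10);
}

// Wrap the callback API in a Promise so the caller can receive the results.
function getResults(url) {
  return new Promise(function (resolve, reject) {
    fakeRequest(url, function (error, response, html) {
      if (error) return reject(error);
      if (response.statusCode !== 200) {
        return reject(new Error('Unexpected status ' + response.statusCode));
      }
      // In the real script, cheerio.load(html) and the selector logic go here.
      var results = ['/example-href', 'Example title']; // illustrative values
      resolve(results);
    });
  });
}

// The results are only available once the Promise resolves.
getResults('http://example.com/').then(function (results) {
  console.log(results);
});
```

Any code that needs the scraped values (for example, building the hyperlink) would go inside the `.then` handler, or after an `await getResults(...)` inside an async function.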

Get text in parent without children using cheerio

徘徊边缘 submitted on 2019-12-01 15:00:52
Question: I am trying to extract just the content of a div - without any of the children of that div - using cheerio. If I just use div.text() I get all the text, parent and children. I just want the value "5.25". The code below currently returns "Purchase price $5.25". The HTML:

    <div class="outer tile">
      < ... various other html here >
      <div class="cost">
        <span class="text">Purchase price </span>
        <small>$</small>5.25
      </div>
    </div>

with the extract of the relevant node.js CHEERIO

Async/Await with Request-Promise returns Undefined

和自甴很熟 submitted on 2019-11-30 02:39:10
Question: I have two files, server.js and scrape.js; below are the code snippets as they currently stand.

server.js:

    const scrape = require("./scrape");

    async function start() {
      const response = await scrape.start();
      console.log(response);
    }

    start();

and scrape.js:

    const cheerio = require("cheerio");
    const request = require("request-promise");

    go = async () => {
      const options = {
        uri: "http://www.somewebsite.com/something",
        transform: function(body) {
          return cheerio.load(body);
        }
      };
      request(options)
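scrape.js is cut off here, so the actual cause cannot be confirmed from the excerpt. One common reason an awaited call yields `undefined` is that the async function starts a promise but never returns or awaits it. The sketch below illustrates that pattern only; `fakeRequest` is a hypothetical stand-in for `request-promise`, and the function names are made up for the example.

```javascript
// Hypothetical stand-in for request-promise: resolves after a short delay.
function fakeRequest(options) {
  return new Promise(function (resolve) {
    setTimeout(function () {
      resolve('body of ' + options.uri);
    }, 10);
  });
}

// Missing `return`: the async function resolves to undefined immediately.
const startWithoutReturn = async () => {
  const options = { uri: 'http://www.somewebsite.com/something' };
  fakeRequest(options);
};

// Returning (or awaiting) the promise passes its value to the caller.
const startWithReturn = async () => {
  const options = { uri: 'http://www.somewebsite.com/something' };
  return fakeRequest(options);
};

async function main() {
  console.log(await startWithoutReturn()); // undefined
  console.log(await startWithReturn());    // body of http://www.somewebsite.com/something
}

main();
```

Note also that server.js calls `scrape.start()` while the visible part of scrape.js defines `go`; whatever name is exported has to match the name the caller uses.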

How can I use Node / Cheerio (or something else) to scrape a global variable from a site?

随声附和 submitted on 2019-11-29 17:42:34
There is a global variable on a page that contains an object that I'd like to set up a scraper for. What's the best way to do this with Node / Express / potentially Cheerio? I understand Cheerio's benefit in traversing a DOM, but I know the name of the global variable I want to scrape and just need to extract its information on a set schedule.

charly rl: Cheerio is just a DOM parser, so you won't have access to any JavaScript or any JavaScript-generated content. What you need is something like PhantomJS that simulates a browser. Have a look at this Stack Overflow answer.

Source: https://stackoverflow
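The answer is right that Cheerio does not execute scripts. If, however, the variable is assigned a literal inside an inline script in the HTML the server sends, its value can sometimes be read straight out of the page source without a browser. The sketch below works only under those assumptions: the variable name `pageData`, the sample HTML, and the helper `extractGlobal` are all hypothetical, and the assigned value must be valid JSON.

```javascript
// Hypothetical page source; in practice this would be the fetched HTML.
const html =
  '<html><head><script>' +
  'window.pageData = {"id": 42, "tags": ["a", "b"]};' +
  '</script></head><body></body></html>';

// Pull the object literal assigned to a named global out of the raw HTML.
// Only works when the value is valid JSON with no "};" inside its strings.
function extractGlobal(source, name) {
  const pattern = new RegExp('window\\.' + name + '\\s*=\\s*(\\{[\\s\\S]*?\\});');
  const match = source.match(pattern);
  return match ? JSON.parse(match[1]) : null;
}

const pageData = extractGlobal(html, 'pageData');
console.log(pageData);
```

If the variable is built at runtime by other scripts, this approach does not apply and a headless browser, as the answer suggests, is the more reliable route.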

How can I use Node / Cheerio (or something else) to scrape a global variable from a site?

心已入冬 submitted on 2019-11-28 12:24:14
Question: There is a global variable on a page that contains an object that I'd like to set up a scraper for. What's the best way to do this with Node / Express / potentially Cheerio? I understand Cheerio's benefit in traversing a DOM, but I know the name of the global variable I want to scrape and just need to extract its information on a set schedule.

Answer 1: Cheerio is just a DOM parser, so you won't have access to any JavaScript or any JavaScript-generated content. What you need is something like