cheerio

How to make cheerio's `$` accessible in helper functions in a clean way?

烂漫一生 submitted on 2019-12-11 19:51:01
Question: I am fairly new to JavaScript and I am trying to refactor this:

```js
const rp = require('request-promise');
const cheerio = require('cheerio'); // Basically jQuery for node.js

// shared function
function getPage(url) {
  const options = {
    uri: url,
    transform: function(body) {
      return cheerio.load(body);
    }
  };
  return rp(options);
}

getPage('https://friendspage.org').then($ => {
  // Processing 1
  const nxtPage = $("a[data-url$='nxtPageId']").attr('data');
  return getPage(nxtPage).then($ => {
    // Processing 2
```
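A minimal sketch of one common way to approach this kind of question: pass `$` into helper functions explicitly, or flatten the nested `.then()` calls with `async`/`await` so each page's `$` stays in scope. The helper names (`extractNextPage`, `crawl`) are illustrative and not from the original question.

```js
const rp = require('request-promise');
const cheerio = require('cheerio');

function getPage(url) {
  return rp({ uri: url, transform: body => cheerio.load(body) });
}

// Hypothetical helper: receives `$` as an argument instead of relying on
// an outer closure, which keeps it reusable and easy to test.
function extractNextPage($) {
  return $("a[data-url$='nxtPageId']").attr('data');
}

async function crawl(startUrl) {
  const $first = await getPage(startUrl);   // Processing 1
  const nextUrl = extractNextPage($first);
  const $second = await getPage(nextUrl);   // Processing 2
  // ...continue working with $second here
}

crawl('https://friendspage.org').catch(console.error);
```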

Callback on cheerio in Node.js

无人久伴 submitted on 2019-12-11 18:36:36
Question: I'm trying to write a scraper using 'request' and 'cheerio'. I have an array of 100 URLs. I'm looping over the array, using 'request' on each URL, and then doing cheerio.load(body). If I increase i above 3 (I changed the condition to i < 3 for testing), the scraper breaks because var productNumber is undefined and I can't call split on an undefined variable. I think the for loop is moving on before the webpage responds and has time to load the body with cheerio, and this question: nodeJS - Using a
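A minimal sketch of one way this is usually handled, not the asker's code: process each URL inside its own request callback and guard against missing elements before calling `.split()`. The `.product-number` selector and the split delimiter are placeholders.

```js
const request = require('request');
const cheerio = require('cheerio');

const urls = [/* ...array of product page URLs... */];

urls.forEach(url => {
  request(url, (err, resp, body) => {
    if (err || resp.statusCode !== 200) return;
    const $ = cheerio.load(body);
    // Hypothetical selector; the real one depends on the page being scraped.
    const raw = $('.product-number').text();
    if (!raw) return; // skip pages where the element is absent
    const productNumber = raw.split(':').pop().trim();
    console.log(productNumber);
  });
});
```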

Scraping links from a website using Node.js, request, and cheerio?

匆匆过客 submitted on 2019-12-11 10:43:09
Question: I'm trying to scrape links on my school's course schedule website using Node.js, request, and cheerio. However, my code is not reaching all of the subject links. The link to the course schedule website is here. Below is my code:

```js
var express = require('express');
var request = require('request');
var cheerio = require('cheerio');
var app = express();

app.get('/subjects', function(req, res) {
  var URL = 'http://courseschedules.njit.edu/index.aspx?semester=2016s';
  request(URL, function(error, response, body) {
    if(
```
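A minimal sketch of one way to collect every subject link, under the assumption that each subject is an anchor whose href carries a subject identifier; the `subjectID=` filter is illustrative and not taken from the actual NJIT markup.

```js
var express = require('express');
var request = require('request');
var cheerio = require('cheerio');
var app = express();

app.get('/subjects', function (req, res) {
  var URL = 'http://courseschedules.njit.edu/index.aspx?semester=2016s';
  request(URL, function (error, response, body) {
    if (error) {
      return res.status(500).send('Request failed');
    }
    var $ = cheerio.load(body);
    var subjects = [];
    // Collect every matching anchor on the page, not just the first container.
    $('a').each(function () {
      var href = $(this).attr('href');
      if (href && href.indexOf('subjectID=') !== -1) {
        subjects.push(href);
      }
    });
    res.json(subjects);
  });
});

app.listen(8080);
```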

RangeError: Maximum call stack size exceeded caused by array.splice.apply?

放肆的年华 submitted on 2019-12-11 06:18:27
Question: I'm running a cheerio task and it throws an exception that prints the following (note that I added the log statements that print the sizes of spliceArgs and array):

```
[14:17:08] Starting 'test:css'...
SPLICE ARGS LENGTH: 4
ARRAY LENGTH: 5
SPLICE ARGS LENGTH: 132519
ARRAY LENGTH: 0
/home/ole/@superflycss/utilities-fonts/node_modules/cheerio/lib/api/manipulation.js:109
    return array.splice.apply(array, spliceArgs);
                        ^
RangeError: Maximum call stack size exceeded
    at uniqueSplice (/home/ole/@superflycss
```
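For context, a minimal sketch of the general workaround for this class of error (not a patch to cheerio itself): `Function.prototype.apply` pushes every element of `spliceArgs` onto the call stack, so a 132,519-element argument list can overflow it; splicing in chunks avoids passing that many arguments at once.

```js
// Insert a very large list of items into an array without handing them all
// to splice() as individual arguments in a single call.
function chunkedSplice(array, start, deleteCount, items) {
  array.splice(start, deleteCount);   // do the removal first
  var CHUNK = 10000;                  // comfortably under stack limits
  for (var i = 0; i < items.length; i += CHUNK) {
    var chunk = items.slice(i, i + CHUNK);
    array.splice.apply(array, [start + i, 0].concat(chunk));
  }
  return array;
}

// Usage: behaves like array.splice(start, deleteCount, ...items).
var target = [1, 2, 3];
chunkedSplice(target, 1, 0, new Array(200000).fill(0));
console.log(target.length); // 200003
```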

Cannot seem to scrape a div class tag in Node.js

我的未来我决定 submitted on 2019-12-11 03:34:30
Question: I'm new to node.js. My experience has been in Java and VBA. I'm trying to scrape a website for a friend and all is going well until I can't get what I'm after.

```html
<div class="gwt-Label ADC2X2-c-q ADC2X2-b-nb ADC2X2-b-Zb">Phone: +4576 102900</div>
```

That tag just has text, no attributes or anything, yet I cannot scrape it using cheerio.

```js
if(!err && resp.statusCode == 200){
  var $ = cheerio.load(body);
  var number = $('//tried everything here!').text();
  console.log(number);
```

this function I also played
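A minimal sketch of one way to target that div. One caveat: the class names look GWT-generated, so if the element is rendered client-side it will not be present in the raw HTML cheerio sees and a headless browser would be needed instead. The selector and the "Phone:" filter below are based only on the snippet in the question.

```js
if (!err && resp.statusCode == 200) {
  var $ = cheerio.load(body);
  // Match on the stable class ("gwt-Label") and the "Phone:" prefix rather
  // than the full generated class list, which can change between builds.
  var number = $('div.gwt-Label')
    .filter(function () {
      return $(this).text().indexOf('Phone:') === 0;
    })
    .first()
    .text()
    .replace('Phone:', '')
    .trim();
  console.log(number); // e.g. "+4576 102900"
}
```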

Scraping using a Google Cloud Function finishes with status code 304

爱⌒轻易说出口 submitted on 2019-12-11 01:59:36
Question: I am trying out Google Cloud Functions. It works, but it finishes with a status code of 304 and I am not sure what the reason is. Below is the code:

```js
//gcloud beta functions deploy scrapeGitCollection --trigger-http
var cheerio = require('cheerio');
var request = require('request');

function getDateTime() {
  var date = new Date();
  var hour = date.getHours();
  hour = (hour < 10 ? "0" : "") + hour;
  var min = date.getMinutes();
  min = (min < 10 ? "0" : "") + min;
  var sec = date.getSeconds();
  sec = (sec < 10 ? "0"
```
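A minimal sketch, assuming the 304 comes from ordinary HTTP caching (the Express-style response used by HTTP functions adds an ETag, so a client re-requesting unchanged content gets 304 Not Modified) rather than from a failure. The URL, selector, and handler body below are placeholders, not the asker's code.

```js
const cheerio = require('cheerio');
const request = require('request');

exports.scrapeGitCollection = (req, res) => {
  request('https://github.com/trending', (error, response, body) => {
    if (error) {
      return res.status(500).send(error.toString());
    }
    const $ = cheerio.load(body);
    const repos = [];
    // Illustrative selector; the real markup of the scraped page may differ.
    $('h2 a').each((i, el) => {
      repos.push($(el).attr('href'));
    });
    // Disable caching so the response always comes back as a plain 200.
    res.set('Cache-Control', 'no-store');
    res.status(200).json(repos);
  });
};
```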

Node.js - Using a callback function with Cheerio

走远了吗. submitted on 2019-12-10 22:43:44
Question: I'm building a scraper in Node which uses request and cheerio to load pages and parse them. It's important that I invoke a callback only AFTER request and cheerio have finished loading the page. I'm trying to use the async extension, but I'm not entirely sure where to put the callback.

```js
request(url, function (err, resp, body) {
  var $;
  if (err) {
    console.log("Error!: " + err + " using " + url);
  } else {
    async.series([
      function (callback) {
        $ = cheerio.load(body);
        callback();
      },
      function
```
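A minimal sketch of the usual answer to this kind of question: `cheerio.load()` is synchronous, so by the time the request callback runs the page is already fully parsed, and the caller's callback can simply be invoked at the end of that callback without `async.series`. The `loadPage` helper name is illustrative.

```js
const request = require('request');
const cheerio = require('cheerio');

function loadPage(url, done) {
  request(url, (err, resp, body) => {
    if (err) {
      console.log('Error!: ' + err + ' using ' + url);
      return done(err);
    }
    const $ = cheerio.load(body); // synchronous parse, finished on this line
    done(null, $);                // fires only after request + parse complete
  });
}

// Usage: the callback runs strictly after the page has been loaded and parsed.
loadPage('https://example.com', (err, $) => {
  if (err) return;
  console.log($('title').text());
});
```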

Cheerio doesn't wait for body to load

只谈情不闲聊 submitted on 2019-12-10 12:23:21
Question: I made a very simple script which scrapes a recipe website to get the title, preparation time, and ingredients. Everything works fine except that the script is not able to scrape every page in my array. Sometimes I get 4 of them, sometimes 2, sometimes even 0... It seems that the script doesn't wait for the body to be fully loaded. I'm fully aware that cheerio doesn't run JavaScript on a website, but as far as I know the information I scrape isn't generated by any script, it is pure
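A minimal sketch, assuming the symptom comes from the results being read before every request has come back: collecting the requests as promises and waiting on `Promise.all` guarantees every body has been fetched and parsed first. The selectors and the `recipeUrls` array are placeholders, not taken from the asker's script.

```js
const rp = require('request-promise');
const cheerio = require('cheerio');

const recipeUrls = [/* ...array of recipe page URLs... */];

async function scrapeAll() {
  // Fetch every page, then parse; nothing is returned until all have arrived.
  const pages = await Promise.all(recipeUrls.map(url => rp(url)));
  return pages.map(body => {
    const $ = cheerio.load(body);
    return {
      title: $('h1').first().text().trim(),
      time: $('.recipe-time').text().trim(),
      ingredients: $('.ingredient').map((i, el) => $(el).text().trim()).get(),
    };
  });
}

scrapeAll().then(recipes => console.log(recipes)).catch(console.error);
```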

SVG image turns black after updating its path via jQuery

北城以北 submitted on 2019-12-10 12:17:32
Question: I have the following HTML code:

```html
<div>
  <a id="cover"></a>
  <svg version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"
       width="100%" height="100%" viewBox="0 0 510 680">
    <rect x="0" y="0" fill="#000007" width="510" height="680"/>
    <image width="510" height="680" xlink:href="../images/MSRCover.png" transform="translate(0 0)" />
  </svg>
</div>
```

I am trying to change the image's path with jQuery and the image turns black.

```js
$ = cheerio.load(data);
$('image').each
```
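A minimal sketch, assuming the black image is the `<rect>` showing through because cheerio's default HTML parser mangles the namespaced `xlink:href` attribute when the SVG is serialized; loading in XML mode preserves it. The file name and replacement path are placeholders.

```js
const fs = require('fs');
const cheerio = require('cheerio');

const data = fs.readFileSync('cover.html', 'utf8'); // the markup shown above

// xmlMode keeps namespaced attributes such as xlink:href intact on output.
const $ = cheerio.load(data, { xmlMode: true });

$('image').each(function () {
  $(this).attr('xlink:href', '../images/NewCover.png'); // placeholder path
});

fs.writeFileSync('cover.html', $.html());
```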

Scraping all elements with cheerio

浪子不回头ぞ submitted on 2019-12-10 11:48:18
Question: I am running the code below to scrape data. However, the code only scrapes the first element.

```js
const cheerio = require('cheerio')
const jsonframe = require('jsonframe-cheerio')
const got = require('got');

async function scrapCoinmarketCap() {
  const url = 'https://coinmarketcap.com/all/views/all/'
  const html = await got(url)
  const $ = cheerio.load(html.body)
  jsonframe($) // initializing the plugin

  let frame = {
    "Coin": "td.no-wrap.currency-name > a",
    "url": "td.no-wrap.currency-name > a @ href",
```
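A minimal sketch of a plain-cheerio alternative that collects every row instead of just the first match, sidestepping the jsonframe frame syntax entirely; the selectors mirror the ones in the question and may no longer match the live site.

```js
const cheerio = require('cheerio');
const got = require('got');

async function scrapeCoinmarketCap() {
  const url = 'https://coinmarketcap.com/all/views/all/';
  const html = await got(url);
  const $ = cheerio.load(html.body);

  const coins = [];
  // Iterate every matching anchor in the table, not only the first one.
  $('td.no-wrap.currency-name > a').each((i, el) => {
    coins.push({
      Coin: $(el).text().trim(),
      url: $(el).attr('href'),
    });
  });
  return coins;
}

scrapeCoinmarketCap().then(coins => console.log(coins)).catch(console.error);
```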