cheerio

How to make cheerio's `$` accessible in helper functions in a clean way?

烂漫一生 submitted on 2019-12-11 19:51:01
Question: I am fairly new to JavaScript and I am trying to refactor this:

```js
const rp = require('request-promise');
const cheerio = require('cheerio'); // Basically jQuery for node.js

// shared function
function getPage(url) {
  const options = {
    uri: url,
    transform: function(body) {
      return cheerio.load(body);
    }
  };
  return rp(options);
}

getPage('https://friendspage.org').then($ => {
  // Processing 1
  const nxtPage = $("a[data-url$='nxtPageId']").attr('data');
  return getPage(nxtPage).then($ => {
    // Processing 2
```
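A minimal sketch of one common way to approach this kind of question: pass `$` into helper functions explicitly, or flatten the nested `.then()` calls with `async`/`await` so each page's `$` stays in scope. The helper names (`extractNextPage`, `crawl`) are illustrative and not from the original question.

```js
const rp = require('request-promise');
const cheerio = require('cheerio');

function getPage(url) {
  return rp({ uri: url, transform: body => cheerio.load(body) });
}

// Hypothetical helper: receives `$` as an argument instead of relying on
// an outer closure, which keeps it reusable and easy to test.
function extractNextPage($) {
  return $("a[data-url$='nxtPageId']").attr('data');
}

async function crawl(startUrl) {
  const $first = await getPage(startUrl);   // Processing 1
  const nextUrl = extractNextPage($first);
  const $second = await getPage(nextUrl);   // Processing 2
  // ...continue working with $second here
}

crawl('https://friendspage.org').catch(console.error);
```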

Callback on cheerio in Node.js

无人久伴 submitted on 2019-12-11 18:36:36
Question: I'm trying to write a scraper using 'request' and 'cheerio'. I have an array of 100 URLs. I'm looping over the array, using 'request' on each URL, and then doing cheerio.load(body). If I increase i above 3 (I changed the condition to i < 3 for testing), the scraper breaks because var productNumber is undefined and I can't call split on an undefined variable. I think the for loop is moving on before the webpage responds and has time to load the body with cheerio, and this question: nodeJS - Using a
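A minimal sketch of one way this is usually handled, not the asker's code: process each URL inside its own request callback and guard against missing elements before calling `.split()`. The `.product-number` selector and the split delimiter are placeholders.

```js
const request = require('request');
const cheerio = require('cheerio');

const urls = [/* ...array of product page URLs... */];

urls.forEach(url => {
  request(url, (err, resp, body) => {
    if (err || resp.statusCode !== 200) return;
    const $ = cheerio.load(body);
    // Hypothetical selector; the real one depends on the page being scraped.
    const raw = $('.product-number').text();
    if (!raw) return; // skip pages where the element is absent
    const productNumber = raw.split(':').pop().trim();
    console.log(productNumber);
  });
});
```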

Scraping links from a website using Node.js, request, and cheerio?

匆匆过客 submitted on 2019-12-11 10:43:09
Question: I'm trying to scrape links on my school's course schedule website using Node.js, request, and cheerio. However, my code is not reaching all of the subject links. The link to the course schedule website is here. Below is my code:

```js
var express = require('express');
var request = require('request');
var cheerio = require('cheerio');
var app = express();

app.get('/subjects', function(req, res) {
  var URL = 'http://courseschedules.njit.edu/index.aspx?semester=2016s';
  request(URL, function(error, response, body) {
    if(
```
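A minimal sketch of one way to collect every subject link, under the assumption that each subject is an anchor whose href carries a subject identifier; the `subjectID=` filter is illustrative and not taken from the actual NJIT markup.

```js
var express = require('express');
var request = require('request');
var cheerio = require('cheerio');
var app = express();

app.get('/subjects', function (req, res) {
  var URL = 'http://courseschedules.njit.edu/index.aspx?semester=2016s';
  request(URL, function (error, response, body) {
    if (error) {
      return res.status(500).send('Request failed');
    }
    var $ = cheerio.load(body);
    var subjects = [];
    // Collect every matching anchor on the page, not just the first container.
    $('a').each(function () {
      var href = $(this).attr('href');
      if (href && href.indexOf('subjectID=') !== -1) {
        subjects.push(href);
      }
    });
    res.json(subjects);
  });
});

app.listen(8080);
```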

RangeError: Maximum call stack size exceeded caused by array.splice.apply?

放肆的年华 submitted on 2019-12-11 06:18:27
Question: I'm running a cheerio task and it throws an exception that prints the following (note that I added the log statements that print the sizes of spliceArgs and array):

```
[14:17:08] Starting 'test:css'...
SPLICE ARGS LENGTH: 4
ARRAY LENGTH: 5
SPLICE ARGS LENGTH: 132519
ARRAY LENGTH: 0
/home/ole/@superflycss/utilities-fonts/node_modules/cheerio/lib/api/manipulation.js:109
    return array.splice.apply(array, spliceArgs);
                        ^
RangeError: Maximum call stack size exceeded
    at uniqueSplice (/home/ole/@superflycss
```
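For context, a minimal sketch of the general workaround for this class of error (not a patch to cheerio itself): `Function.prototype.apply` pushes every element of `spliceArgs` onto the call stack, so a 132,519-element argument list can overflow it; splicing in chunks avoids passing that many arguments at once.

```js
// Insert a very large list of items into an array without handing them all
// to splice() as individual arguments in a single call.
function chunkedSplice(array, start, deleteCount, items) {
  array.splice(start, deleteCount);   // do the removal first
  var CHUNK = 10000;                  // comfortably under stack limits
  for (var i = 0; i < items.length; i += CHUNK) {
    var chunk = items.slice(i, i + CHUNK);
    array.splice.apply(array, [start + i, 0].concat(chunk));
  }
  return array;
}

// Usage: behaves like array.splice(start, deleteCount, ...items).
var target = [1, 2, 3];
chunkedSplice(target, 1, 0, new Array(200000).fill(0));
console.log(target.length); // 200003
```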

Cannot seem to scrape a div class tag in Node.js

我的未来我决定 submitted on 2019-12-11 03:34:30
Question: I'm new to node.js. My experience has been in Java and VBA. I'm trying to scrape a website for a friend and all is going well until I can't get what I'm after.

```html
<div class="gwt-Label ADC2X2-c-q ADC2X2-b-nb ADC2X2-b-Zb">Phone: +4576 102900</div>
```

That tag just has text, no attributes or anything, yet I cannot scrape it using cheerio.

```js
if(!err && resp.statusCode == 200){
  var $ = cheerio.load(body);
  var number = $('//tried everything here!').text();
  console.log(number);
```

this function I also played
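A minimal sketch of one way to target that div. One caveat: the class names look GWT-generated, so if the element is rendered client-side it will not be present in the raw HTML cheerio sees and a headless browser would be needed instead. The selector and the "Phone:" filter below are based only on the snippet in the question.

```js
if (!err && resp.statusCode == 200) {
  var $ = cheerio.load(body);
  // Match on the stable class ("gwt-Label") and the "Phone:" prefix rather
  // than the full generated class list, which can change between builds.
  var number = $('div.gwt-Label')
    .filter(function () {
      return $(this).text().indexOf('Phone:') === 0;
    })
    .first()
    .text()
    .replace('Phone:', '')
    .trim();
  console.log(number); // e.g. "+4576 102900"
}
```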

Scraping using a Google Cloud Function finishes with status code 304

爱⌒轻易说出口 submitted on 2019-12-11 01:59:36
Question: I am trying out Google Cloud Functions. It works, but it finishes with a status code of 304 and I am not sure what the reason is. Below is the code:

```js
//gcloud beta functions deploy scrapeGitCollection --trigger-http
var cheerio = require('cheerio');
var request = require('request');

function getDateTime() {
  var date = new Date();
  var hour = date.getHours();
  hour = (hour < 10 ? "0" : "") + hour;
  var min = date.getMinutes();
  min = (min < 10 ? "0" : "") + min;
  var sec = date.getSeconds();
  sec = (sec < 10 ? "0"
```
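A minimal sketch, assuming the 304 comes from ordinary HTTP caching (the Express-style response used by HTTP functions adds an ETag, so a client re-requesting unchanged content gets 304 Not Modified) rather than from a failure. The URL, selector, and handler body below are placeholders, not the asker's code.

```js
const cheerio = require('cheerio');
const request = require('request');

exports.scrapeGitCollection = (req, res) => {
  request('https://github.com/trending', (error, response, body) => {
    if (error) {
      return res.status(500).send(error.toString());
    }
    const $ = cheerio.load(body);
    const repos = [];
    // Illustrative selector; the real markup of the scraped page may differ.
    $('h2 a').each((i, el) => {
      repos.push($(el).attr('href'));
    });
    // Disable caching so the response always comes back as a plain 200.
    res.set('Cache-Control', 'no-store');
    res.status(200).json(repos);
  });
};
```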

Node.js - Using a callback function with Cheerio

走远了吗. submitted on 2019-12-10 22:43:44
Question: I'm building a scraper in Node which uses request and cheerio to load pages and parse them. It's important that I invoke a callback only AFTER request and cheerio have finished loading the page. I'm trying to use the async extension, but I'm not entirely sure where to put the callback.

```js
request(url, function (err, resp, body) {
  var $;
  if (err) {
    console.log("Error!: " + err + " using " + url);
  } else {
    async.series([
      function (callback) {
        $ = cheerio.load(body);
        callback();
      },
      function
```
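A minimal sketch of the usual answer to this kind of question: `cheerio.load()` is synchronous, so by the time the request callback runs the page is already fully parsed, and the caller's callback can simply be invoked at the end of that callback without `async.series`. The `loadPage` helper name is illustrative.

```js
const request = require('request');
const cheerio = require('cheerio');

function loadPage(url, done) {
  request(url, (err, resp, body) => {
    if (err) {
      console.log('Error!: ' + err + ' using ' + url);
      return done(err);
    }
    const $ = cheerio.load(body); // synchronous parse, finished on this line
    done(null, $);                // fires only after request + parse complete
  });
}

// Usage: the callback runs strictly after the page has been loaded and parsed.
loadPage('https://example.com', (err, $) => {
  if (err) return;
  console.log($('title').text());
});
```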

Cheerio doesn't wait for body to load

只谈情不闲聊 submitted on 2019-12-10 12:23:21
Question: I made a very simple script which scrapes a recipe website to get the title, preparation time, and ingredients. Everything works fine except that the script is not able to scrape every page in my array. Sometimes I get 4 of them, sometimes 2, sometimes even 0... It seems that the script doesn't wait for the body to be fully loaded. I'm fully aware that cheerio doesn't run JavaScript on a website, but as far as I know the information I scrape isn't generated by any script, it is pure
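A minimal sketch, assuming the symptom comes from the results being read before every request has come back: collecting the requests as promises and waiting on `Promise.all` guarantees every body has been fetched and parsed first. The selectors and the `recipeUrls` array are placeholders, not taken from the asker's script.

```js
const rp = require('request-promise');
const cheerio = require('cheerio');

const recipeUrls = [/* ...array of recipe page URLs... */];

async function scrapeAll() {
  // Fetch every page, then parse; nothing is returned until all have arrived.
  const pages = await Promise.all(recipeUrls.map(url => rp(url)));
  return pages.map(body => {
    const $ = cheerio.load(body);
    return {
      title: $('h1').first().text().trim(),
      time: $('.recipe-time').text().trim(),
      ingredients: $('.ingredient').map((i, el) => $(el).text().trim()).get(),
    };
  });
}

scrapeAll().then(recipes => console.log(recipes)).catch(console.error);
```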

SVG image turns black after updating its path via jQuery

北城以北 submitted on 2019-12-10 12:17:32
Question: I have the following HTML code:

```html
<div>
  <a id="cover"></a>
  <svg version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink"
       width="100%" height="100%" viewBox="0 0 510 680">
    <rect x="0" y="0" fill="#000007" width="510" height="680"/>
    <image width="510" height="680" xlink:href="../images/MSRCover.png" transform="translate(0 0)" />
  </svg>
</div>
```

I am trying to change the image's path with jQuery and the image turns black.

```js
$ = cheerio.load(data);
$('image').each
```
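A minimal sketch, assuming the black image is the `<rect>` showing through because cheerio's default HTML parser mangles the namespaced `xlink:href` attribute when the SVG is serialized; loading in XML mode preserves it. The file name and replacement path are placeholders.

```js
const fs = require('fs');
const cheerio = require('cheerio');

const data = fs.readFileSync('cover.html', 'utf8'); // the markup shown above

// xmlMode keeps namespaced attributes such as xlink:href intact on output.
const $ = cheerio.load(data, { xmlMode: true });

$('image').each(function () {
  $(this).attr('xlink:href', '../images/NewCover.png'); // placeholder path
});

fs.writeFileSync('cover.html', $.html());
```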

Scraping all elements with cheerio

浪子不回头ぞ submitted on 2019-12-10 11:48:18
Question: I am running the code below to scrape data. However, the code only scrapes the first element.

```js
const cheerio = require('cheerio')
const jsonframe = require('jsonframe-cheerio')
const got = require('got');

async function scrapCoinmarketCap() {
  const url = 'https://coinmarketcap.com/all/views/all/'
  const html = await got(url)
  const $ = cheerio.load(html.body)
  jsonframe($) // initializing the plugin

  let frame = {
    "Coin": "td.no-wrap.currency-name > a",
    "url": "td.no-wrap.currency-name > a @ href",
```
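A minimal sketch of a plain-cheerio alternative that collects every row instead of just the first match, sidestepping the jsonframe frame syntax entirely; the selectors mirror the ones in the question and may no longer match the live site.

```js
const cheerio = require('cheerio');
const got = require('got');

async function scrapeCoinmarketCap() {
  const url = 'https://coinmarketcap.com/all/views/all/';
  const html = await got(url);
  const $ = cheerio.load(html.body);

  const coins = [];
  // Iterate every matching anchor in the table, not only the first one.
  $('td.no-wrap.currency-name > a').each((i, el) => {
    coins.push({
      Coin: $(el).text().trim(),
      url: $(el).attr('href'),
    });
  });
  return coins;
}

scrapeCoinmarketCap().then(coins => console.log(coins)).catch(console.error);
```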