cheerio | 易学教程

How to put scraping content to html (Node.js, cheerio)

Question: I need to scrape some content and add it to my HTML file. var request = require('request'); var cheerio = require('cheerio'); setInterval(function () { request('https://2ch.hk/rf/res/1490589.html', function (error, response, html) { if (!error && response.statusCode == 200) { var $ = cheerio.load(html); $('.post-message').each(function (i, element) { var a = $(this).text(); console.log(a); }); } }); }, 5000); Now the parsed page is printed to my console, but I don't understand how to put in

Node.js Cheerio parser breaks UTF-8 encoding

Question: I parse my request with Cheerio like this: var url = 'http://shop.nag.ru/catalog/16939.IP-videonablyudenie-OMNY/16944.IP-kamery-OMNY-c-vario-obektivom/16704.OMNY-1000-PRO'; request.get(url, function (err, response, body) { console.log(body); $ = cheerio.load(body); console.log($(".description").html()); }); As output I see the content, but in a strange, unreadable encoding: //Plain body console.log(body) (p.s. Russian chars): <h1>Уличная 3Мп IP HD камера OMNY -

Cheerio: How to select element by text content?

Question: I have some HTML like this: Code: 12345 Category: faucets I want to fetch the category name ("faucets"). This is my attempt: var $ = cheerio.load(html.contents); var category = $('span[innerHTML="Category:"]').next().text(); But this doesn't work (the innerHTML modifier does not select anything). Any clue? Answer 1: Your code isn't working because [innerHTML] is an attribute selector, and innerHTML isn't an attribute on the

Writing a crawler with Node

Node crawler. Crawler overview. Crawling APIs: use axios, suited to API-type endpoints. Crawling pages: use request + cheerio, suited to server-side rendered sites that return an HTML page directly. cheerio is used much like jQuery (see its documentation). On the request transcoding problem: const cheerio = require('cheerio') const request = require('request-promise') // transcoding problem let iconv = require('iconv-lite') let url = 'http://top.baidu.com/category?c=10&fr=topindex' let options = { url, encoding: null // tell request not to convert the buffer to a string } request(options, async (err, response, body) => { // console.log(body.toString()) // converts to utf8 by default // get the encoding declared by the response let ContentType = response.headers['content-type'] let encoding; if (ContentType.
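The transcoding idea above (keep the body as a raw buffer, read the charset from the Content-Type header, then decode) can be sketched without any third-party package. This is not the author's code: it swaps iconv-lite for Node's built-in TextDecoder, which assumes a Node build with full ICU, and the function names charsetFromContentType and decodeBody are hypothetical. The sample bytes are the word "Привет" encoded as windows-1251.

```javascript
// Hypothetical helper: extract the charset from a Content-Type header value,
// falling back to utf-8 when none is declared.
function charsetFromContentType(contentType) {
  const match = /charset\s*=\s*"?([^";\s]+)/i.exec(contentType || '');
  return match ? match[1].toLowerCase() : 'utf-8';
}

// Hypothetical helper: decode a raw response buffer using the declared charset.
// TextDecoder throws a RangeError if the charset label is not supported.
function decodeBody(buffer, contentType) {
  const decoder = new TextDecoder(charsetFromContentType(contentType));
  return decoder.decode(buffer);
}

// Stand-in for a response body fetched with `encoding: null`.
const sample = Buffer.from([0xcf, 0xf0, 0xe8, 0xe2, 0xe5, 0xf2]);
const text = decodeBody(sample, 'text/html; charset=windows-1251');
console.log(text);
```

The decoded string can then be passed to cheerio.load as usual. Pages that declare their charset only in a meta tag would need an extra check that this sketch does not cover.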

Node.js + request + for loop : Runs twice

Question: I created a simple scraper using cheerio and the request client, but it doesn't work the way I want. First I see all the "null returned, do nothing" messages on the terminal and then the names, so I think it first checks all the URLs that return null, then the non-null ones. I want it to run in the right order, from 1 to 100. app.get('/back', function (req, res) { for (var y = 1; y < 100; y++) { (function () { var url = "example.com/person/" + y +; var options2 = { url: url, headers: { 'User-Agent'
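The out-of-order output described above is typical when a loop starts many asynchronous requests at once and each callback fires whenever its response arrives. A minimal sketch of one way to force the order, assuming a promise-based request: await each call before starting the next. The fetchPerson function below is a hypothetical stand-in that simulates network latency with a timer; it is not the asker's code and makes no real HTTP request.

```javascript
// Hypothetical stand-in for an HTTP request: resolves after a random delay and
// returns null for even ids to mimic the "null returned, do nothing" case.
function fetchPerson(id) {
  const delayMs = Math.floor(Math.random() * 20);
  return new Promise((resolve) => {
    setTimeout(() => resolve(id % 2 === 0 ? null : 'person-' + id), delayMs);
  });
}

// Process ids strictly one after another: the next request does not start
// until the previous one has resolved, so the log is always in id order.
async function scrapeInOrder(firstId, lastId) {
  const log = [];
  for (let id = firstId; id <= lastId; id++) {
    const name = await fetchPerson(id);
    log.push(name === null ? id + ': null returned, do nothing' : id + ': ' + name);
  }
  return log;
}

const finished = scrapeInOrder(1, 5).then((log) => {
  log.forEach((line) => console.log(line));
  return log;
});
```

Running the requests one at a time is slower than firing them in parallel. If only the output order matters, collecting the promises and awaiting Promise.all would keep the results in input order while still overlapping the requests.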

Uncaught Error: Cannot find module 'cheerio' Nodewebkit

Question: I am trying to develop a node-webkit application that uses the cheerio library. I have imported it using var cheerio = require("cheerio"); However, when I run the program, I get the following error: Uncaught Error: Cannot find module 'cheerio' module.js:329 My node_modules folder contains the cheerio folder, and I have included it in my package.json file as well. I even tried installing cheerio globally, and I still get this error. Source: https://stackoverflow.com/questions/31489279/uncaught

Unable to perform requests in node js

Question: I have some code like this: const request=require('request'); const url=require('url'); const cheerio=require('cheerio'); request({uri: 'https://www.facebook.com/'}, (err, resp, body)=>{ var $=cheerio.load(body); console.log($('title').text()); console.log($('body').text()) }); When I send a request to Facebook or Google, the response returned is something like: Facebook - ??? ??? ?????? | Facebook ??????? login or signup ??????? ?? or Google Signin??? ???.... How can I solve these question

Scraping JavaScript-generated website with Node.js

Question: When I parse a static HTML page, my Node.js app works well. However, when the URL is a JavaScript-generated page, the app doesn't work. How can I scrape a JavaScript-generated web page? My app.js: var express = require('express'), fs = require('fs'), request = require('request'), cheerio = require('cheerio'), app = express(); app.get('/scrape', function( req, res ) { url = 'http://www.apache.org/'; request( url, function( error, response, html ) { if( !error ) { var $ = cheerio.load(html); var

jquery selectors: an uncommon use case

Question: I have to parse an HTML page organized this way: <li id="list"> <a id="cities">Cities</a> <ul> <li> <a class="region" title="liguria">Liguria</a> <ul> <li> <a class="genova">Genova</a> </li> <li> <a class="savona">Savona</a> </li> </ul> </li> <li> <a class="region" title="lazio">Lazio</a> <ul> <li> <a class="roma">Roma</a> </li> </ul> </li> </ul> </li> I need to extract a list of all the cities. I don't care about regions... I am using cheerio from Node.js, but I added jquery to the tags

Node js console.log is not showing anything

Question: I'm trying to scrape a webpage using Node.js. I think I've written the code correctly and was able to run it without any errors, but the console doesn't print anything no matter what I do, and no errors are shown. What's the reason? Here is the content that I want to scrape: https://paste.ee/r/b3yrn var fs = require('fs'); var request = require('request'); var cheerio = require('cheerio'); var htmlString = fs.readFileSync('scrap.html').toString(); var $ = cheerio.load(htmlString); var