cheerio

Select elements with an attribute with cheerio

你离开我真会死。 submitted on 2019-12-05 07:28:02
What is the most efficient way to select all DOM elements that have a certain attribute? <input name="mode"> With plain JavaScript I would use: document.querySelectorAll("[name='mode']") or document.querySelectorAll("[name]") if I don't care about the attribute value. OK, I found it in the cheerio documentation; here is how you do it: $('[name=mode]') (cheerio docs: Selectors) For some reason, the accepted answer didn't work for me (using cheerio ^1.0.0-rc.2 here). But for the following markup: <input value="123" name="data[text_amount]"> this did work: $('input[name="data[text_amount]"]'); The

Incremental and non-incremental urls in node js with cheerio and request

让人想犯罪 __ submitted on 2019-12-05 02:38:30
Question: I am trying to scrape data from a page using cheerio and request in the following way: 1) go to url 1a (http://example.com/0) 2) extract url 1b (http://example2.com/52) 3) go to url 1b 4) extract some data and save 5) go to url 1a+1 (http://example.com/1, let's call it 2a) 6) extract url 2b (http://example2.com/693) 7) go to url 2b 8) extract some data and save etc... I am struggling to work out how to do this (note, I am only familiar with Node.js and cheerio/request for this task even though

Executing scraped JavaScript with cheerio

两盒软妹~` submitted on 2019-12-04 03:31:22
Question: I have a web page in which there are some JS APIs that don't alter the DOM, but return some numbers. I'd like to write a Node.js application that downloads such pages and executes those functions in the context of the downloaded page. I was looking at cheerio for page scraping, but while I see how easy it is to navigate and manipulate the DOM with it, I don't see any way to run the page's functions. Is it possible to do it? Should I look, instead, at jsdom? Thanks. Answer 1: Sounds like you
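One possible way to run a page's functions without a DOM is Node's built-in vm module. The sketch below assumes the script text has already been pulled out of the downloaded page as a string; pageScript, computeTotal and runPageFunction are hypothetical names used only for illustration, and the approach only suits page code that does not touch the DOM. Note that the Node.js documentation states that vm is not a security mechanism, so it should not be used on pages you do not trust.

```javascript
const vm = require('vm');

// Hypothetical script text, standing in for the contents of a <script> tag
// taken from the downloaded page.
const pageScript = `
  function computeTotal(a, b) { return a * 2 + b; }
`;

// Run the page code in a fresh context so its globals stay off our own
// global object, then call one of the functions it declared.
function runPageFunction(scriptSource, functionName, args) {
  const context = vm.createContext({});
  vm.runInContext(scriptSource, context, { timeout: 1000 });
  const fn = context[functionName];
  if (typeof fn !== 'function') {
    throw new Error(`${functionName} is not defined by the page script`);
  }
  return fn(...args);
}

console.log(runPageFunction(pageScript, 'computeTotal', [3, 4])); // 10
```

If the page functions do read from the document or window objects, a DOM implementation such as jsdom, which the question already mentions, is the more likely fit.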

Async/Await with Request-Promise returns Undefined

你离开我真会死。 submitted on 2019-12-04 02:35:21
I have two files, server.js and scrape.js; below are the code snippets as they currently stand. server.js: const scrape = require("./scrape"); async function start() { const response = await scrape.start(); console.log(response); } start(); and scrape.js: const cheerio = require("cheerio"); const request = require("request-promise"); go = async () => { const options = { uri: "http://www.somewebsite.com/something", transform: function(body) { return cheerio.load(body); } }; request(options) .then($ => { let scrapeTitleArray = []; $(".some-class-in-html").each(function(i, obj) { const data = $
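As far as the truncated excerpt shows, a likely cause of the undefined result is that the async function starts the request(options).then(...) chain but never returns or awaits it. The sketch below illustrates that difference with a hypothetical fakeRequest function standing in for request-promise; the names startWithoutReturn and startWithReturn are also made up for the illustration.

```javascript
// Hypothetical stand-in for request-promise: resolves with a list of titles
// after a short delay instead of fetching and parsing a real page.
function fakeRequest() {
  return new Promise(resolve => setTimeout(() => resolve(['first', 'second']), 10));
}

// Mirrors the excerpt: the promise chain is started but never returned,
// so the async function resolves to undefined before the data arrives.
const startWithoutReturn = async () => {
  fakeRequest().then(titles => titles.map(t => t.toUpperCase()));
};

// Awaiting the request and returning the result hands the data to the caller.
const startWithReturn = async () => {
  const titles = await fakeRequest();
  return titles.map(t => t.toUpperCase());
};

async function demo() {
  console.log(await startWithoutReturn()); // undefined
  console.log(await startWithReturn());    // [ 'FIRST', 'SECOND' ]
}

demo();
```

The same idea applies to the original scrape.js: returning the promise from the exported function, or awaiting it and returning the collected array, would give the caller in server.js something to await.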

Node.js + request + for loop : Runs twice

点点圈 submitted on 2019-12-03 22:00:17
I created a simple scraper using cheerio and the request client, but it doesn't work the way I want. First I see all the "null returned, do nothing" messages on the terminal and then the names, so I think it first checks all the URLs that return null, then the non-null ones. I want it to run in the right order, from 1 to 100. app.get('/back', function (req, res) { for (var y = 1; y < 100; y++) { (function () { var url = "example.com/person/" + y; var options2 = { url: url, headers: { 'User-Agent': req.headers['user-agent'], 'Content-Type': 'application/json; charset=utf-8' } }; request(options2,
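The ordering described above is what you would expect when a plain for loop fires every request at once: the callbacks run in whatever order the responses come back. A minimal sketch of one way to keep loop order is to await each call before starting the next. Here fetchPerson is a hypothetical stand-in for the HTTP request, with artificial delays chosen so that later ids finish first.

```javascript
// Hypothetical stand-in for the HTTP call: for ids below 5, later ids resolve
// sooner, which imitates responses arriving out of order.
function fetchPerson(id) {
  const delay = (5 - id) * 5;
  return new Promise(resolve => setTimeout(() => resolve(`person ${id}`), delay));
}

// Fire-and-forget, like calling request() inside a plain for loop:
// results are recorded in completion order, not in loop order.
function fetchAllAtOnce(ids) {
  const seen = [];
  const pending = ids.map(id => fetchPerson(id).then(name => seen.push(name)));
  return Promise.all(pending).then(() => seen);
}

// Awaiting each call before starting the next keeps the loop order.
async function fetchInOrder(ids) {
  const seen = [];
  for (const id of ids) {
    seen.push(await fetchPerson(id));
  }
  return seen;
}

fetchInOrder([1, 2, 3]).then(names => console.log(names));
```

The trade-off is speed: the sequential version waits for each response in turn, whereas the all-at-once version overlaps the requests but gives up ordering unless the results are sorted afterwards.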

Incremental and non-incremental urls in node js with cheerio and request

大城市里の小女人 submitted on 2019-12-03 17:09:29
I am trying to scrape data from a page using cheerio and request in the following way: 1) go to url 1a (http://example.com/0) 2) extract url 1b (http://example2.com/52) 3) go to url 1b 4) extract some data and save 5) go to url 1a+1 (http://example.com/1, let's call it 2a) 6) extract url 2b (http://example2.com/693) 7) go to url 2b 8) extract some data and save etc... I am struggling to work out how to do this (note, I am only familiar with Node.js and cheerio/request for this task even though it is likely not elegant, so am not looking for alternative libraries or languages to do this in
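The numbered steps above can be sketched as a loop over the incremental URLs in which each pass first resolves the non-incremental URL and then follows it. The functions extractSecondUrl and extractData, and the linkedIds table, are hypothetical stand-ins for "request the page, then pick values out with cheerio"; they return canned data so the control flow can be seen on its own.

```javascript
// Hypothetical lookup standing in for what would be scraped from each first page.
const linkedIds = { 0: 52, 1: 693 };

// Steps 1-2: visit http://example.com/<n> and extract the second URL from it.
async function extractSecondUrl(n) {
  return `http://example2.com/${linkedIds[n]}`;
}

// Steps 3-4: visit the second URL and extract the data to save.
async function extractData(url) {
  return { source: url };
}

// Walk the incremental URLs one at a time; each pass follows the link it found.
async function crawl(firstIndex, lastIndex) {
  const saved = [];
  for (let n = firstIndex; n <= lastIndex; n++) {
    const secondUrl = await extractSecondUrl(n);
    saved.push(await extractData(secondUrl));
  }
  return saved;
}

crawl(0, 1).then(saved => console.log(saved));
```

In a real scraper the two stand-ins would each make an HTTP request and parse the response; the loop itself would not need to change.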

Cheerio: How to select element by text content?

故事扮演 submitted on 2019-12-03 11:08:05
I have some HTML like this: <span id="cod">Code:</span> <span>12345</span> <span>Category:</span> <span>faucets</span> I want to fetch the category name ("faucets"). This is my attempt: var $ = cheerio.load(html.contents); var category = $('span[innerHTML="Category:"]').next().text(); But this doesn't work (the innerHTML modifier does not select anything). Any clue? The reason your code isn't working is that [innerHTML] is an attribute selector, and innerHTML isn't an attribute on the element (which means that nothing is selected). You could filter the span elements based on their text. In

Scraping with Meteor.js

﹥>﹥吖頭↗ submitted on 2019-12-03 04:32:38
Question: Can I scrape with meteor.js? I just discovered cheerio, which works excellently combined with request. Can I use these with Meteor, or is there something similar? Do you have a working example? Answer 1: Of course! It's hard to imagine what Meteor can't do! First you need something to handle the remote HTTP requests. In your Meteor directory in the terminal run meteor add http to add the Meteor.http package, also npm install cheerio (have a look at another SO question on how to install npm modules to

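Nodejs -- Building a data crawler with koa2

Anonymous (unverified) submitted on 2019-12-02 23:26:52
cheerio: parses the response body; its parsing API is almost identical to jQuery's (see: cheerio Chinese documentation; development reference: "node - cheerio module"). superagent: issues GET/POST/DELETE and other requests. superagent-charset: fixes garbled Chinese characters in crawled data; early versions were used on their own, and it is now used together with superagent. koa2: sets up the server environment, and so on. koa-router: routing for koa, used to dispatch each route to its code block and hold the handler logic (think of it as an everyday API endpoint). knex: database access; it supports several databases, and MySQL is used here, which requires the mysql driver (development reference: knex notes). In the project root, run npm init and press Enter through the prompts to initialize the project; a package.json file appears. Then run the following command to install the project dependencies: npm i --save cheerio superagent superagent-charset koa-router koa knex mysql Create an app.js file in the project root and write the code: const Koa = require('koa'), Router = require('koa-router'), cheerio = require('cheerio'), charset = require('superagent-charset')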

Can I load a local html file with the cheerio package in node.js?

痴心易碎 submitted on 2019-12-02 21:24:05
I have a few HTML files on my hard drive that I'd like to use jQuery on to extract data from. Is this possible to do using cheerio? I've tried giving cheerio the local path but it doesn't work. One idea I had would be to create a web server in Node, read from the HTML file, and then pipe it to cheerio through the server - would this work? damphat: The input is an HTML string, so you need to read the HTML content yourself: var fs = require('fs'); cheerio.load(fs.readFileSync('path/to/file.html')); An HTML file can be read asynchronously with the readFile function from the fs module. When the reading of