Scrape a website using Node.js and cheerio: deeply nested element tags

社会主义新天地 submitted on 2021-01-28 06:50:12

Question


I'm trying to scrape text from a website but can't seem to extract anything.

Below are the page structure and my code.

My code:

const rp = require("request-promise");
const $ = require("cheerio");
const url = "xx";

rp(url)
  .then(function(html) {
    //success!
    let token = "ce-bodytext";
    console.log($(token, response).length);
    console.log($(token, html)).text;
  })
  .catch(function(err) {
    console.log(JSON.stringify(err));
  });

I just need the text, but the tag has no id. I was also hoping ce-bodytext would extract all the values in order, but all I get is empty output:

{}
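A likely explanation for the bare `{}`, assuming `response` is not declared anywhere else in the script: the first `console.log` references `response` instead of `html`, which throws a `ReferenceError` inside the `.then()` callback, so the promise rejects and the `.catch()` handler runs. `JSON.stringify` only serializes enumerable own properties, and an Error's `message` and `stack` are not enumerable, so the error prints as `{}`. Even without that, the next line would also throw, because `.text` is read from the return value of `console.log`, which is `undefined`. The sketch below reproduces the effect without any network request; `stringifyError` and `describeError` are hypothetical helper names used only for illustration.

```javascript
// Sketch: why the catch handler prints "{}".
// Assumption: `response` is not declared anywhere in the asker's script.

// Error objects keep `message` and `stack` as non-enumerable properties,
// so JSON.stringify finds no enumerable keys and returns "{}".
function stringifyError(err) {
  return JSON.stringify(err);
}

// Hypothetical helper: read the useful fields explicitly instead.
function describeError(err) {
  return `${err.name}: ${err.message}`;
}

let caught;
try {
  // Same mistake as in the question: `response` was never declared.
  console.log(response.length);
} catch (err) {
  caught = err;
}

console.log(stringifyError(caught)); // prints {}
console.log(describeError(caught)); // prints ReferenceError: response is not defined
```

Logging the error object itself (or `err.stack`) in the `.catch()` handler shows the real cause instead of `{}`.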

How do I extract just the text shown in the image?


Answer 1:


Try this:

let token = ".ce-bodytext>p>strong>font>font";
console.log($(token, html).text());



Answer 2:


ce-bodytext is a class; you forgot to add the . before it:

const token = '.ce-bodytext';

It will at least fix the empty output.



Source: https://stackoverflow.com/questions/57053478/scrape-website-using-nodejs-cheerio-deep-nested-element-tags
