Puppeteer: How to get the contents of each element of a nodelist?

Happy的楠姐 2021-02-08 21:03

I'm trying to achieve something very trivial: Get a list of elements, and then do something with the innerText of each element.

const tweets = awai         


        
2 answers
  • 2021-02-08 21:51

    page.$$():

    You can use a combination of elementHandle.getProperty() and jsHandle.jsonValue() to read the innerText of each ElementHandle returned by page.$$():

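    // Get an ElementHandle for every element matching .tweet
    const tweets = await page.$$('.tweet');
    
    for (let i = 0; i < tweets.length; i++) {
      // getProperty() returns a JSHandle; jsonValue() resolves it to a plain string
      const tweet = await (await tweets[i].getProperty('innerText')).jsonValue();
      console.log(tweet);
    }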
    

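    If you are set on using the forEach() method, you can wrap the loop in a promise. Because the async callbacks run concurrently and can finish in any order, resolve only once all of them have completed (checking whether the last index has been reached is not enough), and handle an empty list:

    const tweets = await page.$$('.tweet');
    
    await new Promise((resolve, reject) => {
      // Count pending callbacks: they may finish out of order
      let remaining = tweets.length;
      if (remaining === 0) {
        resolve();
        return;
      }
      tweets.forEach(async (tweet) => {
        try {
          const text = await (await tweet.getProperty('innerText')).jsonValue();
          console.log(text);
          if (--remaining === 0) {
            resolve();
          }
        } catch (err) {
          reject(err);
        }
      });
    });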
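    A common alternative to wrapping forEach() in a promise is to map each handle to a promise and await them all with Promise.all(), which waits for every callback and returns the texts in element order even when they finish out of order. The sketch below runs in plain Node.js without a browser: makeFakeHandle is a hypothetical stand-in that mimics only the getProperty() and jsonValue() chain of a Puppeteer ElementHandle, with artificial delays so the slowest handle comes first.

```javascript
// Hypothetical stand-in for a Puppeteer ElementHandle: it mimics only the
// getProperty(name) -> jsonValue() chain, with an artificial delay.
function makeFakeHandle(innerText, delayMs) {
  return {
    async getProperty(name) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
      const value = name === 'innerText' ? innerText : undefined;
      return { jsonValue: async () => value };
    },
  };
}

// Read innerText from every handle concurrently; Promise.all waits for all of
// them and keeps the results in the same order as the input array.
async function readInnerTexts(handles) {
  return Promise.all(
    handles.map(async (handle) => {
      const property = await handle.getProperty('innerText');
      return property.jsonValue();
    })
  );
}

// The first handle is the slowest, so its callback finishes last.
const fakeTweets = [
  makeFakeHandle('first tweet', 30),
  makeFakeHandle('second tweet', 10),
  makeFakeHandle('third tweet', 0),
];

readInnerTexts(fakeTweets).then((texts) => {
  texts.forEach((text) => console.log(text));
});
```

    Since readInnerTexts only calls getProperty() and jsonValue(), it should also accept the real array from page.$$('.tweet'); an empty array simply resolves to an empty result.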
    

    page.evaluate():

    Alternatively, you can skip using page.$$() entirely, and use page.evaluate():

    const tweets = await page.evaluate(() => Array.from(document.getElementsByClassName('tweet'), e => e.innerText));
    
    tweets.forEach(tweet => {
      console.log(tweet);
    });
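    The page.evaluate() version relies on the two-argument form of Array.from(), which turns an array-like collection (here the HTMLCollection returned by getElementsByClassName) into a real array and maps each item in one step. Because the page function returns an array of strings, the result serializes cleanly back to Node.js. Outside a browser the same behavior can be seen with a plain array-like object; fakeCollection below is a hypothetical stand-in, not a real DOM collection.

```javascript
// Hypothetical stand-in for an HTMLCollection: indexed entries plus a length,
// but no Array methods such as map() or forEach().
const fakeCollection = {
  0: { innerText: 'first tweet' },
  1: { innerText: 'second tweet' },
  length: 2,
};

// Array.from(arrayLike, mapFn) builds a real array and applies mapFn to each item.
const texts = Array.from(fakeCollection, (e) => e.innerText);

console.log(Array.isArray(texts)); // true
console.log(texts); // [ 'first tweet', 'second tweet' ]
console.log(typeof fakeCollection.map); // undefined
```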
    
  • 2021-02-08 21:57

    According to the Puppeteer docs, page.$$() does not return a NodeList; it returns a Promise that resolves to an array of ElementHandle objects, which is quite different from a NodeList.

    There are several ways to solve the problem.

    1. Use the built-in page.$$eval() function

    This method runs Array.from(document.querySelectorAll(selector)) within the page and passes it as the first argument to pageFunction.

    So getting the innerText of each element looks like this:

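    // Find all .tweet elements and return the innerText of each one, as an array.
    // pageFunction receives the whole array, so it has to map over it.
    const tweets = await page.$$eval('.tweet', elements => elements.map(element => element.innerText));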
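    The key detail of page.$$eval() is that pageFunction is called once with the whole array of matched elements, not once per element, so the function has to map over that array itself. The sketch below imitates that contract in plain Node.js; simulateEvalAll is a hypothetical helper (not a Puppeteer API) that only calls the page function with an array, and the element objects are fakes.

```javascript
// Hypothetical helper that mimics how page.$$eval() invokes its page function:
// one call, with the full array of matched elements as the first argument.
function simulateEvalAll(elements, pageFunction) {
  return pageFunction(Array.from(elements));
}

// Fake elements standing in for the nodes matched by '.tweet'.
const matched = [{ innerText: 'first tweet' }, { innerText: 'second tweet' }];

// Treating the argument as a single element reads innerText off the array itself.
const perElementResult = simulateEvalAll(matched, (element) => element.innerText);

// Mapping over the array returns one innerText per matched element.
const mappedResult = simulateEvalAll(matched, (elements) => elements.map((el) => el.innerText));

console.log(perElementResult); // undefined
console.log(mappedResult); // [ 'first tweet', 'second tweet' ]
```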
    

    2. Pass each ElementHandle to page.evaluate()

    What you get from await page.$$('.tweet') is an array of ElementHandle objects. If you log one with console.log, it shows up as a JSHandle or ElementHandle depending on its type.

    Rather than explain it in the abstract, it's easier to demonstrate:

    // let's just call them tweetHandles
    const tweetHandles = await page.$$('.tweet');
    
    // loop through all handles
    for (const tweetHandle of tweetHandles) {
    
      // pass the single handle to page.evaluate() as an argument
      const singleTweet = await page.evaluate(el => el.innerText, tweetHandle);
    
      // do whatever you want with the data
      console.log(singleTweet);
    }
    

    Of course, there are other ways to solve this problem; Grant Miller covered a few of them in the other answer.
