Question
I can't get all the product data because the website lazy-loads its product catalog page, meaning the page has to be scrolled until everything has loaded.
I am getting only the first page of product data.
Answer 1:
First, keep in mind that infinite scroll can be implemented in countless ways. Sometimes you have to click buttons along the way or trigger some other kind of transition. I will cover only the simplest use case here: scrolling down at a fixed interval and finishing when no new products are loaded.
If you build your own actor using the Apify SDK, you can use the infiniteScroll helper utility function. If it doesn't cover your use case, please give us feedback on GitHub.
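As a rough sketch of what calling that helper can look like inside a page handler: the function name handleCategoryPage, the '.my-class' selector, and the 'CATEGORY' label are placeholders of mine, and the helper object is passed in as a parameter (in an actor that would be Apify.utils.puppeteer) so the snippet does not depend on a particular SDK version. The timeoutSecs and waitForSecs option names are the ones I remember from the SDK docs, so check them against the version you use.

```javascript
// Hypothetical page handler for a Puppeteer-based crawler.
// `puppeteerUtils` is assumed to be `Apify.utils.puppeteer`, which exposes
// infiniteScroll(page, options) in the Apify SDK.
async function handleCategoryPage({ page, request }, puppeteerUtils) {
    // Only scroll on category pages; other page types are skipped here.
    if (request.userData.label !== 'CATEGORY') return null;

    // Scroll until the helper decides the page is done or 60 seconds pass.
    // Option names are assumptions based on the SDK docs; verify for your version.
    await puppeteerUtils.infiniteScroll(page, { timeoutSecs: 60, waitForSecs: 5 });

    // Count the items that are now present in the DOM (update the selector).
    return page.$$eval('.my-class', (items) => items.length);
}
```

In a real actor you would call it from handlePageFunction as handleCategoryPage(context, Apify.utils.puppeteer) and then extract the product data instead of just counting items.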
If you are using the generic Scrapers (Web Scraper or Puppeteer Scraper), infinite scroll is not currently built in (though that may have changed by the time you read this). On the other hand, it is not that complicated to implement yourself. Here is a simple solution for Web Scraper's pageFunction:
async function pageFunction(context) {
    // A few utilities
    const { request, log, jQuery } = context;
    const $ = jQuery;

    // Here we define the infinite scroll function; it has to be defined inside pageFunction
    const infiniteScroll = async (maxTime) => {
        const startedAt = Date.now();
        let itemCount = $('.my-class').length; // Update the selector
        while (true) {
            log.info(`INFINITE SCROLL --- ${itemCount} items loaded --- ${request.url}`);
            // Timeout to prevent an infinite loop
            if (Date.now() - startedAt > maxTime) {
                return;
            }
            scrollBy(0, 9999);
            await context.waitFor(5000); // This can be any number that works for your website
            const currentItemCount = $('.my-class').length; // Update the selector

            // We check if the number of items changed after the scroll; if not, we finish
            if (itemCount === currentItemCount) {
                return;
            }
            itemCount = currentItemCount;
        }
    };

    // Generally, you want to do the scrolling only on the category type page
    if (request.userData.label === 'CATEGORY') {
        await infiniteScroll(60000); // Let's try 60 seconds max
        // ... Add your logic for categories
    } else {
        // Any logic for other types of pages
    }
}
Of course, this is a really trivial example. Sometimes it can get much more complicated. I once even used Puppeteer to move the mouse directly and drag a scroll bar that was accessible programmatically.
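One small step up in complexity, as a sketch rather than a definitive implementation: the loop above stops the first time a single wait produces no new items, which can cut the scroll short on a slow site. The variant below only stops after the count has stayed the same for several checks in a row. The name scrollUntilStable and all of its parameters are my own; the page interaction is passed in as callbacks so the stop logic can be tried outside a browser.

```javascript
// Scrolls until the item count is unchanged for `maxIdleRounds` consecutive
// checks, or until `maxTimeMs` has elapsed. Returns the last observed count.
// All names here are illustrative; none of this is part of the Apify API.
async function scrollUntilStable({
    countItems,        // () => number of items currently in the DOM
    scroll,            // () => triggers one scroll step
    wait,              // () => waits for new items to load
    maxTimeMs,         // overall time budget in milliseconds
    maxIdleRounds = 3, // unchanged checks tolerated before stopping
    now = Date.now,    // injectable clock, mainly useful for testing
}) {
    const startedAt = now();
    let itemCount = await countItems();
    let idleRounds = 0;

    while (now() - startedAt <= maxTimeMs) {
        await scroll();
        await wait();
        const currentCount = await countItems();

        if (currentCount === itemCount) {
            // Nothing new appeared; give up only after several such rounds.
            idleRounds += 1;
            if (idleRounds >= maxIdleRounds) break;
        } else {
            // New items appeared, so reset the idle counter and keep going.
            idleRounds = 0;
            itemCount = currentCount;
        }
    }
    return itemCount;
}
```

Inside a Web Scraper pageFunction (where it would have to be defined, like infiniteScroll above) the callbacks could be countItems: () => $('.my-class').length, scroll: () => scrollBy(0, 9999), and wait: () => context.waitFor(2000). A shorter wait combined with a few idle rounds tends to finish faster on quick pages while still tolerating an occasional slow load.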
Source: https://stackoverflow.com/questions/57291169/how-to-make-the-apify-crawler-to-scroll-full-page-when-web-page-have-infinite-sc