Question
I am working on a web crawler that fetches data from a website using crawler4j. Everything works well, but the main problem is AJAX-based events. I found that the crawljax library handles this, but I couldn't figure out where and when to use it.
When should I use it (I mean, in what sequence)?
- Before fetching the page using crawler4j,
or
- after fetching the page using crawler4j,
or
- should I take the URL obtained by crawler4j and use it to fetch the AJAX data (page) using crawljax?
Answer 1:
crawljax is essentially a standalone crawler built for its own purpose. Integrating it into crawler4j requires a lot of manual effort on your side.
I recommend using Selenium, CasperJS, and/or PhantomJS in front of crawler4j, i.e. you could run the JavaScript engine as a proxy in front of crawler4j. However, this will slow down your web crawler.
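To make the ordering concrete, here is a minimal sketch of the "render first, then crawl" sequence. It deliberately uses only the Java standard library: the `Renderer` interface is a hypothetical stand-in for whatever JavaScript engine you put in front (Selenium, PhantomJS, ...), and `extractLinks` is a simplified stand-in for the link extraction crawler4j does on the HTML it receives. None of these names come from crawler4j or crawljax; they are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RenderThenCrawl {

    /** Hypothetical stand-in for a headless browser that executes the page's JavaScript. */
    interface Renderer {
        String render(String url, String rawHtml);
    }

    // Naive href matcher, good enough for a sketch (a real crawler uses an HTML parser).
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

    /** Simplified stand-in for the crawler's link extraction step. */
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    /** The rendering step runs BEFORE the crawler sees the HTML. */
    static List<String> process(String url, String rawHtml, Renderer renderer) {
        String rendered = renderer.render(url, rawHtml);
        return extractLinks(rendered);
    }

    public static void main(String[] args) {
        String raw = "<a href=\"/static\">static</a><div id=\"ajax\"></div>";

        // Fake renderer: simulates an AJAX call filling in the empty div.
        Renderer fake = (url, html) -> html.replace(
                "<div id=\"ajax\"></div>",
                "<div id=\"ajax\"><a href=\"/loaded-by-ajax\">more</a></div>");

        System.out.println(extractLinks(raw));                         // [/static]
        System.out.println(process("http://example.com/", raw, fake)); // [/static, /loaded-by-ajax]
    }
}
```

Without the rendering step the crawler only sees `/static`; with it, the link injected by the simulated AJAX call becomes visible too. In a real setup the fake renderer would be replaced by the proxy or browser driver, which is also where the slowdown mentioned above comes from.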
Source: https://stackoverflow.com/questions/55010898/how-to-add-integrate-crawljax-with-crawler4j