Question
I am crawling one website's data. I am able to get the whole content of a page, but some data on the page only appears as tooltips when hovering over certain icons. I need that data as well. Is this possible with any crawler?
I am using PHP and simplehtmldom for parsing/crawling the page.
Answer 1:
Hover data can't be obtained by crawlers.
A crawler fetches the web page and gets the whole HTML page source, which is the view we see as soon as we hit the URL. Hovering requires moving the mouse over an HTML element on the page, i.e. a manual action. To my knowledge, no crawler currently performs hover actions and collects the resulting data, so it is not possible to get hover data with a crawler.
Answer 2:
One possibility is to execute the JavaScript using a JavaScript interpreter (I took a quick look at http://php.net/manual/en/book.v8js.php and it may be what you need) and then write some additional JavaScript code to fire the hover events on the necessary elements.
If the page is using AJAX to fill in the necessary fields, it may be easier to use a tool like Firebug to view the AJAX calls and recreate these in your code to fill in the missing DOM elements.
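Recreating the AJAX call could look roughly like the sketch below. It is written in Python for brevity and is only an illustration: the endpoint URL, the `id` query parameter, and the `tooltip` field in the JSON response are all assumptions, and the real values have to be read from the browser's network panel for the site in question.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint: replace with the URL seen in the network panel.
TOOLTIP_ENDPOINT = "https://example.com/ajax/tooltip"


def build_tooltip_request(item_id, endpoint=TOOLTIP_ENDPOINT):
    """Build a GET request shaped like the one the page sends on hover."""
    # The "id" parameter name is an assumption for illustration.
    query = urllib.parse.urlencode({"id": item_id})
    request = urllib.request.Request(f"{endpoint}?{query}")
    # Some sites use this header to tell AJAX calls from normal page loads.
    request.add_header("X-Requested-With", "XMLHttpRequest")
    return request


def parse_tooltip(body):
    """Pull the tooltip text out of a JSON body (assumed response shape)."""
    data = json.loads(body)
    return data.get("tooltip", "")


def fetch_tooltip(item_id):
    """Send the recreated request and return the tooltip text."""
    request = build_tooltip_request(item_id)
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_tooltip(response.read().decode("utf-8"))
```

The returned text can then be stored next to the data parsed from the static HTML. If the endpoint returns an HTML fragment instead of JSON, only `parse_tooltip` would need to change.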
Another alternative is to rethink the crawler and perhaps use a browser-based crawler. This is something I don't have much experience with but I'm sure others have done this.
Answer 3:
I suggest looking into Selenium. I've used it many times, and it can definitely trigger onmouseover events.
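As a rough illustration of the hover step, here is a minimal sketch using Selenium's Python bindings (Selenium also has bindings for other languages). The function name `hover_and_read` and both CSS selectors are hypothetical, and the sketch assumes the tooltip is rendered as a separate element that becomes visible after the hover.

```python
def hover_and_read(driver, icon_selector, tooltip_selector, timeout=5):
    """Hover over an icon and return the text of the tooltip that appears."""
    # Imported here so this sketch can be loaded without Selenium installed.
    from selenium.webdriver.common.action_chains import ActionChains
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    icon = driver.find_element(By.CSS_SELECTOR, icon_selector)
    # Move the virtual mouse pointer onto the icon, which fires the
    # page's mouseover/hover handlers.
    ActionChains(driver).move_to_element(icon).perform()
    # Wait until the tooltip is visible; it may be filled in by an AJAX call.
    tooltip = WebDriverWait(driver, timeout).until(
        EC.visibility_of_element_located((By.CSS_SELECTOR, tooltip_selector))
    )
    return tooltip.text
```

A caller would create a driver (for example `webdriver.Chrome()` from the `selenium` package), load the page with `driver.get(url)`, call `hover_and_read(driver, ".info-icon", ".tooltip")` with selectors that match the real page, and finish with `driver.quit()`.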
Answer 4:
You cannot obtain dynamic source code that requires user interaction with plain PHP cURL. However, with PhantomJS you can achieve the hover state and also capture content that the page loads later via AJAX. It has a learning curve, and you need to install it with node.js on your server, so check whether you have the rights to do that.
With PhantomJS you will be able to get onmouseover or dynamic AJAX content, since it is a headless WebKit browser that visits pages according to your commands.
Source: https://stackoverflow.com/questions/9942376/how-to-get-hover-dataajax-by-any-crawler-php