Command line URL fetch with JavaScript capabliity

邮差的信 提交于 2019-12-05 23:54:47

Get FireBug and see the URL for that Ajax request. You may then use curl with that URL.

Byron Whitlock

There are 2 ways to handle this. Write your screen scraper using a full browser based client like Webkit, or go to the actual page and find out what the AJAX requesting is doing and do request that directly. You then need to parse the results of course. Use firebug to help you out.

Check out this post for more info on the subject. The upvoted answer suggests using a test tool to drive a real browser. What's a good tool to screen-scrape with Javascript support?

I think env.js can handle <script> elements. It runs in the Rhino JavaScript interpreter and has it's own XMLHttpRequest object, so you should be able to at least run the scripts manually (select all the <script> tags, get the .js file, and call eval) if it doesn't automatically run them. Be careful about running scripts you don't trust though, since they can use any Java classes.

I haven't played with it since John Resig's first version, so I don't know much about how to use it, but there's a discussion group on Google Groups.

Maybe you could try and use features of HtmlUnit in your own utility?

HtmlUnit is a "GUI-Less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc... just like you do in your "normal" browser.

It has fairly good JavaScript support (which is constantly improving) and is able to work even with quite complex AJAX libraries, simulating either Firefox or Internet Explorer depending on the configuration you want to use.

It is typically used for testing purposes or to retrieve information from web sites.

Use LiveHttpHeaders a plug in for Firefox to see all URL details and then use the cURL with that url. LiveHttpHeaders shows all information like type of method(post or get) and headers body etc. it also show post or get parameters in headers i think this may help you.

you can use PhantomJS http://phantomjs.org

You can use it as below :

var page=require("webpage");
page.open("http://monster.com",function(status){
  page.evaluate(function(){
    /* your javascript code here 
        $.ajax("....",function(result){


            phantom.exit(0);
           }); */
  });
});
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!