Mechanize and JavaScript

耶瑟儿~ 2020-11-28 08:53

I want to use Mechanize to simulate browsing to a web page with active JavaScript, including DOM Events and AJAX, and so far I've found no way to do that.

I looked

5 Answers
  • 2020-11-28 09:09

    From http://wwwsearch.sourceforge.net/mechanize/faq.html#general

    If you come across this [JavaScript] in a page you want to automate, you have four options. Here they are, roughly in order of simplicity.

    1. Figure out what the JavaScript is doing and emulate it in your Python code: for example, by manually adding cookies to your CookieJar instance, calling methods on HTMLForms, calling urlopen, etc. See above re forms.

    2. Use Java’s HtmlUnit or HttpUnit from Jython, since they know some JavaScript.

    3. Instead of using mechanize, automate a browser instead. For example use MS Internet Explorer via its COM automation interfaces, using the Python for Windows extensions, aka pywin32, aka win32all (e.g. simple function, pamie; pywin32 chapter from the O’Reilly book) or ctypes (example). This kind of thing may also come in useful on Windows for cases where the automation API is lacking. For Firefox, there is PyXPCOM.

    4. Get ambitious and automatically delegate the work to an appropriate interpreter (Mozilla’s JavaScript interpreter, for instance). This is what HtmlUnit and httpunit do. I did a spike along these lines some years ago, but I think it would (still) be quite a lot of work to do well.
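
    The first option (working out what the page's JavaScript does and reproducing it by hand) might look roughly like the sketch below. It uses the standard library's http.cookiejar and urllib.request rather than mechanize itself, and it assumes the script on the page only sets a single cookie. The cookie name js_enabled, its value, and the example.com URL are made up for illustration, and whether mechanize's own CookieJar accepts the cookie in exactly the same way is an assumption to check against its documentation.

```python
import http.cookiejar
import urllib.request


def make_cookie(name, value, domain, path="/"):
    """Build a session cookie like one a page's JavaScript might set."""
    return http.cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path=path, path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


# Hypothetical cookie: pretend the page's script sets js_enabled=1.
jar = http.cookiejar.CookieJar()
jar.set_cookie(make_cookie("js_enabled", "1", "example.com"))

# Ask the jar which Cookie header it would attach to a request.
# No network traffic happens here; the request is only constructed.
request = urllib.request.Request("http://example.com/data")
jar.add_cookie_header(request)
cookie_header = request.get_header("Cookie")
print(cookie_header)
```

    To actually send the request, the jar would be attached to an opener with urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar)) and the URL opened through that opener.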

  • 2020-11-28 09:11

    You can use Selenium with Python. You can then scrape JavaScript-generated content as well as manipulate the page with additional JavaScript (as well as Python).

    # In your virtualenv: pip install selenium
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    # Launch Firefox GUI
    browser = webdriver.Firefox()

    # Alternatively, run Firefox without a GUI
    # (Selenium 4 dropped the PhantomJS driver, so headless Firefox is used here)
    # from selenium.webdriver.firefox.options import Options
    # options = Options()
    # options.add_argument("-headless")
    # browser = webdriver.Firefox(options=options)

    # Fetch a webpage
    browser.get('http://example.com')

    # If you need the whole HTML document
    # just like inspecting the rendered page with the console
    html = browser.page_source

    # Get an element, even if it was created with JS
    # (Selenium 4 style; older releases also offered find_element_by_css_selector)
    button = browser.find_element(
        By.CSS_SELECTOR, 'div.some-class > input.the-submit-button')

    # Click on something
    button.click()

    # Execute some JavaScript (assumes jQuery is loaded on the page)
    browser.execute_script("$('html, body').animate({ scrollTop: 500 }, 50);")
    

    You can run the code in a Python REPL and use autocomplete to discover the methods available on browser or whatever element you have selected. Or do something like print(dir(browser)) to see what is available.
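
    If the raw output of dir() is too noisy, a small helper can narrow it down to the public methods. This is only a sketch: FakeBrowser is a hypothetical stand-in so the snippet runs without Selenium or a browser, and public_methods is a name made up here, not part of any library.

```python
def public_methods(obj):
    """Return the sorted names of obj's callable attributes without a leading underscore."""
    return sorted(
        name for name in dir(obj)
        if not name.startswith("_") and callable(getattr(obj, name))
    )


class FakeBrowser:
    """Hypothetical stand-in for a WebDriver object."""

    title = "Example Domain"  # plain attribute, not a method

    def get(self, url):
        pass

    def find_element(self, by, value):
        pass

    def _private_helper(self):
        pass


methods = public_methods(FakeBrowser())
print(methods)
```

    Note that getattr evaluates properties, so on a live browser object this helper may end up sending commands to the browser while it inspects each attribute.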

  • 2020-11-28 09:15

    Basically, if you want something that deals with JavaScript, you need a real JavaScript engine, and that invariably means automating a real browser (I'm including headless ones in this).

    Java’s HtmlUnit doesn't do a very good job, as it doesn't use a JavaScript engine from an actual browser. PhantomJS sounds ideal (as newz2000 points out); however, I find that when manipulating pages with JavaScript it can be very difficult to debug your script if you can't actually see the page you're dealing with.

    This leads to solutions such as Selenium WebDriver, which has a full Python API to automate various browsers. However, it launches an actual browser (and the standalone Selenium server, if you use it, is a Java jar), so it is not the pure Python solution you're after, but I think this is as close as you can get.

  • 2020-11-28 09:19

    An example of how to use PyV8 to run JS on a DOM with Python can be found here:

    https://github.com/buffer/thug

    It should be fairly easy to make this run together with mechanize.

  • 2020-11-28 09:30

    I've played with a new alternative to Mechanize (which I love) called PhantomJS.

    It is a full WebKit browser like Safari or Chrome, but it is headless and scriptable. You script it with JavaScript, not Python (as far as I know, at least).

    There are some example scripts to get you started. It's a lot like using Firebug. I've only spent a few minutes using it, but I found I was quite productive right from the start.
