Browser simulation - Python

悲哀的现实 2021-02-20 11:41

I need to access a few HTML pages from a Python script. The problem is that I need cookie support, so a simple urllib HTTP request won't work.

Any ideas?

4 answers
  • 2021-02-20 12:07

    Why don't you try Dryscrape for this:

    import dryscrape as d

    # start a virtual X server so the browser can run without a display
    d.start_xvfb()
    br = d.Session()
    # open the web page
    br.visit('http://URL.COM')
    # find each input by its id and fill it in
    br.at_xpath('//*[@id = "email"]').set('user@email.com')
    br.at_xpath('//*[@id = "pass"]').set('password')
    # use the id of the submit button and click it
    br.at_xpath('//*[@id = "submit_button"]').click()
    

    You don't need cookielib to store cookies: the Dryscrape session keeps them for you. Just install Dryscrape and script the browser the way you like.

  • 2021-02-20 12:14

    Check out mechanize: "Stateful programmatic web browsing in Python".
    It handles cookies automagically.

    import mechanize
    
    br = mechanize.Browser()
    resp = br.open("http://www.mysitewithcookies.com/")
    print(resp.info())  # headers
    print(resp.read())  # content
    

    mechanize also exposes the urllib2 API, with cookie handling enabled by default.

  • 2021-02-20 12:15

    Here's something that does cookies, and as a bonus does authentication for a site that requires a username and password.

    # Python 2 module names; on Python 3 these are urllib.request and http.cookiejar
    import urllib2
    import cookielib


    def cook():
        url = "http://wherever"
        cj = cookielib.LWPCookieJar()
        authinfo = urllib2.HTTPBasicAuthHandler()
        realm = "realmName"
        username = "userName"
        password = "passWord"
        host = "www.wherever.com"
        authinfo.add_password(realm, host, username, password)
        # The opener handles both cookies and HTTP basic authentication
        opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj), authinfo)
        urllib2.install_opener(opener)

        # Create the request object
        txheaders = {'User-agent': "Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)"}
        try:
            req = urllib2.Request(url, None, txheaders)
            # No explicit cj.add_cookie_header(req) is needed here:
            # the HTTPCookieProcessor adds the Cookie header itself
            f = urllib2.urlopen(req)
        except IOError as e:
            print("Failed to open %s" % url)
            if hasattr(e, 'code'):
                print("Error code: %s" % e.code)
        else:
            print(f)
            print(f.read())
            print(f.info())
            f.close()
            print('Cookies:')
            for index, cookie in enumerate(cj):
                print("%d : %s" % (index, cookie))
            # Session cookies are skipped unless ignore_discard=True is passed
            cj.save("cookies.lwp")


    if __name__ == "__main__":
        cook()
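
One detail of the `cj.save("cookies.lwp")` call is easy to miss: by default an LWPCookieJar does not write session cookies (those without an expiry) to disk. The sketch below shows this on Python 3, where cookielib is named http.cookiejar and urllib2 is named urllib.request. It runs without any network access. `FakeResponse`, `save_and_reload`, the cookie value and the example.com URL are all made up for illustration.

```python
import email.message
import http.cookiejar
import os
import tempfile
import urllib.request


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; the jar only calls info()."""

    def __init__(self, set_cookie_value):
        self._headers = email.message.Message()
        self._headers["Set-Cookie"] = set_cookie_value

    def info(self):
        return self._headers


def save_and_reload(path, keep_session_cookies):
    """Store one session cookie, save the jar to disk, and load it into a new jar."""
    request = urllib.request.Request("http://example.com/")  # placeholder URL
    jar = http.cookiejar.LWPCookieJar(path)
    # Same step HTTPCookieProcessor performs after each real response
    jar.extract_cookies(FakeResponse("session=abc123; Path=/"), request)
    jar.save(ignore_discard=keep_session_cookies)

    restored = http.cookiejar.LWPCookieJar(path)
    restored.load(ignore_discard=keep_session_cookies)
    return sorted((c.name, c.value) for c in restored)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cookie_file = os.path.join(tmp, "cookies.lwp")
        print(save_and_reload(cookie_file, keep_session_cookies=True))
        print(save_and_reload(cookie_file, keep_session_cookies=False))
```

If a login cookie has to survive between runs of the script, passing `ignore_discard=True` to both `save()` and `load()` is probably what you want.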
    
  • 2021-02-20 12:18

    The cookielib module provides cookie handling for HTTP clients.

    The cookielib module defines classes for automatic handling of HTTP cookies. It is useful for accessing web sites that require small pieces of data – cookies – to be set on the client machine by an HTTP response from a web server, and then returned to the server in later HTTP requests.

    The examples in the docs show how to process cookies in conjunction with urllib2:

    import cookielib, urllib2
    cj = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    r = opener.open("http://example.com/")
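
For anyone on Python 3, here is a rough equivalent, assuming the standard renames (cookielib became http.cookiejar, urllib2 became urllib.request). It also spells out the round trip described above: a response sets a cookie and the jar returns it on the next request. This is only a sketch. `FakeResponse`, `make_opener` and `cookie_round_trip` are made-up names, and the fake response replaces a real server so the example runs offline.

```python
import email.message
import http.cookiejar
import urllib.request


class FakeResponse:
    """Hypothetical stand-in for an HTTP response; the jar only calls info()."""

    def __init__(self, set_cookie_value):
        self._headers = email.message.Message()
        self._headers["Set-Cookie"] = set_cookie_value

    def info(self):
        return self._headers


def make_opener():
    """Python 3 version of the snippet above: an opener that shares one cookie jar."""
    cj = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
    return opener, cj


def cookie_round_trip(url="http://example.com/"):
    """Do by hand what HTTPCookieProcessor does around every request."""
    cj = http.cookiejar.CookieJar()
    # 1. The server's response sets a cookie; the jar stores it
    cj.extract_cookies(FakeResponse("session=abc123; Path=/"), urllib.request.Request(url))
    # 2. A later request to the same site gets the cookie attached
    later = urllib.request.Request(url)
    cj.add_cookie_header(later)
    return later.get_header("Cookie")


if __name__ == "__main__":
    print(cookie_round_trip())
```

In a real script you would just call `opener.open(url)` repeatedly with the opener from `make_opener()`; the manual calls are only there to show what happens under the hood.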
    