Getting HTML with Pycurl

梦毁少年i 2021-02-06 03:52

I've been trying to retrieve a page of HTML using pycurl so that I can parse it for relevant information using str.split and some for loops. I know pycurl retrieves the HTML,

2 answers
  • 2021-02-06 04:27

    perform() executes the transfer and hands the response body, chunk by chunk, to a write callback that you specify. You therefore need a buffer to hold the HTML and a write function for it. An in-memory buffer usually does the job: a StringIO object in Python 2, or io.BytesIO in Python 3, where the callback receives bytes:

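    from io import BytesIO  # Python 3: pycurl passes bytes to the callback, so use BytesIO rather than StringIO

    import pycurl

    c = pycurl.Curl()
    c.setopt(pycurl.URL, "http://www.google.com/")

    b = BytesIO()
    c.setopt(pycurl.WRITEFUNCTION, b.write)  # each chunk of the body goes to b.write
    c.setopt(pycurl.FOLLOWLOCATION, 1)       # follow HTTP redirects
    c.setopt(pycurl.MAXREDIRS, 5)            # but give up after five of them
    c.perform()
    c.close()
    # Assumes the page is UTF-8; undecodable bytes are replaced rather than raising
    html = b.getvalue().decode("utf-8", errors="replace")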
    

    You could also use a file or tempfile or anything else that can store data.
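
As a rough illustration of "anything else that can store data", here is a minimal Python 3 sketch. The names fetch_into and ChunkList are hypothetical helpers made up for this example, not part of pycurl. The only assumption about the sink is that it has a write method that accepts bytes, which is also true of a file opened in binary mode or a tempfile.TemporaryFile().

```python
import tempfile


class ChunkList:
    """Hypothetical sink: keeps every chunk it is given in a list."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)
        return len(data)  # report how many bytes were consumed

    def getvalue(self):
        return b"".join(self.chunks)


def fetch_into(url, sink):
    """Fetch url and pass each chunk of the body to sink.write."""
    import pycurl  # imported here so the sink class can be tried without pycurl

    c = pycurl.Curl()
    try:
        c.setopt(pycurl.URL, url)
        c.setopt(pycurl.WRITEFUNCTION, sink.write)
        c.setopt(pycurl.FOLLOWLOCATION, 1)
        c.perform()
    finally:
        c.close()  # release the handle even if perform() raises


# Possible usage (needs pycurl and network access):
#
#     with tempfile.TemporaryFile() as f:   # binary mode by default
#         fetch_into("http://www.google.com/", f)
#         f.seek(0)
#         html = f.read().decode("utf-8", errors="replace")
```

The same fetch_into call should work unchanged with ChunkList(), io.BytesIO(), or open("page.html", "wb"), since the callback only ever sees the write method.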

  • 2021-02-06 04:35

    This sends a request, then stores and prints the response body:

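    from io import BytesIO  # Python 3: the write callback receives bytes
    import pycurl

    url = 'http://www.google.com/'

    storage = BytesIO()
    c = pycurl.Curl()
    c.setopt(c.URL, url)
    c.setopt(c.WRITEFUNCTION, storage.write)
    c.perform()
    c.close()
    # Assumes the page is UTF-8; undecodable bytes are replaced rather than raising
    content = storage.getvalue().decode('utf-8', errors='replace')
    print(content)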
    

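    If you also want to store the response headers, set a header callback before calling perform(). Note that writing to the same buffer puts the headers and the body together: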

    c.setopt(c.HEADERFUNCTION, storage.write)
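
If the headers should stay separate from the body, one option is to give HEADERFUNCTION its own callback. The sketch below assumes Python 3. HeaderCollector and fetch_with_headers are hypothetical names invented for this example. It relies on the header callback being called once per header line with bytes, and it assumes that decoding those lines as ISO-8859-1 is acceptable.

```python
from io import BytesIO


class HeaderCollector:
    """Hypothetical helper: gathers header lines handed over one at a time."""

    def __init__(self):
        self.status_line = None
        self.headers = {}

    def add_line(self, line):
        text = line.decode("iso-8859-1").strip()
        if not text:
            return  # the blank line that ends a header block
        if text.startswith("HTTP/"):
            # A new response starts here (for example after a redirect)
            self.status_line = text
            self.headers = {}
            return
        name, sep, value = text.partition(":")
        if sep:
            # Later duplicates of a header overwrite earlier ones in this sketch
            self.headers[name.strip().lower()] = value.strip()


def fetch_with_headers(url):
    """Return (HeaderCollector, body bytes) for url."""
    import pycurl  # imported here so HeaderCollector can be tried without pycurl

    body = BytesIO()
    collector = HeaderCollector()
    c = pycurl.Curl()
    try:
        c.setopt(pycurl.URL, url)
        c.setopt(pycurl.WRITEFUNCTION, body.write)
        c.setopt(pycurl.HEADERFUNCTION, collector.add_line)
        c.perform()
    finally:
        c.close()
    return collector, body.getvalue()
```

After a call such as fetch_with_headers("http://www.google.com/"), the returned collector's headers dictionary could be used, for instance, to read the Content-Type value before deciding how to decode the body.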
    