Getting HTML with Pycurl


I've been trying to retrieve a page of HTML using pycurl, so I can then parse it for relevant information using str.split and some for loops. I know Pycurl retrieves the HTML,

2 answers

伪装坚强ぢ

2021-02-06 04:27
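The perform() method executes the HTML fetch and passes each chunk of the result to a write function you specify. You need to provide a buffer to hold the HTML and a write function that fills it. Usually this is done with a StringIO object (io.BytesIO on Python 3, since pycurl delivers bytes), as follows:
```
import pycurl
from io import BytesIO  # Python 3 replacement for the old StringIO module

c = pycurl.Curl()
c.setopt(pycurl.URL, "http://www.google.com/")

b = BytesIO()  # in-memory buffer for the response body
c.setopt(pycurl.WRITEFUNCTION, b.write)  # called once per received chunk
c.setopt(pycurl.FOLLOWLOCATION, 1)  # follow HTTP redirects
c.setopt(pycurl.MAXREDIRS, 5)  # but give up after 5 of them
c.perform()
c.close()
html = b.getvalue()  # bytes; decode with the page's encoding to get text
```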
You could also use a file or tempfile or anything else that can store data.
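As a rough illustration of the file or tempfile variant, here is a minimal sketch. The helper names `fetch_into` and `read_back` are made up for this example and are not part of pycurl; the sketch only assumes that the sink is a binary file-like object with a `write` method (and `seek`/`read` for reading it back).

```python
import tempfile


def fetch_into(url, sink):
    """Stream the response body for `url` into `sink`, a binary file-like object."""
    import pycurl  # imported lazily so read_back() below loads without pycurl

    c = pycurl.Curl()
    try:
        c.setopt(c.URL, url)
        c.setopt(c.WRITEFUNCTION, sink.write)  # called once per received chunk
        c.setopt(c.FOLLOWLOCATION, True)
        c.perform()
    finally:
        c.close()  # release the handle even if perform() raises


def read_back(sink):
    """Rewind a seekable sink and return everything written to it as bytes."""
    sink.seek(0)
    return sink.read()


# Usage (needs network access), left commented out:
# with tempfile.TemporaryFile() as tmp:
#     fetch_into("http://www.google.com/", tmp)
#     html = read_back(tmp)
```

A temporary file keeps large responses out of memory; for small pages the in-memory buffer above is simpler.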

半阙折子戏

2021-02-06 04:35

This will send a request, store the response body, and print it:

```
from io import BytesIO  # pycurl hands the body over as bytes
import pycurl

url = 'http://www.google.com/'

storage = BytesIO()
c = pycurl.Curl()
c.setopt(c.URL, url)
c.setopt(c.WRITEFUNCTION, storage.write)
c.perform()
c.close()
content = storage.getvalue()
# The encoding is an assumption here; use the one the server declares.
print(content.decode('utf-8', errors='replace'))
```

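If you want to store the response headers too, set this before calling perform():

```
c.setopt(c.HEADERFUNCTION, storage.write)
```

Because this reuses storage, the header lines end up in the same buffer, ahead of the body. Pass the write method of a separate buffer if you want to keep them apart.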
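If you would rather have the headers as a dictionary than as raw text, a callable can parse each line as it arrives. The sketch below is one possible way to do that; `HeaderCollector` is a hypothetical name, not a pycurl class, and it assumes the header callback is invoked once per header line with a bytes argument.

```python
class HeaderCollector:
    """Callable suitable for HEADERFUNCTION: parses header lines into a dict."""

    def __init__(self):
        self.status_line = None
        self.headers = {}

    def __call__(self, raw_line):
        # Header lines arrive as bytes; iso-8859-1 can decode any byte value.
        line = raw_line.decode("iso-8859-1").strip()
        if not line:
            return  # blank line marks the end of a header block
        if ":" not in line:
            self.status_line = line  # e.g. "HTTP/1.1 200 OK"
            return
        name, value = line.split(":", 1)
        # Header names are case-insensitive, so normalise them to lowercase.
        self.headers[name.strip().lower()] = value.strip()


# Usage with a pycurl handle `c`, left commented out:
# collector = HeaderCollector()
# c.setopt(c.HEADERFUNCTION, collector)
# c.perform()
# print(collector.headers.get("content-type"))
```

When redirects are followed, every response in the chain passes through the callback, so values from later responses overwrite earlier ones with the same name.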
