Python requests Library Tutorial

Submitted by 梦想的初衷 on 2020-03-25 09:57:17


Getting Started

  1. Sending requests:
import requests

r = requests.get("http://httpbin.org/get")        # GET
r = requests.post("http://httpbin.org/post")      # POST
r = requests.put("http://httpbin.org/put")        # PUT
r = requests.delete("http://httpbin.org/delete")  # DELETE
r = requests.head("http://httpbin.org/get")       # HEAD
r = requests.options("http://httpbin.org/get")    # OPTIONS
  2. URL parameters
>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.get("http://httpbin.org/get", params=payload)

>>> print(r.url)
http://httpbin.org/get?key2=value2&key1=value1
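params also accepts list values, which become a repeated key in the query string. This can be checked without a network call by using requests' own PreparedRequest to build the URL the same way requests.get(..., params=...) would (used here purely for illustration):

```python
from requests.models import PreparedRequest

# Build the URL exactly as requests.get(..., params=...) would, offline
req = PreparedRequest()
req.prepare_url("http://httpbin.org/get", {"key1": "value1", "key2": ["a", "b"]})
print(req.url)  # → http://httpbin.org/get?key1=value1&key2=a&key2=b
```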
  3. Response content
>>> r = requests.get("http://httpbin.org/get")
>>> type(r.text)     # body decoded to a string
str
>>> type(r.content)  # raw body bytes
bytes
>>> r.json()    # body parsed as JSON
>>> r.encoding  # text encoding in use
>>> r.headers   # response headers
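The relationship between these attributes can be explored without a network by filling in a Response object by hand (setting the private _content field is an illustration-only shortcut, not something normal code should do):

```python
import requests

# Simulate a response body; requests normally fills these fields itself
resp = requests.models.Response()
resp.status_code = 200
resp._content = b'{"origin": "127.0.0.1"}'
resp.encoding = 'utf-8'

print(type(resp.content))  # <class 'bytes'>
print(type(resp.text))     # <class 'str'>
print(resp.json())         # {'origin': '127.0.0.1'}
```

content is always the raw bytes; text is those bytes decoded with encoding; json() parses the decoded text.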
  4. Custom request headers
>>> import json
>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.post("http://httpbin.org/post", data=payload)

>>> url = 'https://api.github.com/some/endpoint'
>>> r = requests.post(url, data=json.dumps(payload))

>>> payload = {'some': 'data'}
>>> headers = {'content-type': 'application/json'}
>>> r = requests.post(url, data=json.dumps(payload), headers=headers)
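Recent versions of requests also accept a json= parameter that serializes the payload and sets the Content-Type header for you; preparing the request shows the effect without sending anything:

```python
import requests

# json= serializes the dict and sets Content-Type automatically
req = requests.Request('POST', 'https://api.github.com/some/endpoint',
                       json={'some': 'data'})
prepped = req.prepare()
print(prepped.headers['Content-Type'])  # application/json
```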
  5. Status codes
>>> r = requests.get('http://httpbin.org/get')
>>> r.status_code
200
>>> r.ok
True
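To turn 4xx/5xx status codes into exceptions instead of checking them by hand, Response.raise_for_status raises an HTTPError (illustrated here with a hand-built Response so no server is needed):

```python
import requests

resp = requests.models.Response()
resp.status_code = 404  # simulate a Not Found response

try:
    resp.raise_for_status()
except requests.exceptions.HTTPError as exc:
    print('request failed:', exc)
```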
  6. Cookies
>>> url = 'http://httpbin.org/cookies'
>>> cookies = dict(cookies_are='working')

>>> r = requests.get(url, cookies=cookies)
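For finer control than a plain dict, requests exposes RequestsCookieJar, which scopes cookies by domain and path and can be passed as the cookies argument:

```python
import requests

jar = requests.cookies.RequestsCookieJar()
jar.set('tasty_cookie', 'yum', domain='httpbin.org', path='/cookies')
jar.set('gross_cookie', 'blech', domain='httpbin.org', path='/elsewhere')

# the jar behaves like a dict for lookups
print(jar.get('tasty_cookie'))  # yum
# requests.get('http://httpbin.org/cookies', cookies=jar) would send only matching cookies
```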
  7. Redirects and timeouts

Requests follows redirects automatically for every method except HEAD:

>>> r.history
>>> r = requests.head('http://github.com', allow_redirects=True)

>>> requests.get('http://github.com', timeout=0.001)  # timeout applies only to the connection phase, not to downloading the response body; it is not a time limit on the entire response download
  8. Exceptions

On a network problem (e.g. DNS failure or a refused connection), Requests raises a ConnectionError exception.

In the rare case of an invalid HTTP response, Requests raises an HTTPError exception.

If the request times out, a Timeout exception is raised.

If the request exceeds the configured maximum number of redirects, a TooManyRedirects exception is raised.

All exceptions that Requests explicitly raises inherit from requests.exceptions.RequestException.
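Because everything derives from RequestException, a single except clause can catch any of them; the hierarchy can be verified directly:

```python
from requests.exceptions import (ConnectionError, HTTPError, Timeout,
                                 TooManyRedirects, RequestException)

# every Requests exception derives from RequestException
for exc in (ConnectionError, HTTPError, Timeout, TooManyRedirects):
    print(exc.__name__, issubclass(exc, RequestException))  # all True
```

So `except RequestException:` is a safe catch-all for any failure Requests itself reports.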

Advanced

  1. Sessions
s = requests.Session()
s.auth = ('user', 'pass')
s.headers.update({'x-test': 'true'})

# both 'x-test' and 'x-test2' will be sent
s.get('http://httpbin.org/headers', headers={'x-test2': 'true'})

Any dictionary you pass to a request method is merged with the session-level values that are set; method-level parameters override session parameters.
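This merge can be observed offline by letting the session prepare the request instead of sending it:

```python
import requests

s = requests.Session()
s.headers.update({'x-test': 'true'})

# prepare (but do not send) a request carrying a method-level header
req = requests.Request('GET', 'http://httpbin.org/headers',
                       headers={'x-test2': 'true'})
prepped = s.prepare_request(req)

print(prepped.headers['x-test'], prepped.headers['x-test2'])  # true true
```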

  2. Prepared requests
from requests import Request, Session

# url, data and headers below are placeholders for your own values
s = Session()
req = Request('GET', url,
    data=data,
    headers=headers
)
prepped = req.prepare()

#do something with prepped.body
#do something with prepped.headers

resp = s.send(prepped,
    stream=stream,
    verify=verify,
    proxies=proxies,
    cert=cert,
    timeout=timeout
)

To prepare a request with session state applied, e.g. carrying the session's cookies:

from requests import Request, Session

s = Session()
req = Request('GET', url,
    data=data,
    headers=headers
)

prepped = s.prepare_request(req)

#do something with prepped.body
#do something with prepped.headers

resp = s.send(prepped,
    stream=stream,
    verify=verify,
    proxies=proxies,
    cert=cert,
    timeout=timeout
)
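The point of the prepare step is that you can inspect or modify the request between preparation and sending, for example to attach a signature computed over the encoded body (X-Signature and the sign() notation are hypothetical, just to show where such a step would go):

```python
import requests

s = requests.Session()
req = requests.Request('POST', 'http://httpbin.org/post', data={'key': 'value'})
prepped = s.prepare_request(req)

# inspect the encoded form body, then attach a header derived from it
print(prepped.body)  # key=value
prepped.headers['X-Signature'] = 'sign(' + prepped.body + ')'  # hypothetical signing step
```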
  3. SSL certificate verification
>>> requests.get('https://github.com', verify=True)
  4. Body content workflow

By default, the response body is downloaded immediately when you make a request. You can override this behavior with the stream parameter, deferring the download until the Response.content attribute is accessed:

tarball_url = 'https://github.com/kennethreitz/requests/tarball/master'
r = requests.get(tarball_url, stream=True)

At this point only the response headers have been downloaded and the connection stays open, so we can fetch the content conditionally:

if int(r.headers['content-length']) < TOO_LONG:
  content = r.content
  ...

You can take further control of the workflow with the Response.iter_content and Response.iter_lines methods, or read from the underlying urllib3.response.HTTPResponse object via Response.raw:

import json
import requests

r = requests.get('http://httpbin.org/stream/20', stream=True)

for line in r.iter_lines():
    # filter out keep-alive new lines
    if line:
        print(json.loads(line))

The connection is only released once the response body has been read completely. To read partially and still release the connection reliably, use a context manager:

from contextlib import closing

with closing(requests.get('http://httpbin.org/get', stream=True)) as r:
    for chunk in r.iter_content(chunk_size=8192):
        pass  # do things with each chunk of the response here
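The streaming calls above need a live server, but the chunked iteration itself can be exercised offline by attaching any file-like object as Response.raw (purely an illustration; responses are normally built by requests itself):

```python
import io
import requests

resp = requests.models.Response()
resp.status_code = 200
resp.raw = io.BytesIO(b'abcdefghij')  # stands in for the live socket

chunks = list(resp.iter_content(chunk_size=4))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```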

  5. Event hooks

The callback function receives the Response object as its first argument:

def print_url(r, *args, **kwargs):
    print(r.url)
>>> requests.get('http://httpbin.org', hooks=dict(response=print_url))
http://httpbin.org
<Response [200]>
  6. Proxies
import requests

proxies = {
  "http": "http://10.10.1.10:3128",
  "https": "http://10.10.1.10:1080",
}

requests.get("http://example.org", proxies=proxies)

Using proxies from environment variables:

$ export HTTP_PROXY="http://10.10.1.10:3128"
$ export HTTPS_PROXY="http://10.10.1.10:1080"
$ python
>>> import requests
>>> requests.get("http://example.org")

If your proxy requires HTTP Basic Auth, use the http://user:password@host/ syntax:

proxies = {
    "http": "http://user:pass@10.10.1.10:3128/",
}
  7. Authentication

Requests simplifies many forms of authentication, including the very common Basic Auth:

>>> from requests.auth import HTTPBasicAuth
>>> requests.get('https://api.github.com/user', auth=HTTPBasicAuth('user', 'pass'))
<Response [200]>

Requests also provides a shorthand for this:

>>> requests.get('https://api.github.com/user', auth=('user', 'pass'))
<Response [200]>

Digest authentication:

>>> from requests.auth import HTTPDigestAuth
>>> url = 'http://httpbin.org/digest-auth/auth/user/pass'
>>> requests.get(url, auth=HTTPDigestAuth('user', 'pass'))
<Response [200]>
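Beyond the built-in schemes, you can plug in your own by subclassing requests.auth.AuthBase: the instance is called with the prepared request and returns it modified. The bearer-token scheme below is just an example; preparing the request shows the header it attaches without sending anything:

```python
import requests
from requests.auth import AuthBase

class TokenAuth(AuthBase):
    """Attach a bearer token to every request (example scheme)."""
    def __init__(self, token):
        self.token = token

    def __call__(self, r):
        r.headers['Authorization'] = 'Bearer ' + self.token
        return r

prepped = requests.Request('GET', 'https://api.github.com/user',
                           auth=TokenAuth('s3cr3t')).prepare()
print(prepped.headers['Authorization'])  # Bearer s3cr3t
```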