urllib

Obnoxious CryptographyDeprecationWarning because of missing hmac.compare_digest function everywhere

Submitted by 瘦欲@ on 2019-12-19 07:20:27
Question: Things were running along fine until one of my projects started printing this everywhere, at the top of every execution, at least once:

local/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible.

I have no idea why it started and it's disrupting the
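Until the interpreter itself can be upgraded, this kind of warning can be silenced with the standard warnings machinery. A minimal sketch; the stand-in warning class below is hypothetical so the snippet runs without cryptography installed (the real class is cryptography.utils.CryptographyDeprecationWarning):

```python
import warnings

# Hypothetical stand-in for cryptography.utils.CryptographyDeprecationWarning,
# defined here so this sketch runs without the cryptography package.
class CryptographyDeprecationWarning(UserWarning):
    pass

with warnings.catch_warnings(record=True) as caught:
    # Silence just this one warning class, not all deprecation noise.
    warnings.filterwarnings("ignore", category=CryptographyDeprecationWarning)
    warnings.warn("Support for your Python version is deprecated.",
                  CryptographyDeprecationWarning)

print(len(caught))  # 0 -- the warning was swallowed by the filter
```

The same filterwarnings call, placed before the noisy import, suppresses the real warning without hiding other deprecation messages.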

Python Crawler 1: the urllib module

Submitted by 醉酒当歌 on 2019-12-19 04:56:14
1. Load the request submodule of urllib:

from urllib import request

2. Related functions:

(1) urlopen: fetch a page.

webpage = request.urlopen(url, timeout=1)  # fetch the page; timeout=1 means give up after 1 second, so invalid pages can be skipped
data = webpage.read()                      # read the page body; the result is a bytes object, printed as b'...'
data = data.decode('utf-8')                # decode bytes to str; decode() defaults to empty (utf-8), or pass a codec such as decode("gb2312")

The regex functions such as re.search() cannot be used on the raw result directly; convert it to a string first:

pat = '<div class="name">(.*?)</div>'
res = re.compile(pat).findall(str(data))   # remember str(data); res holds the extracted content

(2) urlretrieve: fetch a page and save it locally as a local copy.

urllib.request.urlretrieve(url, filename="local-file-path//1.html")
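The read/decode/findall steps above can be exercised offline, with a bytes literal standing in for webpage.read():

```python
import re

# Stand-in for webpage.read(): a bytes payload, so no network is needed.
data = b'<div class="name">Example Movie</div>'
text = data.decode('utf-8')               # bytes -> str, as described above
pat = '<div class="name">(.*?)</div>'
res = re.compile(pat).findall(text)       # findall needs a str, not bytes
print(res)  # ['Example Movie']
```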

temporarily retrieve an image using the requests library

Submitted by …衆ロ難τιáo~ on 2019-12-19 03:56:08
Question: I'm writing a web scraper that needs to scrape only the thumbnail of an image from the url. This is my function, using the urllib library:

def create_thumb(self):
    if self.url and not self.thumbnail:
        image = urllib.request.urlretrieve(self.url)
        # Create the thumbnail of dimension size
        size = 350, 350
        t_img = Imagelib.open(image[0])
        t_img.thumbnail(size)
        # Get the directory name where the temp image was stored
        # by urlretrieve
        dir_name = os.path.dirname(image[0])
        # Get the image name from the
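The urlretrieve step in isolation can be sketched without network access by pointing it at a file:// URL (the temp file below is a stand-in for the remote image):

```python
import os
import tempfile
import urllib.request

# Stand-in for self.url: a file:// URL to a local temp file, so this
# sketch runs offline; urlretrieve handles any scheme urllib knows.
src = tempfile.NamedTemporaryFile(delete=False, suffix=".png")
src.write(b"fake-image-bytes")
src.close()

image = urllib.request.urlretrieve("file://" + src.name)
# image is a (local_path, headers) tuple, hence the image[0] indexing above.
dir_name = os.path.dirname(image[0])
print(open(image[0], "rb").read() == b"fake-image-bytes")  # True
```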

Using the urllib library

Submitted by 微笑、不失礼 on 2019-12-19 03:22:33
The urllib library is the Python standard library for handling network requests. It contains four main modules:

urllib.request: the request module, used to send network requests
urllib.parse: the parsing module, used to parse URLs
urllib.error: the exception module, used to handle exceptions raised by requests
urllib.robotparser: used to parse robots.txt files

1. The request module

It is responsible for building and sending network requests, and for attaching headers, proxies, and so on. With the request module you can imitate a browser's request process.

1.1 The urlopen method

"urlopen is a simple way to send a network request; timeout sets a limit, and an exception is raised if the request takes longer than the set time"
response = request.urlopen(url='http://www.baidu.com/get', timeout=0.1)

"urlopen sends a GET request by default; when the data parameter is passed in, it sends a POST request instead"
response = request.urlopen(url='http://www.baidu.com/post', data=b'username=admin&password=123456')

1.2 Adding request headers

"A request sent through urllib carries a default header: "User-Agent": "Python-urllib/3.6"
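The parse module listed above can be exercised without any network; a small sketch of splitting a URL and building the bytes payload that urlopen's data parameter expects:

```python
from urllib import parse

url = "http://www.baidu.com/get?wd=python&pn=1"
parts = parse.urlparse(url)          # split the URL into its components
print(parts.netloc)                  # www.baidu.com
print(parse.parse_qs(parts.query))   # {'wd': ['python'], 'pn': ['1']}

# urlencode builds the body for a POST via urlopen's data parameter:
data = parse.urlencode({"username": "admin", "password": "123456"}).encode()
print(data)                          # b'username=admin&password=123456'
```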

urllib downloading contents of an online directory

Submitted by 社会主义新天地 on 2019-12-19 02:28:20
Question: I'm trying to make a program that will open a directory, then use regular expressions to get the names of powerpoints, and then create files locally and copy their content. When I run this it appears to work; however, when I actually try to open the files they keep saying the version is wrong.

from urllib.request import urlopen
import re

urlpath = urlopen('http://www.divms.uiowa.edu/~jni/courses/ProgrammignInCobol/presentation/')
string = urlpath.read().decode('utf-8')
pattern = re.compile('ch
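The "version is wrong" symptom is consistent with the files being saved as decoded text rather than raw bytes. A sketch of the two relevant pieces; the HTML sample and the ch\d+ pattern are assumptions filled in from the truncated code:

```python
import os
import re
import tempfile

# Stand-in for urlpath.read().decode('utf-8') on the directory listing.
html = '<a href="ch01.ppt">ch01.ppt</a> <a href="ch02.ppt">ch02.ppt</a>'
names = re.findall(r'href="(ch\d+\.ppt)"', html)
print(names)  # ['ch01.ppt', 'ch02.ppt']

# PowerPoint files are binary: save the raw response bytes with mode 'wb',
# never a decoded string, or the saved files will be corrupted.
payload = b"\xd0\xcf\x11\xe0 fake ppt bytes"
path = os.path.join(tempfile.mkdtemp(), names[0])
with open(path, "wb") as out:
    out.write(payload)
print(open(path, "rb").read() == payload)  # True
```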

Crawler Basics: urllib

Submitted by 我与影子孤独终老i on 2019-12-19 01:20:46
I. urllib

1. Access: urllib.request.urlopen()

Parameters:
url: the URL to crawl
timeout: wait time; an exception is raised if no response arrives within the given time

# import the module
import urllib.request
url = "http://www.baidu.com/"
# send a request to Baidu and get a response object
html = urllib.request.urlopen(url)
print(html.read().decode("utf-8"))  # page source, as a str
print(html.status)                  # response status code

2. Response methods

1. bytes = response.read()                  # raw page source
2. string = response.read().decode("utf-8") # page source, decoded
3. url = response.geturl()                  # URL of the resource
4. code = response.getcode()                # response status code
5. string.encode()                          # str --> bytes
6. bytes.decode()                           # bytes --> str

3. Wrapping

3.1 User-Agent

urllib.request.Request()
Purpose: create a request object (wrap the request, rebuild the User
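The Request wrapping in section 3.1 can be sketched without sending anything; constructing the object is enough to see the header override take effect:

```python
import urllib.request

url = "http://www.baidu.com/"
# Replace the default "Python-urllib/3.x" User-Agent with a browser-like one.
headers = {"User-Agent": "Mozilla/5.0"}
req = urllib.request.Request(url, headers=headers)

print(req.full_url)                  # http://www.baidu.com/
# Request normalizes header keys, hence the "User-agent" capitalization:
print(req.get_header("User-agent"))  # Mozilla/5.0
```

The wrapped req is then passed to urllib.request.urlopen(req) in place of the bare URL.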

urllib.py doesn't work with https?

Submitted by 混江龙づ霸主 on 2019-12-18 17:12:44
Question: In my Python app I try to open an https url, but I get:

File "C:\Python26\lib\urllib.py", line 215, in open_unknown
    raise IOError, ('url error', 'unknown url type', type)
IOError: [Errno url error] unknown url type: 'https'

my code:

import urllib
def generate_embedded_doc(doc_id):
    url = "https://docs.google.com/document/ub?id=" + doc_id + "&embedded=true"
    src = urllib.urlopen(url).read()
    ...
    return src

Answer 1: urllib and Python 2.6 have SSL support and your code example works fine for me.
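For reference, in Python 3 the call moved to urllib.request.urlopen, and https works whenever the interpreter was built with SSL support. A sketch of the Python 3 spelling; the function is not exercised against the real Google Docs URL here:

```python
import ssl
import urllib.request

def generate_embedded_doc(doc_id):
    # Python 3 spelling of the original Python 2 urllib.urlopen call.
    url = "https://docs.google.com/document/ub?id=" + doc_id + "&embedded=true"
    with urllib.request.urlopen(url) as resp:
        return resp.read()

# 'https' is a known URL type exactly when SSL support is compiled in:
has_ssl = hasattr(ssl, "SSLContext")
print(has_ssl)  # True on any build that includes OpenSSL
```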

Python 3 - TypeError: a bytes-like object is required, not 'str'

Submitted by 走远了吗. on 2019-12-18 16:53:42
Question: I'm working on a lesson from Udacity and am having some issues trying to find out whether the result from this site returns true or false. I get the TypeError with the code below.

from urllib.request import urlopen

# check text for curse words
def check_profanity():
    f = urlopen("http://www.wdylike.appspot.com/?q=shit")
    output = f.read()
    f.close()
    print(output)
    if "b'true'" in output:
        print("There is a profane word in the document")

check_profanity()

The output prints b'true' and I'm not really sure
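f.read() returns bytes, and in Python 3 a str operand can never be tested with `in` against a bytes object; that mismatch is the TypeError (and "b'true'" is the printed repr, not the content). Two fixes, sketched with the response payload hard-coded:

```python
output = b"true"   # stand-in for f.read(): bytes, not str

# Fix 1: decode the response, then compare strings.
text = output.decode("utf-8")
print("true" in text)      # True

# Fix 2: keep both operands as bytes.
print(b"true" in output)   # True
```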