urllib

Obnoxious CryptographyDeprecationWarning because of missing hmac.compare_digest function everywhere

Submitted by 瘦欲@ on 2019-12-19 07:20:27
Question: Things were running along fine until one of my projects started printing this everywhere, at the top of every execution, at least once:

local/lib/python2.7/site-packages/cryptography/hazmat/primitives/constant_time.py:26: CryptographyDeprecationWarning: Support for your Python version is deprecated. The next version of cryptography will remove support. Please upgrade to a 2.7.x release that supports hmac.compare_digest as soon as possible.

I have no idea why it started and it's disrupting the
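Until the interpreter itself can be upgraded, this kind of warning can be silenced with the standard warnings machinery. A minimal sketch; the stand-in warning class below is hypothetical so the snippet runs without cryptography installed (the real class is cryptography.utils.CryptographyDeprecationWarning):

```python
import warnings

# Hypothetical stand-in for cryptography.utils.CryptographyDeprecationWarning,
# defined here so this sketch runs without the cryptography package.
class CryptographyDeprecationWarning(UserWarning):
    pass

with warnings.catch_warnings(record=True) as caught:
    # Silence just this one warning class, not all deprecation noise.
    warnings.filterwarnings("ignore", category=CryptographyDeprecationWarning)
    warnings.warn("Support for your Python version is deprecated.",
                  CryptographyDeprecationWarning)

print(len(caught))  # 0 -- the warning was swallowed by the filter
```

The same filterwarnings call, placed before the noisy import, suppresses the real warning without hiding other deprecation messages.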

Python Crawler 1: the urllib module

Submitted by 醉酒当歌 on 2019-12-19 04:56:14
1. Load the request submodule of urllib:

from urllib import request

2. Related functions:

(1) urlopen: fetch a page.

webpage = request.urlopen(url, timeout=1)  # fetch the page; timeout=1 means give up after 1 second, so invalid pages can be skipped
data = webpage.read()                      # read the page body; the result is a bytes object, printed as b'...'
data = data.decode('utf-8')                # decode bytes to str; decode() defaults to empty (utf-8), or pass a codec such as decode("gb2312")

The regex functions such as re.search() cannot be used on the raw result directly; convert it to a string first:

pat = '<div class="name">(.*?)</div>'
res = re.compile(pat).findall(str(data))   # remember str(data); res holds the extracted content

(2) urlretrieve: fetch a page and save it locally as a local copy.

urllib.request.urlretrieve(url, filename="local-file-path//1.html")
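The read/decode/findall steps above can be exercised offline, with a bytes literal standing in for webpage.read():

```python
import re

# Stand-in for webpage.read(): a bytes payload, so no network is needed.
data = b'<div class="name">Example Movie</div>'
text = data.decode('utf-8')               # bytes -> str, as described above
pat = '<div class="name">(.*?)</div>'
res = re.compile(pat).findall(text)       # findall needs a str, not bytes
print(res)  # ['Example Movie']
```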

temporarily retrieve an image using the requests library

Submitted by …衆ロ難τιáo~ on 2019-12-19 03:56:08
Question: I'm writing a web scraper that needs to scrape only the thumbnail of an image from the url. This is my function, using the urllib library:

def create_thumb(self):
    if self.url and not self.thumbnail:
        image = urllib.request.urlretrieve(self.url)
        # Create the thumbnail of dimension size
        size = 350, 350
        t_img = Imagelib.open(image[0])
        t_img.thumbnail(size)
        # Get the directory name where the temp image was stored
        # by urlretrieve
        dir_name = os.path.dirname(image[0])
        # Get the image name from the
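The urlretrieve step in isolation can be sketched without network access by pointing it at a file:// URL (the temp file below is a stand-in for the remote image):

```python
import os
import tempfile
import urllib.request

# Stand-in for self.url: a file:// URL to a local temp file, so this
# sketch runs offline; urlretrieve handles any scheme urllib knows.
src = tempfile.NamedTemporaryFile(delete=False, suffix=".png")
src.write(b"fake-image-bytes")
src.close()

image = urllib.request.urlretrieve("file://" + src.name)
# image is a (local_path, headers) tuple, hence the image[0] indexing above.
dir_name = os.path.dirname(image[0])
print(open(image[0], "rb").read() == b"fake-image-bytes")  # True
```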

Using the urllib library

Submitted by 微笑、不失礼 on 2019-12-19 03:22:33
The urllib library is the Python standard library for handling network requests. It contains four main modules:

urllib.request: the request module, used to send network requests
urllib.parse: the parsing module, used to parse URLs
urllib.error: the exception module, used to handle exceptions raised by requests
urllib.robotparser: used to parse robots.txt files

1. The request module

It is responsible for building and sending network requests, and for attaching headers, proxies, and so on. With the request module you can imitate a browser's request process.

1.1 The urlopen method

"urlopen is a simple way to send a network request; timeout sets a limit, and an exception is raised if the request takes longer than the set time"
response = request.urlopen(url='http://www.baidu.com/get', timeout=0.1)

"urlopen sends a GET request by default; when the data parameter is passed in, it sends a POST request instead"
response = request.urlopen(url='http://www.baidu.com/post', data=b'username=admin&password=123456')

1.2 Adding request headers

"A request sent through urllib carries a default header: "User-Agent": "Python-urllib/3.6"
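The parse module listed above can be exercised without any network; a small sketch of splitting a URL and building the bytes payload that urlopen's data parameter expects:

```python
from urllib import parse

url = "http://www.baidu.com/get?wd=python&pn=1"
parts = parse.urlparse(url)          # split the URL into its components
print(parts.netloc)                  # www.baidu.com
print(parse.parse_qs(parts.query))   # {'wd': ['python'], 'pn': ['1']}

# urlencode builds the body for a POST via urlopen's data parameter:
data = parse.urlencode({"username": "admin", "password": "123456"}).encode()
print(data)                          # b'username=admin&password=123456'
```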

urllib downloading contents of an online directory

Submitted by 社会主义新天地 on 2019-12-19 02:28:20
Question: I'm trying to make a program that will open a directory, then use regular expressions to get the names of powerpoints, and then create files locally and copy their content. When I run this it appears to work; however, when I actually try to open the files they keep saying the version is wrong.

from urllib.request import urlopen
import re

urlpath = urlopen('http://www.divms.uiowa.edu/~jni/courses/ProgrammignInCobol/presentation/')
string = urlpath.read().decode('utf-8')
pattern = re.compile('ch
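The "version is wrong" symptom is consistent with the files being saved as decoded text rather than raw bytes. A sketch of the two relevant pieces; the HTML sample and the ch\d+ pattern are assumptions filled in from the truncated code:

```python
import os
import re
import tempfile

# Stand-in for urlpath.read().decode('utf-8') on the directory listing.
html = '<a href="ch01.ppt">ch01.ppt</a> <a href="ch02.ppt">ch02.ppt</a>'
names = re.findall(r'href="(ch\d+\.ppt)"', html)
print(names)  # ['ch01.ppt', 'ch02.ppt']

# PowerPoint files are binary: save the raw response bytes with mode 'wb',
# never a decoded string, or the saved files will be corrupted.
payload = b"\xd0\xcf\x11\xe0 fake ppt bytes"
path = os.path.join(tempfile.mkdtemp(), names[0])
with open(path, "wb") as out:
    out.write(payload)
print(open(path, "rb").read() == payload)  # True
```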

Crawler Basics: urllib

Submitted by 我与影子孤独终老i on 2019-12-19 01:20:46
I. urllib

1. Access: urllib.request.urlopen()

Parameters:
url: the URL to crawl
timeout: wait time; an exception is raised if no response arrives within the given time

# import the module
import urllib.request
url = "http://www.baidu.com/"
# send a request to Baidu and get a response object
html = urllib.request.urlopen(url)
print(html.read().decode("utf-8"))  # page source, as a str
print(html.status)                  # response status code

2. Response methods

1. bytes = response.read()                  # raw page source
2. string = response.read().decode("utf-8") # page source, decoded
3. url = response.geturl()                  # URL of the resource
4. code = response.getcode()                # response status code
5. string.encode()                          # str --> bytes
6. bytes.decode()                           # bytes --> str

3. Wrapping

3.1 User-Agent

urllib.request.Request()
Purpose: create a request object (wrap the request, rebuild the User
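The Request wrapping in section 3.1 can be sketched without sending anything; constructing the object is enough to see the header override take effect:

```python
import urllib.request

url = "http://www.baidu.com/"
# Replace the default "Python-urllib/3.x" User-Agent with a browser-like one.
headers = {"User-Agent": "Mozilla/5.0"}
req = urllib.request.Request(url, headers=headers)

print(req.full_url)                  # http://www.baidu.com/
# Request normalizes header keys, hence the "User-agent" capitalization:
print(req.get_header("User-agent"))  # Mozilla/5.0
```

The wrapped req is then passed to urllib.request.urlopen(req) in place of the bare URL.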

urllib.py doesn't work with https?

Submitted by 混江龙づ霸主 on 2019-12-18 17:12:44
Question: In my Python app I try to open an https url, but I get:

File "C:\Python26\lib\urllib.py", line 215, in open_unknown
    raise IOError, ('url error', 'unknown url type', type)
IOError: [Errno url error] unknown url type: 'https'

my code:

import urllib
def generate_embedded_doc(doc_id):
    url = "https://docs.google.com/document/ub?id=" + doc_id + "&embedded=true"
    src = urllib.urlopen(url).read()
    ...
    return src

Answer 1: urllib and Python 2.6 have SSL support and your code example works fine for me.
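For reference, in Python 3 the call moved to urllib.request.urlopen, and https works whenever the interpreter was built with SSL support. A sketch of the Python 3 spelling; the function is not exercised against the real Google Docs URL here:

```python
import ssl
import urllib.request

def generate_embedded_doc(doc_id):
    # Python 3 spelling of the original Python 2 urllib.urlopen call.
    url = "https://docs.google.com/document/ub?id=" + doc_id + "&embedded=true"
    with urllib.request.urlopen(url) as resp:
        return resp.read()

# 'https' is a known URL type exactly when SSL support is compiled in:
has_ssl = hasattr(ssl, "SSLContext")
print(has_ssl)  # True on any build that includes OpenSSL
```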

Python 3 - TypeError: a bytes-like object is required, not 'str'

Submitted by 走远了吗. on 2019-12-18 16:53:42
Question: I'm working on a lesson from Udacity and am having some issues trying to find out whether the result from this site returns true or false. I get the TypeError with the code below.

from urllib.request import urlopen

# check text for curse words
def check_profanity():
    f = urlopen("http://www.wdylike.appspot.com/?q=shit")
    output = f.read()
    f.close()
    print(output)
    if "b'true'" in output:
        print("There is a profane word in the document")

check_profanity()

The output prints b'true' and I'm not really sure
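f.read() returns bytes, and in Python 3 a str operand can never be tested with `in` against a bytes object; that mismatch is the TypeError (and "b'true'" is the printed repr, not the content). Two fixes, sketched with the response payload hard-coded:

```python
output = b"true"   # stand-in for f.read(): bytes, not str

# Fix 1: decode the response, then compare strings.
text = output.decode("utf-8")
print("true" in text)      # True

# Fix 2: keep both operands as bytes.
print(b"true" in output)   # True
```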