urllib

Fetching Image from URL using BeautifulSoup

 ̄綄美尐妖づ submitted on 2019-12-04 06:05:47
Question: I am trying to fetch the important images, not thumbnails or other GIFs, from a Wikipedia page using the following code. However, "imgs" comes back with a length of 0. Any suggestions on how to fix this? Code:
    import urllib
    import urllib2
    from bs4 import BeautifulSoup
    import os
    html = urllib2.urlopen("http://en.wikipedia.org/wiki/Main_Page")
    soup = BeautifulSoup(html)
    imgs = soup.findAll("div",{"class":"image"})
Also, if someone can explain in detail how to use findAll by looking at
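The call in the question searches for div elements with class "image", whereas images themselves are img elements, so an empty result is plausible. Below is a minimal sketch of the filtering idea using the standard library's html.parser in place of BeautifulSoup. The sample HTML, the ImageCollector class, and the width threshold are all assumptions made up for illustration; they do not reflect Wikipedia's actual markup.

```python
from html.parser import HTMLParser


class ImageCollector(HTMLParser):
    """Collect the src of every <img> at least min_width pixels wide (hypothetical helper)."""

    def __init__(self, min_width=100):
        super().__init__()
        self.min_width = min_width
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        src = attributes.get("src")
        width = attributes.get("width") or ""
        # Skip icons and thumbnails: keep only images with a declared, large enough width.
        if src and width.isdigit() and int(width) >= self.min_width:
            self.sources.append(src)


# Made-up markup standing in for a downloaded page.
SAMPLE_HTML = """
<div><a class="image" href="/wiki/File:Big.jpg"><img src="//example.org/big.jpg" width="220" height="150"></a></div>
<div><img src="//example.org/icon.gif" width="16" height="16"></div>
"""

collector = ImageCollector(min_width=100)
collector.feed(SAMPLE_HTML)
print(collector.sources)  # ['//example.org/big.jpg']
```

With BeautifulSoup the same idea would be to search for "img" tags rather than "div" tags and then filter on their attributes.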

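urlencode and urldecode in Python

两盒软妹~` submitted on 2019-12-04 05:48:29
It is perfectly normal for a URL, or its parameters, to contain Chinese characters. But when such a URL is itself passed as a parameter (most commonly as a callback), the Chinese characters, and even '/', need to be percent-encoded.
I. urlencode
The urllib library (Python 2) has a urlencode function that converts key-value pairs into the format we want and returns a string such as a=1&b=2. For example:
    >>> from urllib import urlencode
    >>> data = {
    ...     'a': 'test',
    ...     'name': '魔兽'
    ... }
    >>> print urlencode(data)
    a=test&name=%C4%A7%CA%DE
What if you only want to urlencode a single string? urllib provides another function, quote():
    >>> from urllib import quote
    >>> quote('魔兽')
    '%C4%A7%CA%DE'
II. urldecode
Once the urlencoded string has been passed along and received, it has to be decoded: urldecode. urllib provides the unquote() function for this; there is no urldecode()!
    >>> from urllib import unquote
    >>> unquote('%C4%A7%CA%DE')
    '\xc4\xa7\xca\xde'
    >>>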
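For comparison, here is a rough Python 3 equivalent of the session above. In Python 3 these functions live in urllib.parse and default to UTF-8. The bytes shown in the original (%C4%A7%CA%DE) match GBK, so this sketch assumes a GBK source and passes encoding='gbk' to reproduce them.

```python
from urllib.parse import quote, unquote, urlencode

data = {'a': 'test', 'name': '魔兽'}

# Assumption: the original session ran with GBK-encoded byte strings.
encoded_query = urlencode(data, encoding='gbk')
print(encoded_query)             # a=test&name=%C4%A7%CA%DE

encoded_name = quote('魔兽', encoding='gbk')
print(encoded_name)              # %C4%A7%CA%DE

# unquote() reverses quote(); the same encoding must be given to get the text back.
print(unquote(encoded_name, encoding='gbk'))  # 魔兽

# quote() leaves '/' alone by default; pass safe='' to encode it as well.
print(quote('/callback/path', safe=''))       # %2Fcallback%2Fpath
```

Without the encoding argument, quote('魔兽') uses UTF-8 and yields a different percent-encoded string, so both sides of an exchange need to agree on the encoding.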

In Python 3.2, I can open and read an HTTPS web page with http.client, but urllib.request is failing to open the same page

三世轮回 submitted on 2019-12-04 04:06:54
I want to open and read https://yande.re/ with urllib.request, but I'm getting an SSL error. I can open and read the page just fine using http.client with this code:
    import http.client
    conn = http.client.HTTPSConnection('www.yande.re')
    conn.request('GET', 'https://yande.re/')
    resp = conn.getresponse()
    data = resp.read()
However, the following code using urllib.request fails:
    import urllib.request
    opener = urllib.request.build_opener()
    resp = opener.open('https://yande.re/')
    data = resp.read()
It gives me the following error: ssl.SSLError: [Errno 1] _ssl.c:392: error:1411809D:SSL routines:SSL

urllib2 basic authentication oddities

爱⌒轻易说出口 submitted on 2019-12-04 03:55:01
I'm slamming my head against the wall with this one. I've been trying every example and reading every last bit I can find online about basic HTTP authorization with urllib2, but I cannot figure out what is causing my specific error. Adding to the frustration, the code works for one page and yet not for another. Logging into www.mysite.com/adm goes absolutely smoothly. It authenticates with no problem. Yet if I change the address to 'http://mysite.com/adm/items.php?n=201105&c=200' I receive this error: <h4 align="center" class="teal">Add/Edit Items</h4> <p><strong>Client:</strong> </p><p><strong
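The excerpt is cut off before the cause is shown, so the following is only a sketch of how basic authentication is typically wired up, written against Python 3's urllib.request (the successor of urllib2). The user name and password are placeholders. One thing it illustrates is that the password manager matches on host and path prefix, so credentials registered for mysite.com are not offered to www.mysite.com.

```python
import urllib.request

# Placeholder credentials, for illustration only.
password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, "http://mysite.com/adm", "user", "secret")

# The handler answers 401 challenges using credentials from the manager.
auth_handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
opener = urllib.request.build_opener(auth_handler)

# A URL below the registered prefix on the same host finds the credentials...
print(password_mgr.find_user_password(
    None, "http://mysite.com/adm/items.php?n=201105&c=200"))
# ...while a different host name does not.
print(password_mgr.find_user_password(None, "http://www.mysite.com/adm"))

# opener.open(url) would then perform the authenticated request.
```

Whether a host mismatch is what happened in the question cannot be told from the excerpt; it is just one thing worth checking.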

How do I download a file using urllib.request in Python 3?

你。 submitted on 2019-12-04 03:05:32
So, I'm messing around with urllib.request in Python 3 and am wondering how to write the result of getting an internet file to a file on the local machine. I tried this:
    g = urllib.request.urlopen('http://media-mcw.cursecdn.com/3/3f/Beta.png')
    with open('test.png', 'b+w') as f:
        f.write(g)
But I got this error: TypeError: 'HTTPResponse' does not support the buffer interface
What am I doing wrong? NOTE: I have seen this question, but it's related to Python 2's urllib2, which was overhauled in Python 3.
Change f.write(g) to f.write(g.read()). An easier way, I think (you can also do it in two lines)
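The fix above works because write() needs bytes, and the response object only yields bytes once it is read. A minimal sketch of one way to do the download, streaming the body to disk with shutil.copyfileobj instead of reading it all into memory first; the download function name is made up for this example.

```python
import shutil
import urllib.request


def download(url, dest_path):
    """Copy the body of the response for url into the file at dest_path."""
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out_file:
        # copyfileobj reads the response in chunks and writes each chunk out.
        shutil.copyfileobj(response, out_file)


# Example call, using the URL from the question (requires network access):
# download('http://media-mcw.cursecdn.com/3/3f/Beta.png', 'test.png')
```

For small files, f.write(g.read()) is just as good; streaming mainly helps when the file is large.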

I/O error(socket error): [Errno 111] Connection refused

a 夏天 submitted on 2019-12-04 02:13:22
I have a program that uses urllib to periodically fetch a URL, and I see intermittent errors like: I/O error(socket error): [Errno 111] Connection refused. It works 90% of the time, but the other 10% it fails. If I retry the fetch immediately after it fails, it succeeds. I'm unable to figure out why this is so. I tried to see if any ports are available, and they are. Any debugging ideas? For additional info, the stack trace is:
    File "/usr/lib/python2.6/urllib.py", line 203, in open
      return getattr(self, name)(url)
    File "/usr/lib/python2.6/urllib.py", line 342, in open_http
      h.endheaders()
    File "
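Since an immediate retry succeeds, one pragmatic workaround (not a diagnosis of the root cause) is to wrap the fetch in a small retry loop. The sketch below is written for Python 3, where ConnectionRefusedError and urllib.error.URLError are both subclasses of OSError. The helper name and its parameters are assumptions for illustration.

```python
import time


def call_with_retry(fetch, attempts=3, delay=0.5):
    """Call fetch() up to `attempts` times, sleeping between failed tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return fetch()
        except OSError as exc:  # covers connection refused and URLError
            last_error = exc
            if attempt < attempts - 1:
                # Back off a little longer after each failure.
                time.sleep(delay * (2 ** attempt))
    raise last_error


# Example (requires network access):
# import urllib.request
# body = call_with_retry(lambda: urllib.request.urlopen("http://example.com/").read())
```

Logging each caught exception with a timestamp inside the loop can also help show whether the refusals cluster around particular times.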

11 Python study notes: network programming

夙愿已清 submitted on 2019-12-04 01:30:35
To work with the network in Python, that is, to open a website or call an HTTP API, you can use either the standard module urllib that ships with Python or the third-party library requests.
I. Using the urllib module
urllib is a standard module, so a plain import urllib is enough. Python 3 has only the urllib module, while Python 2 has both urllib and urllib2. An example of sending a request with urllib:
    from urllib import request
    from urllib import parse
    import json
    # 1. Send a GET request
    url = 'http://api.xxxx.cn/api/user/stu_info'
    data = {'stu_name':'xiaohei'}
    tmpData = parse.urlencode(data)  # convert the data to key-value form, k=v
    tmpUrl = url + '?' + tmpData  # join the API URL and the parameters
    res = request.urlopen(tmpUrl)  # call the API
    resForRead = res.read()  # read() returns the response body, which is bytes
    print(type(resForRead))
    resForString= resForRead
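The excerpt breaks off before the decoding step. The following is a sketch of how the whole GET flow might look in Python 3, assuming the API returns UTF-8 encoded JSON. The helper names and the sample payload are made up for illustration, and the API URL is the placeholder from the notes.

```python
import json
from urllib import parse, request


def build_get_url(base_url, params):
    """Append the urlencoded parameters to the API URL."""
    return base_url + "?" + parse.urlencode(params)


def decode_json_body(body, encoding="utf-8"):
    """Turn the bytes returned by read() into Python objects."""
    return json.loads(body.decode(encoding))


def get_json(base_url, params, timeout=10):
    """Send the GET request and return the decoded JSON response."""
    with request.urlopen(build_get_url(base_url, params), timeout=timeout) as res:
        return decode_json_body(res.read())


print(build_get_url("http://api.xxxx.cn/api/user/stu_info", {"stu_name": "xiaohei"}))
# Made-up response body, standing in for what res.read() might return.
print(decode_json_body(b'{"error_code": 0}'))
```

Splitting URL building and decoding into separate functions keeps them checkable without touching the network.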

Python: APIs

假装没事ソ submitted on 2019-12-03 22:27:41
I. The urllib module
urllib is a standard module, so a plain import urllib is enough. Python 3 has only the urllib module, while Python 2 has both urllib and urllib2. The urllib module is cumbersome: parameters have to be passed as bytes, the returned data is also bytes and has to be decoded, you need json before you can use the result directly, and GET and POST requests are sent differently, so it is rather awkward to use.
    import json
    from urllib import request
    from urllib import parse
    # GET request
    url = 'http://api.nnzhp.cn/api/user/stu_info'
    data={"stu_name":"xiaohei"}
    tmpData=parse.urlencode(data)  # 1. convert the data to k=v form
    print(tmpData)
    # API URL + parameters
    tmpUrl=url+'?'+tmpData  # join the API URL and the parameters
    print(tmpUrl)
    res = request.urlopen(tmpUrl)  # call the API
    resForRead = res.read()  # read() returns the response body, which is bytes
    print(res
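To illustrate the point about bytes and about POST differing from GET: for a POST, the urlencoded parameters are not appended to the URL but encoded to bytes and passed as the request body. A minimal sketch follows; the login URL and field names are hypothetical and do not come from the notes.

```python
from urllib import parse, request


def build_post_request(url, fields):
    """Build a POST request whose body is the urlencoded form fields."""
    body = parse.urlencode(fields).encode("utf-8")  # the body must be bytes
    return request.Request(url, data=body)


# Hypothetical endpoint and form fields, for illustration only.
req = build_post_request("http://api.example.com/api/user/login",
                         {"username": "xiaohei", "passwd": "123456"})
print(req.get_method())  # POST, because a body was supplied
print(req.data)          # b'username=xiaohei&passwd=123456'

# request.urlopen(req).read() would send it and return the response as bytes.
```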

Python 2.7.10 error “from urllib.request import urlopen” no module named request

元气小坏坏 submitted on 2019-12-03 22:22:15
I opened Python code from GitHub. I assumed it was Python 2.x and got the above error when I tried to run it. From what I've read, Python 3 has deprecated urllib itself and replaced it with a number of libraries, including urllib.request. It looks like the code was written in Python 3 (a confirmation from someone who knows would be appreciated). At this point I don't want to move to Python 3; I haven't researched what it would do to my existing code. Thinking there should be a urllib module for Python 2, I searched Google (using "python2 urllib download") and did not find one. (It
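urllib.request exists only in Python 3; in Python 2 the same urlopen function is provided by urllib2. If the rest of the script is otherwise compatible, a common pattern is to try the Python 3 import first and fall back to the Python 2 one. This is a sketch, not a guarantee that the GitHub code in question will run unchanged.

```python
try:
    # Python 3: urlopen lives in the urllib.request submodule.
    from urllib.request import urlopen
except ImportError:
    # Python 2: there is no urllib.request; urllib2 provides urlopen.
    from urllib2 import urlopen

print(callable(urlopen))  # True on either version
```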

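Python: network programming

时光总嘲笑我的痴心妄想 submitted on 2019-12-03 21:07:25
I. Using Python's built-in urllib module
To simulate a page sending requests to a server, Python provides the urllib module, which lets Python code call an API, pass parameters to it, and get back the API's return value. urllib is a standard module, so a plain import urllib is enough.
1. Sending a GET request
    from urllib import request
    from urllib import parse
    import json
    # GET request
    # url='http://www.nzhp.cn/api/user/stu_info'
    #
    # data={'stu_name':'xiaohei'}  # parameters to pass
    #
    # tmpData=parse.urlencode(data)  # convert the data to key-value form, i.e. k=v
    # # API URL + parameters
    # tmpUrl=url+'?'+tmpData  # join the API URL and the parameters
    # res=request.urlopen(tmpUrl)  # call the API
    # resForRead=res.read()  # read() returns the response body, which is binary bytes
    # resForString=resForRead.decode()  # decode() converts bytes to str; encode() does the reverse
    # resForDict=json.loads(resForString)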