urllib

Python interface to PayPal - urllib.urlencode non-ASCII characters failing

Submitted by 蹲街弑〆低调 on 2019-12-03 04:22:21
Question: I am trying to implement PayPal IPN functionality. The basic protocol is as such: the client is redirected from my site to PayPal's site to complete payment. He logs into his account and authorizes payment. PayPal then calls a page on my server, passing in details as POST. The details include a person's name, address, payment info, etc. I need to call a URL on PayPal's site internally from my processing page, passing back all the params that were passed in above, plus an additional one called 'cmd' with …
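The error in the title typically comes from Python 2's urllib.urlencode choking on non-ASCII unicode values. A minimal sketch of the usual fix, shown with Python 3's urllib.parse.urlencode (which percent-encodes values as UTF-8 by default); the parameter values below are hypothetical stand-ins for the IPN fields:

```python
from urllib.parse import urlencode

# Hypothetical IPN-style parameters; payer_name contains non-ASCII text
# of the kind that made Python 2's urllib.urlencode raise UnicodeEncodeError.
params = {
    "cmd": "_notify-validate",
    "payer_name": "José Müller",
}

# Python 3's urlencode percent-encodes values as UTF-8 by default.
# On Python 2, the equivalent fix was to .encode('utf-8') each unicode
# value before passing the dict to urllib.urlencode.
body = urlencode(params)
print(body)  # cmd=_notify-validate&payer_name=Jos%C3%A9+M%C3%BCller
```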

urllib

Submitted by 霸气de小男生 on 2019-12-03 04:21:08
The urllib library fetches page data from a specified URL; the data can then be analyzed and processed to extract what you want. The urlopen() function, urlopen(url, data=None, proxies=None), creates a file-like object representing the remote URL, which you then operate on like a local file to retrieve the remote data. The url parameter is the path to the remote resource, usually a web address. The data parameter is data submitted to the URL by POST (POST and GET being the two ways of submitting data). The proxies parameter configures a proxy. urlopen returns a file-like object that provides the following methods: read(), readline(), readlines(), fileno() and close(), used exactly as on an ordinary file object; info(), which returns an httplib.HTTPMessage object representing the header information sent back by the remote server; getcode(), which returns the HTTP status code (for an HTTP request, 200 means the request completed successfully and 404 means the URL was not found); and geturl(), which returns the requested URL. Code example (Python 2):

import urllib2
doc = urllib2.urlopen("http://www.baidu.com")
print doc.geturl()
print doc.info()
print doc.readline(20)
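The file-like behaviour described above can be tried without any network access by pointing urlopen at a file:// URL; a small sketch in Python 3 syntax (urllib.request.urlopen), since urllib2 no longer exists there:

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen

# Write a small local file so the demo needs no network connection.
page = Path(tempfile.mkdtemp()) / "page.html"
page.write_text("<html>hello</html>")

url = page.as_uri()            # e.g. file:///tmp/xxxx/page.html
with urlopen(url) as doc:      # a file-like object, as described above
    print(doc.geturl())        # the URL that was requested
    data = doc.read()          # read() behaves like a file object's
print(data)                    # b'<html>hello</html>'
```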

Python 3 - Add custom headers to urllib.request Request

Submitted by ↘锁芯ラ on 2019-12-03 03:45:10
In Python 3, the following code obtains the HTML source for a webpage:

import urllib.request
url = "https://docs.python.org/3.4/howto/urllib2.html"
response = urllib.request.urlopen(url)
response.read()

How can I add the following custom header to the request when using urllib.request?

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'}

The request headers can be customized by first creating a request object and then supplying it to urlopen:

import urllib.request
url = "https://docs.python.org/3.4/howto/urllib2.html"
hdr = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; …
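Completing the truncated answer above: create a Request with the headers dict and hand it to urlopen. The sketch below only builds and inspects the request (no network call); note that urllib.request normalizes header names with str.capitalize(), so the stored key is 'User-agent':

```python
import urllib.request

url = "https://docs.python.org/3.4/howto/urllib2.html"
hdr = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"}

req = urllib.request.Request(url, headers=hdr)
# urllib.request.urlopen(req) would now send the custom header;
# it can be inspected on the Request object before sending:
print(req.get_header("User-agent"))  # Mozilla/5.0 (Windows NT 6.1; Win64; x64)
```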

parse query string with urllib in Python 2.4

Submitted by 谁说我不能喝 on 2019-12-03 03:32:01
Using Python 2.4.5 (don't ask!) I want to parse a query string and get a dict in return. Do I have to do it "manually" as follows?

>>> qs = 'first=1&second=4&third=3'
>>> d = dict([x.split("=") for x in qs.split("&")])
>>> d
{'second': '4', 'third': '3', 'first': '1'}

I didn't find any useful method in urlparse. You have two options:

>>> cgi.parse_qs(qs)
{'second': ['4'], 'third': ['3'], 'first': ['1']}

or

>>> cgi.parse_qsl(qs)
[('first', '1'), ('second', '4'), ('third', '3')]

The values in the dict returned by cgi.parse_qs() are lists rather than strings, in order to handle the case when the …
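Besides cgi.parse_qs/parse_qsl, a hand-rolled version that also decodes percent-escapes and '+' is only a few lines. A sketch (written so it runs on both Python 2.4 and Python 3; on 2.4, import unquote_plus from urllib instead of urllib.parse):

```python
from urllib.parse import unquote_plus  # Python 2.4: from urllib import unquote_plus

def parse_qsl(qs):
    """Minimal stand-in for cgi.parse_qsl: returns (name, value) pairs."""
    pairs = []
    for field in qs.split("&"):
        if not field:
            continue
        parts = field.split("=", 1)   # split on the first '=' only
        if len(parts) == 1:
            parts.append("")          # bare name with no value
        pairs.append((unquote_plus(parts[0]), unquote_plus(parts[1])))
    return pairs

print(parse_qsl("first=1&second=4&third=3"))
# [('first', '1'), ('second', '4'), ('third', '3')]
```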

Using Python to sign into website, fill in a form, then sign out

Submitted by 余生长醉 on 2019-12-03 03:23:47
Question: As part of my quest to become better at Python, I am now attempting to sign in to a website I frequent, send myself a private message, and then sign out. So far I've managed to sign in (using urllib, cookiejar and urllib2). However, I cannot work out how to fill in the required form to send myself a message. The form is located at /messages.php?action=send. Three things need to be filled in for the message to send: three text fields named name, title and message. Additionally, there …
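A sketch of submitting such a form, shown with Python 3's urllib.request (Python 2's urllib2/cookielib are analogous). The host and field values are hypothetical placeholders; only the form path and field names come from the question. The snippet only builds the request; opener.open(req) would actually submit it within the signed-in session:

```python
import http.cookiejar
import urllib.parse
import urllib.request

base = "http://example.com"            # placeholder for the actual site
cj = http.cookiejar.CookieJar()        # holds the session cookie from sign-in
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(cj))

# The three text fields named in the question; the values are made up.
form = {"name": "me", "title": "hello", "message": "test message"}
data = urllib.parse.urlencode(form).encode("ascii")   # POST body as bytes

req = urllib.request.Request(base + "/messages.php?action=send", data=data)
# opener.open(req) would send the POST within the logged-in session.
print(req.get_method())  # POST (a Request carrying data defaults to POST)
```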

500 error with urllib.request.urlopen

Submitted anonymously (unverified) on 2019-12-03 03:06:01
Question: The following code:

req = urllib.request.Request(url=r"http://borel.slu.edu/cgi-bin/cc.cgi?foirm_ionchur=im&foirm=Seol&hits=1&format=xml", headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0'})
handler = urllib.request.urlopen(req)

is giving me the following exception:

Traceback (most recent call last):
  File "C:/Users/Foo/lang/old/test.py", line 46, in <module>
    rip()
  File "C:/Users/Foo/lang/old/test.py", line 36, in rip
    handler = urllib.request.urlopen(req)
  File "C:\Python32\lib\urllib\request.py", …
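A 500 is the server failing, but urllib surfaces it as an HTTPError, which still carries a readable response body that may explain why. A sketch of reading it; the helper name fetch is ours, and building the query with urlencode rather than embedding it raw is a general precaution, not a confirmed fix for this site:

```python
import urllib.error
import urllib.parse
import urllib.request

# Build the query string with urlencode instead of embedding it raw.
params = urllib.parse.urlencode({
    "foirm_ionchur": "im", "foirm": "Seol", "hits": "1", "format": "xml"})
url = "http://borel.slu.edu/cgi-bin/cc.cgi?" + params

req = urllib.request.Request(url, headers={
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) "
                  "Gecko/20100101 Firefox/12.0"})

def fetch(request):
    """Return the response body even when the server answers 4xx/5xx."""
    try:
        return urllib.request.urlopen(request).read()
    except urllib.error.HTTPError as exc:
        return exc.read()   # the error page often explains the 500
```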

Python 3.4 urllib.request error (http 403)

Submitted anonymously (unverified) on 2019-12-03 02:56:01
Question: I'm trying to open and parse an HTML page. In Python 2.7.8 I have no problem:

import urllib
url = "https://ipdb.at/ip/66.196.116.112"
html = urllib.urlopen(url).read()

and everything is fine. However, I want to move to Python 3.4, and there I get HTTP error 403 (Forbidden). My code:

import urllib.request
html = urllib.request.urlopen(url)  # same URL as before

  File "C:\Python34\lib\urllib\request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python34\lib\urllib\request.py", line 461, in open
    response = meth(req, …
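A common cause of this 2.x-vs-3.4 difference is the User-Agent: some sites answer 403 to urllib's default "Python-urllib/3.x" agent string. A sketch of the usual workaround (the request is only built here, not sent; whether ipdb.at specifically filters on the agent is an assumption):

```python
import urllib.request

url = "https://ipdb.at/ip/66.196.116.112"

# Send a browser-like User-Agent instead of the default
# "Python-urllib/3.x", which some sites reject with 403.
req = urllib.request.Request(url, headers={
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"})
# html = urllib.request.urlopen(req).read()   # would perform the fetch
```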

Python 3.4 SSL error urlopen error EOF occurred in violation of protocol (_ssl.c:600)

Submitted anonymously (unverified) on 2019-12-03 02:33:02
Question: I use Arch Linux, Python 3.4 and OpenSSL 1.0.2d. When I make a request to https://www.supercash.cz/ I get this error. It doesn't matter whether I use requests or the built-in urllib; it is always the same error. The SSL certificate for this site seems to be OK in the Chrome browser.

  File "/usr/lib64/python3.4/urllib/request.py", line 463, in open
    response = self._open(req, data)
  File "/usr/lib64/python3.4/urllib/request.py", line 481, in _open
    '_open', req)
  File "/usr/lib64/python3.4/urllib/request.py", line 441, in _call_chain
    result = func(*args)
  File "/usr …
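One commonly suggested workaround for this handshake failure is to pin the TLS protocol version via an ssl.SSLContext passed to urlopen (accepted directly since Python 3.4.3). This is a sketch under the assumption that the server only negotiates an older TLS version; it is not a confirmed diagnosis for supercash.cz:

```python
import ssl
import urllib.request

# "EOF occurred in violation of protocol" means the TLS handshake was
# cut short, often because client and server could not agree on a
# protocol version. Pinning the version is one workaround to try:
ctx = ssl.SSLContext(ssl.PROTOCOL_TLSv1)   # assumption: server wants TLSv1
# html = urllib.request.urlopen("https://www.supercash.cz/",
#                               context=ctx).read()
```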

Assignment 3 - MOOC study notes: Python Web Crawling and Information Extraction

Submitted by 偶尔善良 on 2019-12-03 02:32:23
1. Register on the China University MOOC platform. 2. Choose Prof. Song Tian's (Beijing Institute of Technology) MOOC course "Python Web Crawling and Information Extraction". 3. Complete the course content from week 0 through week 4, including each week's assignments. 4. Provide screenshots or website pages showing learning progress, as proof of study. 5. Write study notes of no less than 1000 characters on the experience and what was gained.

There are very many tutorials for this kind of thing online, and I too have only got as far as making things work, staying at a half-understood level; but while coding I gained a lot of new understanding of Python, and a deeper understanding of programming as a whole. In essence, a crawler is just a piece of program code. Any programming language can be used to write a crawler; they differ only in how simple or laborious it is. By definition, a crawler is a program that simulates a user automatically browsing and saving network data; of course, most crawlers scrape web page content. Crawler architecture: URL manager, page downloader, page parser. URL manager: manages the set of URLs waiting to be crawled and the set already crawled, to prevent crawling the same URL twice. URL manager implementations: a cache database (large companies, high performance); in-memory storage (individuals, small companies); a relational database (to persist URL data permanently or to save memory). Page downloader: downloads the HTML of the page at a URL to the local machine for subsequent analysis. Common page downloaders: Python's official base module urllib2, and the third-party package requests. In Python 3.x the urllib and urllib2 libraries were merged into the urllib library, where urllib2.urlopen() became urllib.request.urlopen(). urllib2 …
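The urllib2 → urllib.request rename mentioned above can be absorbed with a small compatibility shim, so the same crawler code runs on both Python 2 and 3; a minimal sketch:

```python
# Python 2's urllib2.urlopen became urllib.request.urlopen in Python 3.
try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib2 import urlopen          # Python 2

# Either way, the downloader code below the shim stays identical:
print(callable(urlopen))  # True
```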

An issue when downloading files with Python's urllib.urlretrieve()

Submitted by 痞子三分冷 on 2019-12-03 02:29:41
I had long planned to write a crawler to scrape images from web pages, which is how I found the function urllib.urlretrieve(url, filename). The url parameter is the address of the file you want to download. As for filename, for a while I assumed it was a local directory path, but every attempt gave an error. The reason is that it is not a directory but a file name: if you want to download img.gif, you must pass something like c:/img.gif, not c:/. Remember, it is a file name (with the full path), not a directory. OK, what I said above doesn't sound very professional; I sincerely welcome everyone's suggestions. Source: oschina. Link: https://my.oschina.net/u/153044/blog/48524
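The point above (filename must be a full file path, not a directory) can be demonstrated offline with a file:// URL, shown in Python 3 where the function lives at urllib.request.urlretrieve (in Python 2 it was urllib.urlretrieve):

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlretrieve  # Python 2: urllib.urlretrieve

# A small local stand-in for the remote image, so no network is needed.
src = Path(tempfile.mkdtemp()) / "img.gif"
src.write_bytes(b"GIF89a")

# filename must be a full path INCLUDING the file name, e.g. .../img.gif;
# passing just a directory is what raises the error described above.
dest = os.path.join(tempfile.mkdtemp(), "img.gif")
urlretrieve(src.as_uri(), dest)
print(os.path.exists(dest))  # True
```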