urllib

Python interface to PayPal - urllib.urlencode non-ASCII characters failing

Submitted by 蹲街弑〆低调 on 2019-12-03 04:22:21
Question: I am trying to implement PayPal IPN functionality. The basic protocol is as such: the client is redirected from my site to PayPal's site to complete payment. He logs into his account and authorizes payment. PayPal then calls a page on my server, passing in details as POST. The details include a person's name, address, payment info, etc. I need to call a URL on PayPal's site internally from my processing page, passing back all the params that were passed in above, plus an additional one called 'cmd' with …
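The error in the title typically comes from Python 2's urllib.urlencode choking on non-ASCII unicode values. A minimal sketch of the usual fix, shown with Python 3's urllib.parse.urlencode (which percent-encodes values as UTF-8 by default); the parameter values below are hypothetical stand-ins for the IPN fields:

```python
from urllib.parse import urlencode

# Hypothetical IPN-style parameters; payer_name contains non-ASCII text
# of the kind that made Python 2's urllib.urlencode raise UnicodeEncodeError.
params = {
    "cmd": "_notify-validate",
    "payer_name": "José Müller",
}

# Python 3's urlencode percent-encodes values as UTF-8 by default.
# On Python 2, the equivalent fix was to .encode('utf-8') each unicode
# value before passing the dict to urllib.urlencode.
body = urlencode(params)
print(body)  # cmd=_notify-validate&payer_name=Jos%C3%A9+M%C3%BCller
```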

urllib

Submitted by 霸气de小男生 on 2019-12-03 04:21:08
The urllib library fetches page data from a specified URL; the data can then be analyzed and processed to extract what you want. The urlopen() function, urlopen(url, data=None, proxies=None), creates a file-like object representing the remote URL, which you then operate on like a local file to retrieve the remote data. The url parameter is the path to the remote resource, usually a web address. The data parameter is data submitted to the URL by POST (POST and GET being the two ways of submitting data). The proxies parameter configures a proxy. urlopen returns a file-like object that provides the following methods: read(), readline(), readlines(), fileno() and close(), used exactly as on an ordinary file object; info(), which returns an httplib.HTTPMessage object representing the header information sent back by the remote server; getcode(), which returns the HTTP status code (for an HTTP request, 200 means the request completed successfully and 404 means the URL was not found); and geturl(), which returns the requested URL. Code example (Python 2):

import urllib2
doc = urllib2.urlopen("http://www.baidu.com")
print doc.geturl()
print doc.info()
print doc.readline(20)
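The file-like behaviour described above can be tried without any network access by pointing urlopen at a file:// URL; a small sketch in Python 3 syntax (urllib.request.urlopen), since urllib2 no longer exists there:

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen

# Write a small local file so the demo needs no network connection.
page = Path(tempfile.mkdtemp()) / "page.html"
page.write_text("<html>hello</html>")

url = page.as_uri()            # e.g. file:///tmp/xxxx/page.html
with urlopen(url) as doc:      # a file-like object, as described above
    print(doc.geturl())        # the URL that was requested
    data = doc.read()          # read() behaves like a file object's
print(data)                    # b'<html>hello</html>'
```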

Python 3 - Add custom headers to urllib.request Request

Submitted by ↘锁芯ラ on 2019-12-03 03:45:10
In Python 3, the following code obtains the HTML source for a webpage:

import urllib.request
url = "https://docs.python.org/3.4/howto/urllib2.html"
response = urllib.request.urlopen(url)
response.read()

How can I add the following custom header to the request when using urllib.request?

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'}

The request headers can be customized by first creating a request object and then supplying it to urlopen:

import urllib.request
url = "https://docs.python.org/3.4/howto/urllib2.html"
hdr = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; …
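Completing the truncated answer above: create a Request with the headers dict and hand it to urlopen. The sketch below only builds and inspects the request (no network call); note that urllib.request normalizes header names with str.capitalize(), so the stored key is 'User-agent':

```python
import urllib.request

url = "https://docs.python.org/3.4/howto/urllib2.html"
hdr = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"}

req = urllib.request.Request(url, headers=hdr)
# urllib.request.urlopen(req) would now send the custom header;
# it can be inspected on the Request object before sending:
print(req.get_header("User-agent"))  # Mozilla/5.0 (Windows NT 6.1; Win64; x64)
```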

parse query string with urllib in Python 2.4

Submitted by 谁说我不能喝 on 2019-12-03 03:32:01
Using Python 2.4.5 (don't ask!) I want to parse a query string and get a dict in return. Do I have to do it "manually" as follows?

>>> qs = 'first=1&second=4&third=3'
>>> d = dict([x.split("=") for x in qs.split("&")])
>>> d
{'second': '4', 'third': '3', 'first': '1'}

I didn't find any useful method in urlparse. You have two options:

>>> cgi.parse_qs(qs)
{'second': ['4'], 'third': ['3'], 'first': ['1']}

or

>>> cgi.parse_qsl(qs)
[('first', '1'), ('second', '4'), ('third', '3')]

The values in the dict returned by cgi.parse_qs() are lists rather than strings, in order to handle the case when the …
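Besides cgi.parse_qs/parse_qsl, a hand-rolled version that also decodes percent-escapes and '+' is only a few lines. A sketch (written so it runs on both Python 2.4 and Python 3; on 2.4, import unquote_plus from urllib instead of urllib.parse):

```python
from urllib.parse import unquote_plus  # Python 2.4: from urllib import unquote_plus

def parse_qsl(qs):
    """Minimal stand-in for cgi.parse_qsl: returns (name, value) pairs."""
    pairs = []
    for field in qs.split("&"):
        if not field:
            continue
        parts = field.split("=", 1)   # split on the first '=' only
        if len(parts) == 1:
            parts.append("")          # bare name with no value
        pairs.append((unquote_plus(parts[0]), unquote_plus(parts[1])))
    return pairs

print(parse_qsl("first=1&second=4&third=3"))
# [('first', '1'), ('second', '4'), ('third', '3')]
```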

Using Python to sign into website, fill in a form, then sign out

Submitted by 余生长醉 on 2019-12-03 03:23:47
Question: As part of my quest to become better at Python, I am now attempting to sign in to a website I frequent, send myself a private message, and then sign out. So far I've managed to sign in (using urllib, cookiejar and urllib2). However, I cannot work out how to fill in the required form to send myself a message. The form is located at /messages.php?action=send. Three things need to be filled in for the message to send: three text fields named name, title and message. Additionally, there …
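A sketch of submitting such a form, shown with Python 3's urllib.request (Python 2's urllib2/cookielib are analogous). The host and field values are hypothetical placeholders; only the form path and field names come from the question. The snippet only builds the request; opener.open(req) would actually submit it within the signed-in session:

```python
import http.cookiejar
import urllib.parse
import urllib.request

base = "http://example.com"            # placeholder for the actual site
cj = http.cookiejar.CookieJar()        # holds the session cookie from sign-in
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(cj))

# The three text fields named in the question; the values are made up.
form = {"name": "me", "title": "hello", "message": "test message"}
data = urllib.parse.urlencode(form).encode("ascii")   # POST body as bytes

req = urllib.request.Request(base + "/messages.php?action=send", data=data)
# opener.open(req) would send the POST within the logged-in session.
print(req.get_method())  # POST (a Request carrying data defaults to POST)
```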

500 error with urllib.request.urlopen

Submitted anonymously (unverified) on 2019-12-03 03:06:01
Question: The following code:

req = urllib.request.Request(url=r"http://borel.slu.edu/cgi-bin/cc.cgi?foirm_ionchur=im&foirm=Seol&hits=1&format=xml", headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0'})
handler = urllib.request.urlopen(req)

is giving me the following exception:

Traceback (most recent call last):
  File "C:/Users/Foo/lang/old/test.py", line 46, in <module>
    rip()
  File "C:/Users/Foo/lang/old/test.py", line 36, in rip
    handler = urllib.request.urlopen(req)
  File "C:\Python32\lib\urllib\request.py", …
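A 500 is the server failing, but urllib surfaces it as an HTTPError, which still carries a readable response body that may explain why. A sketch of reading it; the helper name fetch is ours, and building the query with urlencode rather than embedding it raw is a general precaution, not a confirmed fix for this site:

```python
import urllib.error
import urllib.parse
import urllib.request

# Build the query string with urlencode instead of embedding it raw.
params = urllib.parse.urlencode({
    "foirm_ionchur": "im", "foirm": "Seol", "hits": "1", "format": "xml"})
url = "http://borel.slu.edu/cgi-bin/cc.cgi?" + params

req = urllib.request.Request(url, headers={
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) "
                  "Gecko/20100101 Firefox/12.0"})

def fetch(request):
    """Return the response body even when the server answers 4xx/5xx."""
    try:
        return urllib.request.urlopen(request).read()
    except urllib.error.HTTPError as exc:
        return exc.read()   # the error page often explains the 500
```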

Python 3.4 urllib.request error (http 403)

Submitted anonymously (unverified) on 2019-12-03 02:56:01
Question: I'm trying to open and parse an HTML page. In Python 2.7.8 I have no problem:

import urllib
url = "https://ipdb.at/ip/66.196.116.112"
html = urllib.urlopen(url).read()

and everything is fine. However, I want to move to Python 3.4, and there I get HTTP error 403 (Forbidden). My code:

import urllib.request
html = urllib.request.urlopen(url)  # same URL as before

  File "C:\Python34\lib\urllib\request.py", line 153, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python34\lib\urllib\request.py", line 461, in open
    response = meth(req, …
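A common cause of this 2.x-vs-3.4 difference is the User-Agent: some sites answer 403 to urllib's default "Python-urllib/3.x" agent string. A sketch of the usual workaround (the request is only built here, not sent; whether ipdb.at specifically filters on the agent is an assumption):

```python
import urllib.request

url = "https://ipdb.at/ip/66.196.116.112"

# Send a browser-like User-Agent instead of the default
# "Python-urllib/3.x", which some sites reject with 403.
req = urllib.request.Request(url, headers={
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"})
# html = urllib.request.urlopen(req).read()   # would perform the fetch
```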

Python 3.4 SSL error urlopen error EOF occurred in violation of protocol (_ssl.c:600)

Submitted anonymously (unverified) on 2019-12-03 02:33:02
Question: I use Arch Linux, Python 3.4 and OpenSSL 1.0.2d. When I make a request to https://www.supercash.cz/ I get this error. It doesn't matter whether I use requests or the built-in urllib; it is always the same error. The SSL certificate for this site seems to be OK in the Chrome browser.

  File "/usr/lib64/python3.4/urllib/request.py", line 463, in open
    response = self._open(req, data)
  File "/usr/lib64/python3.4/urllib/request.py", line 481, in _open
    '_open', req)
  File "/usr/lib64/python3.4/urllib/request.py", line 441, in _call_chain
    result = func(*args)
  File "/usr …
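One commonly suggested workaround for this handshake failure is to pin the TLS protocol version via an ssl.SSLContext passed to urlopen (accepted directly since Python 3.4.3). This is a sketch under the assumption that the server only negotiates an older TLS version; it is not a confirmed diagnosis for supercash.cz:

```python
import ssl
import urllib.request

# "EOF occurred in violation of protocol" means the TLS handshake was
# cut short, often because client and server could not agree on a
# protocol version. Pinning the version is one workaround to try:
ctx = ssl.SSLContext(ssl.PROTOCOL_TLSv1)   # assumption: server wants TLSv1
# html = urllib.request.urlopen("https://www.supercash.cz/",
#                               context=ctx).read()
```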

Assignment 3 - MOOC study notes: Python Web Crawling and Information Extraction

Submitted by 偶尔善良 on 2019-12-03 02:32:23
1. Register on the China University MOOC platform. 2. Choose Prof. Song Tian's (Beijing Institute of Technology) MOOC course "Python Web Crawling and Information Extraction". 3. Complete the course content from week 0 through week 4, including each week's assignments. 4. Provide screenshots or website pages showing learning progress, as proof of study. 5. Write study notes of no less than 1000 characters on the experience and what was gained.

There are very many tutorials for this kind of thing online, and I too have only got as far as making things work, staying at a half-understood level; but while coding I gained a lot of new understanding of Python, and a deeper understanding of programming as a whole. In essence, a crawler is just a piece of program code. Any programming language can be used to write a crawler; they differ only in how simple or laborious it is. By definition, a crawler is a program that simulates a user automatically browsing and saving network data; of course, most crawlers scrape web page content. Crawler architecture: URL manager, page downloader, page parser. URL manager: manages the set of URLs waiting to be crawled and the set already crawled, to prevent crawling the same URL twice. URL manager implementations: a cache database (large companies, high performance); in-memory storage (individuals, small companies); a relational database (to persist URL data permanently or to save memory). Page downloader: downloads the HTML of the page at a URL to the local machine for subsequent analysis. Common page downloaders: Python's official base module urllib2, and the third-party package requests. In Python 3.x the urllib and urllib2 libraries were merged into the urllib library, where urllib2.urlopen() became urllib.request.urlopen(). urllib2 …
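The urllib2 → urllib.request rename mentioned above can be absorbed with a small compatibility shim, so the same crawler code runs on both Python 2 and 3; a minimal sketch:

```python
# Python 2's urllib2.urlopen became urllib.request.urlopen in Python 3.
try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib2 import urlopen          # Python 2

# Either way, the downloader code below the shim stays identical:
print(callable(urlopen))  # True
```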

An issue when downloading files with Python's urllib.urlretrieve()

Submitted by 痞子三分冷 on 2019-12-03 02:29:41
I had long planned to write a crawler to scrape images from web pages, which is how I found the function urllib.urlretrieve(url, filename). The url parameter is the address of the file you want to download. As for filename, for a while I assumed it was a local directory path, but every attempt gave an error. The reason is that it is not a directory but a file name: if you want to download img.gif, you must pass something like c:/img.gif, not c:/. Remember, it is a file name (with the full path), not a directory. OK, what I said above doesn't sound very professional; I sincerely welcome everyone's suggestions. Source: oschina. Link: https://my.oschina.net/u/153044/blog/48524
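The point above (filename must be a full file path, not a directory) can be demonstrated offline with a file:// URL, shown in Python 3 where the function lives at urllib.request.urlretrieve (in Python 2 it was urllib.urlretrieve):

```python
import os
import tempfile
from pathlib import Path
from urllib.request import urlretrieve  # Python 2: urllib.urlretrieve

# A small local stand-in for the remote image, so no network is needed.
src = Path(tempfile.mkdtemp()) / "img.gif"
src.write_bytes(b"GIF89a")

# filename must be a full path INCLUDING the file name, e.g. .../img.gif;
# passing just a directory is what raises the error described above.
dest = os.path.join(tempfile.mkdtemp(), "img.gif")
urlretrieve(src.as_uri(), dest)
print(os.path.exists(dest))  # True
```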