urllib

Download from EXPLOSM.net Comics Script [Python]

岁酱吖の submitted on 2019-12-07 00:29:58
So I wrote this short script to download the comic images from explosm.net, because I somewhat recently found out about it and I want to put it on my iPhone 3G. It works fine and all: urllib2 for getting the webpage HTML and urllib.urlretrieve() for the images. Why I posted this on SO: how do I optimize this code? Would regular expressions (regex) make it faster? Is it a network limitation? A poor algorithm...? Any improvements in speed or general code aesthetics would be greatly appreciated. Thank you. --------------------------------CODE--------------------------------
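The question's actual code is truncated, but the usual answer applies: regex won't help much, because downloading many small images is network-bound, not CPU-bound, so the biggest speed-up is fetching several images at once. A minimal Python 3 sketch of that idea (the URL list and destination directory are assumptions, not the original script's):

```python
import os
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen

def filename_for(url):
    """Derive a local file name from the last path segment of the URL."""
    return url.rstrip("/").rsplit("/", 1)[-1]

def download(url, dest_dir="."):
    """Fetch one image and write it to disk; returns the local path."""
    path = os.path.join(dest_dir, filename_for(url))
    with urlopen(url) as resp, open(path, "wb") as out:
        out.write(resp.read())
    return path

def download_all(urls, workers=4, dest_dir="."):
    """Fetch images concurrently -- the wait is I/O, so threads overlap it."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda u: download(u, dest_dir), urls))
```

With 4 workers, wall-clock time for N images approaches the time of the slowest batch of 4 rather than the sum of all N downloads.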

Python requests returns a different web page from the browser or urllib

不羁岁月 submitted on 2019-12-07 00:12:31
I use requests to scrape a webpage for some content. When I use import requests requests.get('example.org') I get a different page from the one I get in my browser or with import urllib.request urllib.request.urlopen('example.org') I tried using urllib, but it was really slow; in a comparison test I did, it was 50% slower than requests. How do you solve this? After a lot of investigation I found that the site attaches a cookie in the response headers for the first visit only, so the solution is to get the cookies with a HEAD request, then resend them with your GET request.
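The fix described above can be sketched with urllib's cookie machinery: a CookieJar wired into an opener stores whatever Set-Cookie the first (HEAD) response carries and replays it automatically on the second request. This is a generic sketch, not code from the question:

```python
import http.cookiejar
import urllib.request

def build_cookie_opener():
    """An opener whose CookieJar stores Set-Cookie from each response
    and replays the stored cookies on every later request."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    return opener, jar

def get_with_first_visit_cookie(url):
    """HEAD once to collect the first-visitor cookie, then GET with it."""
    opener, _jar = build_cookie_opener()
    opener.open(urllib.request.Request(url, method="HEAD"))
    return opener.open(url).read()
```

requests can do the same with a `requests.Session()`, which keeps cookies across calls for you.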

How to make an API call that requires login in web2py?

随声附和 submitted on 2019-12-06 22:48:37
I want to access APIs from an application. Those APIs have the decorator @auth.requires_login() . I am calling the API from a controller using demo_app/controllers/plugin_task/task url = request.env.http_origin + URL('api', 'bind_task') page = urllib2.Request(url) page.add_header('cookie', request.env.http_cookie) response = urllib2.urlopen(page) Demo API (api.py): @auth.requires_login() @request.restful() def bind_task(): response.view = 'generic.json' return dict(GET=_bind_task) def _bind_task(**get_params): return json.dumps({'status': '200'}) The code above gives me the error: HTTPError: HTTP Error 401:
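A 401 here usually means the session cookie never reached the decorated action, or it belongs to a different web2py application than the one serving the API. The question's approach of forwarding the incoming Cookie header is sound; in Python 3 terms it looks like this (the cookie value shown is a placeholder, not a real web2py session):

```python
import urllib.request

def build_api_request(url, cookie_header):
    """Attach the caller's own session cookie to the outgoing request,
    so @auth.requires_login sees an authenticated session."""
    req = urllib.request.Request(url)
    if cookie_header:
        req.add_header('Cookie', cookie_header)
    return req
```

If this still returns 401, check that the forwarded cookie's name matches the session cookie of the app that hosts `bind_task`.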

How to use Python to retrieve xml page that requires http login?

冷暖自知 submitted on 2019-12-06 21:49:04
When I access a page on an IIS server to retrieve XML, using a query parameter through the browser (using the http URL in the example below), I get a pop-up login dialog for username and password (it appears to be a system-standard dialog/form), and once submitted, the data arrives as an XML page. How do I handle this with urllib? When I do the following, I never get prompted for a username/password; I just get a traceback indicating the server (correctly) identifies me as not authorized. Using Python 2.7 in an IPython notebook: f = urllib.urlopen("http://www.nalmls.com/SERetsHuntsville/Search.aspx?SearchType=Property
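The browser's pop-up is just its rendering of the server's 401 challenge; urllib never "prompts", so the credentials have to be supplied up front through an auth handler. A Python 3 sketch of the standard pattern (IIS may be configured for Digest or Windows auth instead of Basic, in which case swap in `HTTPDigestAuthHandler` or a different library):

```python
import urllib.request

def build_auth_opener(url, user, password):
    """Register credentials for the URL so urllib can answer the
    server's 401 challenge automatically."""
    mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    mgr.add_password(None, url, user, password)  # None = any realm
    handler = urllib.request.HTTPBasicAuthHandler(mgr)
    return urllib.request.build_opener(handler)
```

Usage would be `build_auth_opener(url, "me", "secret").open(url).read()` to retrieve the XML.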

AttributeError: module 'urllib' has no attribute 'parse'

妖精的绣舞 submitted on 2019-12-06 18:17:34
Question: Python 3.5.2. Code 1: import urllib s = urllib.parse.quote('"') print(s) gives this error: AttributeError: module 'urllib' has no attribute 'parse' Code 2: from urllib.parse import quote # import urllib # s = urllib.parse.quote('"') s = quote('"') print(s) works... Code 3: from flask import Flask # from urllib.parse import quote # s = quote('"') import urllib s = urllib.parse.quote('"') print(s) works, too. Is it because of Flask? Why don't I get the error anymore? Is it a bug? Answer 1: The urllib
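The behavior is not a bug: `import urllib` imports only the package, and `urllib.parse` becomes an attribute of it only after some module imports the submodule. In code 3 Flask's own imports pull in `urllib.parse` as a side effect, which is why the attribute suddenly exists. The reliable fix is to import the submodule explicitly:

```python
import urllib        # package only: urllib.parse may not be bound yet
import urllib.parse  # explicit submodule import -- always binds the attribute

# Now the attribute access is guaranteed to work:
s = urllib.parse.quote('"')
print(s)  # %22
```

Relying on another library to import the submodule for you works until that library changes, so always import `urllib.parse` yourself.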

Python3 Web Crawler (1): Simple web scraping with urllib

余生颓废 submitted on 2019-12-06 17:15:17
Platform: Windows. Python version: Python 3.x. IDE: Sublime Text 3. Please credit the author and source when reposting: http://blog.csdn.net/c406495762/article/details/58716886 I have long wanted to learn about Python web crawlers. Searching online, I found that most material is based on Python 2.x, so I decided to write a set of Python 3.x crawler notes for later review; discussion and feedback are welcome. 1. Prerequisites. 1. Learning Python 3.x basics, through any of the following: (1) Liao Xuefeng's Python 3 tutorial (documentation): URL: http://www.liaoxuefeng.com/ (2) the Runoob Python 3 tutorial (documentation): URL: http://www.runoob.com/python3/python3-tutorial.html (3) the FishC Studio Python tutorial (video): teacher 小甲鱼 is excellent and lectures with humor; if you have time, consider watching the videos. URL: http://www.fishc.com/ 2. Setting up the development environment: for turning Sublime Text 3 into a Python IDE, see these posts: URL: http://www.cnblogs.com/nx520zj/p/5787393.html URL: http://blog.csdn.net/c406495762/article
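The tutorial's topic, simple page fetching with urllib in Python 3, boils down to two steps: `urlopen()` returns bytes, and those bytes must be decoded with the page's charset before use. A minimal sketch (the URL and encoding are whatever the page you scrape requires):

```python
from urllib.request import urlopen

def decode_page(raw, encoding="utf-8"):
    """urlopen returns bytes; decode them to text before parsing."""
    return raw.decode(encoding)

def fetch(url, encoding="utf-8"):
    """Read one page and return it as a decoded string."""
    with urlopen(url) as response:
        return decode_page(response.read(), encoding)
```

For Chinese sites the encoding is often "gbk" rather than "utf-8"; check the page's `<meta charset=...>` declaration.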

unable to send data using urllib and urllib2 (python)

送分小仙女□ submitted on 2019-12-06 15:42:54
Hello everybody (first post here). I am trying to send data to a webpage. This webpage requests two fields (a file and an e-mail address); if everything is OK, the webpage returns a page saying "everything is ok" and sends a file to the provided e-mail address. I execute the code below and get nothing in my e-mail account. import urllib, urllib2 params = urllib.urlencode({'uploaded': open('file'),'email': 'user@domain.com'}) req = urllib2.urlopen('http://webpage.com', params) print req.read() The print command gives me the code of the home page (I assume instead it should give the code of the
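The likely culprit is that `urlencode()` on an open file object serializes the object's repr, not the file's contents; browser file uploads use `multipart/form-data`, which urllib does not build for you. A Python 3 sketch of constructing such a body by hand (field names taken from the question; the boundary handling follows the MIME multipart convention):

```python
import uuid

def multipart_body(fields, files):
    """Build a multipart/form-data body; urlencode() cannot carry file
    contents, which is why the upload never arrives."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += ['--' + boundary,
                  'Content-Disposition: form-data; name="%s"' % name,
                  '', value]
    for name, (filename, data) in files.items():
        lines += ['--' + boundary,
                  'Content-Disposition: form-data; name="%s"; filename="%s"'
                  % (name, filename),
                  'Content-Type: application/octet-stream', '', data]
    lines += ['--' + boundary + '--', '']
    body = '\r\n'.join(lines).encode()
    content_type = 'multipart/form-data; boundary=' + boundary
    return body, content_type
```

The body is then POSTed with a Request whose Content-Type header is set to the returned value; in practice `requests.post(url, files=...)` does all of this in one call.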

Using urlopen to open list of urls

允我心安 submitted on 2019-12-06 15:26:29
Question: I have a Python script that fetches a webpage and mirrors it. It works fine for one specific page, but I can't get it to work for more than one. I assumed I could put multiple URLs into a list and then feed that to the function, but I get this error: Traceback (most recent call last): File "autowget.py", line 46, in <module> getUrl() File "autowget.py", line 43, in getUrl response = urllib.request.urlopen(url) File "/usr/lib/python3.2/urllib/request.py", line 139, in urlopen return opener
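The truncated traceback is consistent with passing the whole list to `urlopen()`, which expects a single URL string (or Request object). The fix is to loop over the list and open one URL per iteration; a sketch with an injectable fetcher so it can be exercised without a network:

```python
import urllib.request

def mirror_all(urls, fetch=None):
    """urlopen() takes one URL string, not a list -- iterate and fetch
    each page separately. `fetch` is injectable for testing."""
    if fetch is None:
        fetch = lambda u: urllib.request.urlopen(u).read()
    return {url: fetch(url) for url in urls}
```

Usage: `pages = mirror_all(['http://a.example/', 'http://b.example/'])` returns a dict mapping each URL to its page body.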

HTTP Error 403: Forbidden with urlretrieve

£可爱£侵袭症+ submitted on 2019-12-06 14:18:16
I am trying to download a PDF, but I get the following error: HTTP Error 403: Forbidden. I am aware that the server is blocking the request for whatever reason, but I can't seem to find a solution. import urllib.request import urllib.parse import requests def download_pdf(url): full_name = "Test.pdf" urllib.request.urlretrieve(url, full_name) try: url = ('http://papers.xtremepapers.com/CIE/Cambridge%20IGCSE/Mathematics%20(0580)/0580_s03_qp_1.pdf') print('initialized') hdr = {} hdr = { 'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526
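The header dict in the question is on the right track, but `urlretrieve()` offers no way to pass it; it always sends urllib's default `Python-urllib/x.y` User-Agent, which some servers reject with 403. Build a `Request` with a browser-like User-Agent and write the bytes yourself. A sketch (any plausible browser string works; `'Mozilla/5.0'` below is a minimal stand-in):

```python
import urllib.request

BROWSER_UA = 'Mozilla/5.0'  # minimal browser-like value; a full UA string also works

def build_browser_request(url):
    """urlretrieve sends urllib's default User-Agent, which the server
    rejects with 403; a Request object lets us replace it."""
    return urllib.request.Request(url, headers={'User-Agent': BROWSER_UA})

def download_pdf(url, dest):
    """Fetch the PDF with the overridden header and save it to dest."""
    with urllib.request.urlopen(build_browser_request(url)) as resp:
        with open(dest, 'wb') as out:
            out.write(resp.read())
```

If the server still returns 403 with a browser User-Agent, it is probably checking Referer or cookies as well.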

Python: urllib.error.HTTPError: HTTP Error 404: Not Found

梦想的初衷 submitted on 2019-12-06 13:39:01
Question: I wrote a script to find spelling mistakes in SO questions' titles. I used it for about a month, and it was working fine. But now, when I try to run it, I get this: Traceback (most recent call last): File "copyeditor.py", line 32, in <module> find_bad_qn(i) File "copyeditor.py", line 15, in find_bad_qn html = urlopen(url) File "/usr/lib/python3.4/urllib/request.py", line 161, in urlopen return opener.open(url, data, timeout)  File "/usr/lib/python3.4/urllib/request.py", line 469, in open
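A script that crawls a fixed range of question URLs will eventually hit pages that no longer exist (questions get deleted), and `urlopen()` turns that into an uncaught `HTTPError`. Catching the 404 and skipping that URL keeps the run alive; a sketch with an injectable opener so the error path can be tested offline:

```python
import urllib.error
import urllib.request

def fetch_or_none(url, opener=urllib.request.urlopen):
    """Return the page body, or None when the page has gone away (404);
    any other HTTP error is still raised."""
    try:
        return opener(url).read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

In the script's loop, a `None` result just means "skip this question and continue".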