urllib

Python: urlretrieve PDF downloading

馋奶兔 submitted on 2019-12-04 14:33:17

I am using urllib's urlretrieve() function in Python to grab some PDFs from websites. It has (at least for me) stopped working and now downloads damaged data (15 KB instead of 164 KB). I have tested this with several PDFs, all without success (e.g. random.pdf). I can't get it to work, and I need to be able to download PDFs for the project I am working on. Here is an example of the kind of code I use to download the PDFs (and parse the text with pdftotext.exe):

    def get_html(url):  # gets the HTML of a page from the Internet
        import os
        import urllib2
        import urllib
        from …
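A common cause of a truncated download like this is the server returning an HTML error or redirect page instead of the PDF, often because urllib's default User-Agent is blocked. A minimal Python 3 sketch (the URL and filename are placeholders) that sends a browser-like User-Agent and checks the payload actually starts with the PDF magic bytes:

    import urllib.request

    url = "http://example.com/random.pdf"  # placeholder URL

    # Some servers reject urllib's default User-Agent, so send a browser-like one.
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req) as resp:
        data = resp.read()

    # A real PDF starts with the "%PDF" magic bytes; anything else is likely
    # an HTML error page masquerading as the download.
    if data[:4] != b"%PDF":
        raise ValueError("Server did not return a PDF (got %r...)" % data[:20])

    with open("random.pdf", "wb") as f:
        f.write(data)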

page scraping to get prices from google finance

元气小坏坏 submitted on 2019-12-04 13:25:51

Question: I am trying to get stock prices by scraping Google Finance pages. I am doing this in Python, using the urllib package and then a regex to extract the price data. When I leave my Python script running, it works initially for some time (a few minutes) and then starts throwing the exception [HTTP Error 503: Service Unavailable]. I guess this is happening because the web server detects the frequent page requests as a robot and throws this exception after a while. Is there a way around this, i.e. deleting …
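Rate limiting like this is usually best handled by slowing down and retrying with backoff, rather than trying to defeat the detection. A minimal Python 3 sketch (the URL is a placeholder) that backs off exponentially whenever a 503 comes back:

    import time
    import urllib.error
    import urllib.request

    def fetch_with_backoff(url, max_retries=5):
        delay = 1.0
        for attempt in range(max_retries):
            try:
                req = urllib.request.Request(
                    url, headers={"User-Agent": "Mozilla/5.0"})
                with urllib.request.urlopen(req) as resp:
                    return resp.read()
            except urllib.error.HTTPError as e:
                if e.code != 503:
                    raise
                time.sleep(delay)  # back off before retrying
                delay *= 2
        raise RuntimeError("still getting 503 after %d retries" % max_retries)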

Python: basic usage of the urllib library

非 Y 不嫁゛ submitted on 2019-12-04 12:06:18

Contents: basic usage of Python's urllib library · official documentation · what urllib is · urlopen · using the url parameter · using the data parameter · using the timeout parameter · responses · response type, status code, response headers · request · exception handling · URL parsing · function one: urlunparse · urljoin · urlencode.

Basic usage of Python's urllib library. Official documentation: https://docs.python.org/3/library/urllib.html. What is urllib? urllib is Python's built-in HTTP request library and comprises the following modules: urllib.request (making requests), urllib.error (exception handling), urllib.parse (URL parsing), and urllib.robotparser (robots.txt parsing). urlopen: the signature of urllib.request.urlopen is

    urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

Using the url parameter: start with a simple example:

    import urllib.request
    response = urllib.request.urlopen('http://www.baidu…
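Since the excerpt cuts off at the first example, here is a minimal Python 3 sketch covering the three urlopen parameters the outline lists (url, data, timeout); httpbin.org is used here only as a convenient test endpoint:

    import urllib.parse
    import urllib.request

    # url parameter: a plain GET request, with a timeout in seconds
    response = urllib.request.urlopen('http://httpbin.org/get', timeout=10)
    print(response.status, response.read(100))

    # data parameter: passing bytes turns the request into a POST
    data = urllib.parse.urlencode({'word': 'hello'}).encode('utf-8')
    response = urllib.request.urlopen('http://httpbin.org/post',
                                      data=data, timeout=10)
    print(response.status)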

Memory usage with concurrent.futures.ThreadPoolExecutor in Python3

橙三吉。 submitted on 2019-12-04 11:58:32

Question: I am building a script to download and parse benefits information for health-insurance plans on the Obamacare exchanges. Part of this requires downloading and parsing the plan-benefit JSON files from each individual insurance company. To do this, I am using concurrent.futures.ThreadPoolExecutor with 6 workers to download each file (with urllib), then parsing and looping through the JSON and extracting the relevant info (which is stored in a nested dictionary within the script). (Running Python 3.5.1 (v3.5…
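If the memory concern comes from submitting every URL as a future up front, note that queued work items and finished results all stay referenced until the script ends. A sketch (fetch_plan and the URL list are hypothetical placeholders) that processes results as they complete and drops each future reference so it can be garbage-collected:

    import json
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor, as_completed

    def fetch_plan(url):
        # Download and decode one plan-benefit JSON file.
        with urllib.request.urlopen(url, timeout=30) as resp:
            return json.loads(resp.read().decode('utf-8'))

    urls = ['http://example.com/plan1.json']  # placeholder list

    summary = {}
    with ThreadPoolExecutor(max_workers=6) as pool:
        futures = {pool.submit(fetch_plan, u): u for u in urls}
        for fut in as_completed(futures):
            url = futures.pop(fut)   # drop the reference so the finished
            data = fut.result()      # future and its result can be freed
            summary[url] = len(data) # keep only the extracted info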

Setting proxy to urllib.request (Python3)

不问归期 submitted on 2019-12-04 11:24:06

How can I set a proxy for the latest urllib in Python 3? I am doing the following:

    from urllib import request as urlrequest
    ask = urlrequest.Request(url)  # note that Request has a capital R here, unlike previous versions
    open = urlrequest.urlopen(ask)
    open.read()

I tried adding a proxy as follows:

    ask = urlrequest.Request.set_proxy(ask, proxies, 'http')

However, I don't know how correct that is, since I am getting the following error:

    336 def set_proxy(self, host, type):
    --> 337 if self.type == 'https' and not self._tunnel_host:
    338     self._tunnel_host = self.host
    339 else:
    AttributeError: 'NoneType' object has no attribute 'type'
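Two things go wrong in the snippet above: set_proxy expects a single "host:port" string rather than a dict, and it returns None, so reassigning its result throws the request away. A minimal sketch of the two usual approaches (the proxy address is a placeholder):

    import urllib.request

    url = 'http://httpbin.org/ip'
    proxy_host = '127.0.0.1:8080'  # placeholder proxy

    # Option 1: set the proxy on a single Request. set_proxy mutates the
    # request in place, so do not reassign its (None) return value.
    req = urllib.request.Request(url)
    req.set_proxy(proxy_host, 'http')
    resp = urllib.request.urlopen(req)

    # Option 2: install a ProxyHandler so every open() call uses the proxy.
    opener = urllib.request.build_opener(
        urllib.request.ProxyHandler({'http': 'http://' + proxy_host}))
    resp = opener.open(url)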

Python: Log in a website using urllib

谁说我不能喝 submitted on 2019-12-04 10:48:46

Question: I want to log in to this website: https://www.fitbit.com/login. This is the code I use:

    import urllib2
    import urllib
    import cookielib

    login_url = 'https://www.fitbit.com/login'
    acc_pwd = {'login': 'Log In', 'email': 'username', 'password': 'pwd'}
    cj = cookielib.CookieJar()  # add cookies
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
    opener.addheaders = [('User-agent', 'Mozilla/5.0 \
        (compatible; MSIE 6.0; Windows NT 5.1)')]
    data = urllib.urlencode(acc_pwd)
    try:
        opener.open(login_url…
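For reference, a Python 3 equivalent of this cookie-carrying POST uses http.cookiejar and urllib.request. Note that many login forms also embed hidden fields such as CSRF tokens that must be scraped from the login page first, so the field names below are assumptions carried over from the question, not a verified Fitbit API:

    import http.cookiejar
    import urllib.parse
    import urllib.request

    login_url = 'https://www.fitbit.com/login'
    # Field names are assumptions; a real form may require hidden CSRF
    # fields scraped from the login page before the POST will succeed.
    acc_pwd = {'email': 'username', 'password': 'pwd'}

    cj = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(cj))
    opener.addheaders = [('User-Agent', 'Mozilla/5.0')]

    data = urllib.parse.urlencode(acc_pwd).encode('utf-8')
    resp = opener.open(login_url, data)  # session cookies land in cj
    print(resp.status, [c.name for c in cj])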

urllib3 MaxRetryError

偶尔善良 submitted on 2019-12-04 09:25:55

I have just started using urllib3, and I ran into a problem straight away. Following the manual, I started with the simple example:

    Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
    [GCC 4.5.2] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import urllib3
    >>>
    >>> http = urllib3.PoolManager()
    >>> r = http.request('GET', 'http://google.com/')

I get the following error:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/dist-packages/urllib3/request.py", line 65, in request
        **urlopen_kw…
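The truncated traceback ends in a MaxRetryError, which in urllib3 wraps whatever low-level failure kept recurring (DNS trouble, a refused connection, or, reportedly in some older versions, the 301 redirect from google.com to www.google.com). A hedged sketch against a modern urllib3, surfacing the underlying cause via the exception's reason attribute:

    import urllib3

    http = urllib3.PoolManager()
    try:
        # Requesting the canonical www host avoids the google.com redirect
        # that is sometimes blamed for this error on old urllib3 versions.
        r = http.request('GET', 'http://www.google.com/', retries=3)
        print(r.status)
    except urllib3.exceptions.MaxRetryError as e:
        # MaxRetryError wraps the underlying failure in e.reason.
        print('request kept failing:', e.reason)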

parse query string with urllib in Python 2.4

让人想犯罪 __ submitted on 2019-12-04 09:19:00

Question: Using Python 2.4.5 (don't ask!) I want to parse a query string and get a dict in return. Do I have to do it "manually", as follows?

    >>> qs = 'first=1&second=4&third=3'
    >>> d = dict([x.split("=") for x in qs.split("&")])
    >>> d
    {'second': '4', 'third': '3', 'first': '1'}

I didn't find any useful method in urlparse.

Answer 1: You have two options:

    >>> cgi.parse_qs(qs)
    {'second': ['4'], 'third': ['3'], 'first': ['1']}

or

    >>> cgi.parse_qsl(qs)
    [('first', '1'), ('second', '4'), ('third', '3')]

The values from parse_qs come back as lists, because a key can appear more than once in a query string.
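For anyone on a newer interpreter: these functions moved out of cgi in Python 2.6 and live in urllib.parse in Python 3. A quick sketch of the modern equivalent:

    from urllib.parse import parse_qs, parse_qsl

    qs = 'first=1&second=4&third=3'
    print(parse_qs(qs))   # {'first': ['1'], 'second': ['4'], 'third': ['3']}
    print(parse_qsl(qs))  # [('first', '1'), ('second', '4'), ('third', '3')]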

Python: Post Request with image files

倾然丶 夕夏残阳落幕 submitted on 2019-12-04 08:56:11

I have a server and I am trying to build a POST request to get the data back. I think one way to achieve this is to add the parameters in the header and make the request, but I am getting a few errors that I don't understand well enough to move forward.

HTML form:

    <html>
    <head>
    <meta http-equiv="content-type" content="text/html; charset=ISO-8859-1">
    </head>
    <body>
    <form method="POST" action="http://some.server.com:61235/imgdigest"
          enctype="multipart/form-data">
      quality:<input type="text" name="quality" value="2"><br>
      category:<input type="text" name="category" value="1"><br>
      debug:<input type="text…
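Posting a file through urllib alone means building the multipart/form-data body by hand; the parameters go in the body, not the header. A minimal Python 3 sketch matching the form above (the URL comes from the form's action; the image path, the "image" field name, and the debug value are assumptions, since the form excerpt is cut off):

    import uuid
    import urllib.request

    url = 'http://some.server.com:61235/imgdigest'
    # 'debug' value is a guess; the excerpt truncates before its value.
    fields = {'quality': '2', 'category': '1', 'debug': '1'}

    with open('test.jpg', 'rb') as f:  # placeholder image file
        file_data = f.read()

    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append('--%s\r\nContent-Disposition: form-data; '
                     'name="%s"\r\n\r\n%s\r\n' % (boundary, name, value))
    body = ''.join(parts).encode('ascii')
    # The file part: the field name "image" is an assumption, since the
    # form excerpt ends before any <input type="file"> element.
    body += ('--%s\r\nContent-Disposition: form-data; name="image"; '
             'filename="test.jpg"\r\nContent-Type: image/jpeg\r\n\r\n'
             % boundary).encode('ascii')
    body += file_data + ('\r\n--%s--\r\n' % boundary).encode('ascii')

    req = urllib.request.Request(url, data=body, headers={
        'Content-Type': 'multipart/form-data; boundary=%s' % boundary})
    print(urllib.request.urlopen(req).read())

In practice the third-party requests library does all of this in one call: requests.post(url, data=fields, files={'image': open('test.jpg', 'rb')}).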

Looping through a directory on the web and displaying its contents (files and other directories) via Python

人走茶凉 submitted on 2019-12-04 06:39:51

Question: In the same vein as "Process a set of files from a source directory to a destination directory in Python", I'm wondering whether it is possible to create a function that, given a web directory, lists out the files in that directory. Something like...

    files = []
    for file in urllib.listdir(dir):
        if file.isdir:
            # handle this as a directory
        else:
            # handle as a file

I assume I would need to use the urllib library, but there doesn't seem to be an easy way of doing this, at least none that I've seen. Answer 1: …
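HTTP has no real directory-listing call, so urllib.listdir cannot exist; this only works when the server exposes an auto-generated HTML index (Apache/nginx style), in which case you can scrape the links. A Python 3 sketch under that assumption (the URL is a placeholder):

    from html.parser import HTMLParser
    import urllib.request

    class LinkCollector(HTMLParser):
        # Collect the href of every <a> tag in an auto-index page.
        def __init__(self):
            super().__init__()
            self.links = []

        def handle_starttag(self, tag, attrs):
            if tag == 'a':
                self.links.extend(v for k, v in attrs if k == 'href' and v)

    url = 'http://example.com/pub/'  # placeholder directory URL
    parser = LinkCollector()
    parser.feed(urllib.request.urlopen(url).read().decode('utf-8', 'replace'))

    for href in parser.links:
        if href.startswith(('?', '/')):
            continue  # skip sort links and the parent-directory entry
        if href.endswith('/'):
            print('directory:', href)
        else:
            print('file:', href)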