urllib

Pause before retry connection in Python

霸气de小男生 submitted on 2019-12-06 13:29:32
I am trying to connect to a server. Sometimes I cannot reach the server and would like to pause for a few seconds before trying again. How would I implement the pause feature in Python? Here is what I have so far. Thank you.

```python
while True:
    try:
        response = urllib.request.urlopen(http)
    except URLError as e:
        continue
    break
```

I am using Python 3.2.

This will block the thread for 2 seconds before continuing:

```python
import time
time.sleep(2)
```

In case you want to run lots of these in parallel, it would be much more scalable to use an asynchronous networking framework such as Twisted, where "sleeping" doesn't mean
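A minimal retry helper along the lines of the answer above, with the sleep placed between attempts (the helper name, attempt limit, and default delay are assumptions, not from the original post):

```python
import time

def retry(fn, attempts=5, delay=2.0, exceptions=(Exception,)):
    """Call fn() until it succeeds, sleeping `delay` seconds between attempts."""
    for i in range(attempts):
        try:
            return fn()
        except exceptions:
            if i == attempts - 1:
                raise  # out of attempts: re-raise the last error
            time.sleep(delay)

# Sketch of the original use case (network call left commented out):
# from urllib.request import urlopen
# from urllib.error import URLError
# response = retry(lambda: urlopen(http), exceptions=(URLError,))
```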

speeding up urllib.urlretrieve

笑着哭i submitted on 2019-12-06 12:49:37
I am downloading pictures from the internet, and as it turns out, I need to download lots of pictures. I am using a version of the following code fragment (actually looping through the links I intend to download and downloading the pictures):

```python
import urllib
urllib.urlretrieve(link, filename)
```

I am downloading roughly 1000 pictures every 15 minutes, which is awfully slow given the number of pictures I need to download. For efficiency, I set a timeout of 5 seconds (still, many downloads last much longer):

```python
import socket
socket.setdefaulttimeout(5)
```

Besides running a job on a computer cluster to
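Downloads like these are I/O-bound, so fetching several at once usually helps far more than tuning timeouts. A sketch using a thread pool (the `fetch` placeholder and worker count are assumptions; in the real script it would call `urllib.urlretrieve`):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch(link):
    # Placeholder for urllib.urlretrieve(link, filename) so the
    # sketch stays runnable without network access.
    return link

def download_all(links, workers=8):
    # map() preserves input order while the pool downloads concurrently
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, links))
```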

relevant query on how to fetch a public key from a public key server

ぃ、小莉子 submitted on 2019-12-06 11:47:43
Question:

```python
import urllib

response = urllib.urlopen('http://pool.sks-keyservers.net/')
print 'RESPONSE:', response
print 'URL     :', response.geturl()

headers = response.info()
print 'DATE    :', headers['date']
print 'HEADERS :'
print '---------'
print headers

data = response.read()
print 'LENGTH  :', len(data)
print 'DATA    :'
print '---------'
print data
```

This code enables me to see some web page information and contents. What I actually wanted to ask is how to fetch the public key from any public key server using
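SKS-style key servers speak the HKP protocol: a plain HTTP GET to `/pks/lookup` on port 11371, with `op=get` retrieving an ASCII-armored key. A sketch that only builds the lookup URL (the key ID is a made-up example, and the network call itself is left out):

```python
from urllib.parse import urlencode

def hkp_lookup_url(keyserver, key_id):
    """Build an HKP URL that fetches an ASCII-armored public key.

    `options=mr` asks the server for machine-readable output.
    """
    query = urlencode({'op': 'get', 'options': 'mr', 'search': key_id})
    return 'http://{}:11371/pks/lookup?{}'.format(keyserver, query)

# e.g. urllib.urlopen(hkp_lookup_url('pool.sks-keyservers.net', '0x1234ABCD')).read()
```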

How to pass parameter to Url with Python urlopen

浪尽此生 submitted on 2019-12-06 11:21:45
I'm currently new to Python programming. My problem is that my Python program doesn't seem to pass/encode the parameter properly to the ASP file that I've created. This is my sample code:

```python
import urllib.request

url = 'http://www.sample.com/myASP.asp'
full_url = url + "?data='" + str(sentData).replace("'", '"').replace(" ", "%20").replace('"', "%22") + "'"
print(full_url)
response = urllib.request.urlopen(full_url)
print(response)
```

The output would give me something like:

http://www.sample.com/myASP.asp?data='{%22mykey%22:%20[{%22idno%22:%20%22id123%22,%20%22name%22:%20%22ej%22}]}'

The ASP file
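Chains of `replace` calls are easy to get wrong because they miss characters that also need escaping; `urllib.parse.urlencode` handles the whole query string in one step. A sketch (the payload mirrors the shape shown in the output above, but its exact contents are an assumption):

```python
import json
from urllib.parse import urlencode

payload = {'mykey': [{'idno': 'id123', 'name': 'ej'}]}
# urlencode percent-escapes every unsafe character in the JSON string
query = urlencode({'data': json.dumps(payload)})
full_url = 'http://www.sample.com/myASP.asp?' + query
# response = urllib.request.urlopen(full_url)  # network call, omitted here
```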

Python 3.5 urllib.request 403 Forbidden Error

為{幸葍}努か submitted on 2019-12-06 09:31:51
```python
import urllib.request
import urllib
from bs4 import BeautifulSoup

url = "https://www.brightscope.com/ratings"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, "html.parser")
print(soup.title)
```

I was trying to go to the above site and the code keeps spitting out a 403 Forbidden error. Any ideas?

```
C:\Users\jerem\AppData\Local\Programs\Python\Python35-32\python.exe "C:/Users/jerem/PycharmProjects/webscraper/url scraper.py"
Traceback (most recent call last):
  File "C:/Users/jerem/PycharmProjects/webscraper/url scraper.py", line 7, in <module>
    page = urllib.request.urlopen(url)
  File "C:\Users
```
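A 403 from a site that works in a browser is often triggered by the `User-Agent` header, since urllib identifies itself as `Python-urllib`. Supplying a browser-like header frequently gets past such checks (the header value here is just an example, and whether this particular site accepts it is not guaranteed):

```python
import urllib.request

url = "https://www.brightscope.com/ratings"
req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
# page = urllib.request.urlopen(req)  # network call, commented out in this sketch
```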

Print code from web page with python and urllib

浪子不回头ぞ submitted on 2019-12-06 09:17:15
Question:

I'm trying to use Python and urllib to look at the code of a certain web page. I've tried and succeeded at this on other web pages using the code:

```python
from urllib import *

url = ...  # the actual URL is omitted in the original post
code = urlopen(url).read()
print code
```

But it returns nothing at all. My guess is it's because the page has a lot of JavaScript? What to do?

Answer 1:

Dynamic client-side generated pages (JavaScript): you cannot use urllib alone to see code that has been rendered dynamically on the client side (JavaScript). The reason is that urllib only

An introduction to common crawler libraries

谁说我不能喝 submitted on 2019-12-06 08:44:35
Contents: urllib, Requests, BeautifulSoup, selenium

urllib

Urllib is a built-in Python library, and this built-in Urllib library contains four modules:

- request: the module we use the most, for sending requests, so it is the one we will focus on
- error: for exception handling when something goes wrong while using the request module
- parse: for parsing URL addresses, e.g. extracting the domain, the directory a URL points to, and so on
- robotparser: used much less often; it parses a site's robots.txt

Now that we understand urllib, we can use Python code to simulate requests.

Requests

The Requests library is a fair bit more capable than the urllib we just discussed; after all, Requests is built on top of urllib. With it we can simulate browser operations in less code. Since it is not a built-in Python library, we need to install it, which can be done directly with pip:

```shell
pip install requests
```

A GET request in one line:

```python
r = requests.get('https://www.sina.com.cn/')
```

A POST request in one line: r =
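Three of the four urllib modules listed above can be exercised without touching the network; a small sketch (the URL and the robots.txt rules are made-up examples):

```python
from urllib import parse, robotparser
from urllib.error import URLError  # what urllib.request raises on a failed request

# parse: split a URL into its components
parts = parse.urlparse('https://www.sina.com.cn/news/index.html?id=1')

# robotparser: decide whether a crawler may fetch a given path
rp = robotparser.RobotFileParser()
rp.parse(['User-agent: *', 'Disallow: /private/'])
```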

Downloading pdf files using mechanize and urllib

戏子无情 submitted on 2019-12-06 08:41:14
I am new to Python, and my current task is to write a web crawler that looks for PDF files on certain web pages and downloads them. Here's my current approach (just for one sample URL):

```python
import mechanize
import urllib
import sys

mech = mechanize.Browser()
mech.set_handle_robots(False)
url = "http://www.xyz.com"
try:
    mech.open(url, timeout=30.0)
except HTTPError, e:
    sys.exit("%d: %s" % (e.code, e.msg))

links = mech.links()
for l in links:
    # Some are relative links
    path = str(l.base_url[:-1]) + str(l.url)
    if path.find(".pdf") > 0:
        urllib.urlretrieve(path)
```

The program runs without any errors, but I am
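One detail worth checking in the loop above: `urlretrieve` called with no second argument saves each file under a temporary name, so the PDFs may be downloading but landing somewhere unexpected. A sketch of deriving a local filename from the URL (the helper name is made up):

```python
import os
from urllib.parse import urlparse

def pdf_filename(url):
    """Return the last path component of a URL for use as a local filename."""
    name = os.path.basename(urlparse(url).path)
    return name or 'download.pdf'  # fall back when the path has no file part

# Python 2 usage matching the question:
# urllib.urlretrieve(path, pdf_filename(path))
```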

Form Submission in Python Without Name Attribute

大城市里の小女人 submitted on 2019-12-06 08:33:21
Background: Using urllib and urllib2 in Python, you can do a form submission. You first create a dictionary:

```python
formdictionary = {'search': 'stackoverflow'}
```

Then you use the urlencode method of urllib to transform this dictionary:

```python
params = urllib.urlencode(formdictionary)
```

You can now make a URL request with urllib2 and pass the variable params as the second parameter, the first parameter being the URL:

```python
open = urllib2.urlopen('www.searchpage.com', params)
```

From my understanding, urlencode automatically encodes the dictionary in HTML and adds the input tag. It takes the key to be the name
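To be precise, `urlencode` does not produce HTML; it produces a `name=value` query string, and each dictionary key becomes the field name the server receives. The Python 3 equivalent of the snippet above, with the POST body encoded to bytes as `urlopen` requires (the URL is the question's placeholder):

```python
from urllib.parse import urlencode

formdictionary = {'search': 'stackoverflow'}
params = urlencode(formdictionary)  # -> 'search=stackoverflow'
body = params.encode('ascii')       # urlopen needs bytes for POST data
# urllib.request.urlopen('http://www.searchpage.com', body)  # network call, omitted
```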

TypeError: cannot concatenate 'str' and 'instance' objects (python urllib)

家住魔仙堡 submitted on 2019-12-06 08:27:43
Writing a Python program, I came up with this error while using the urllib.urlopen function:

```
Traceback (most recent call last):
  File "ChurchScraper.py", line 58, in <module>
    html = GetAllChurchPages()
  File "ChurchScraper.py", line 48, in GetAllChurchPages
    CPs = CPs + urllib.urlopen(url)
TypeError: cannot concatenate 'str' and 'instance' objects
```

```python
url = 'http://website.com/index.php?cID=' + str(cID)
CPs = CPs + urllib.urlopen(url)
```

urlopen(url) returns a file-like object. To obtain the string contents, try:

```python
CPs = CPs + urllib.urlopen(url).read()
```

urllib.urlopen doesn't return a string, it
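As a side note, growing a string with repeated `+` copies the whole buffer on every page; collecting the pages in a list and joining once at the end is the usual pattern. A sketch with a stand-in for `urllib.urlopen(url).read()` so it runs without network access:

```python
def fetch_page(cID):
    # Stand-in for urllib.urlopen(url).read(); returns one page as a string.
    return '<page %d>' % cID

pages = [fetch_page(cID) for cID in range(3)]
CPs = ''.join(pages)  # single concatenation instead of one per page
```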