urllib

Python urllib cache

无人久伴 submitted on 2019-12-20 05:46:11
Question: I'm writing a script in Python that should determine if it has internet access.

    import urllib

    CHECK_PAGE = "http://64.37.51.146/check.txt"
    CHECK_VALUE = "true\n"
    PROXY_VALUE = "Privoxy"
    OFFLINE_VALUE = ""

    page = urllib.urlopen(CHECK_PAGE)
    response = page.read()
    page.close()

    if response.find(PROXY_VALUE) != -1:
        urllib.getproxies = lambda x = None: {}
        page = urllib.urlopen(CHECK_PAGE)
        response = page.read()
        page.close()

    if response != CHECK_VALUE:
        print "'" + response + "' != '" + CHECK_VALUE +
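For readers on Python 3, the same probe could be sketched with urllib.request. This is only an illustration under stated assumptions, not the asker's final code: the helper names fetch_text, classify_body and check_connection are hypothetical, and the check page is assumed to return "true\n" when reached directly, as in the question. Passing an empty dict to ProxyHandler is the documented way to skip autodetected proxies, which avoids patching urllib.getproxies.

```python
import urllib.request

CHECK_PAGE = "http://64.37.51.146/check.txt"  # URL taken from the question
CHECK_VALUE = "true\n"
PROXY_VALUE = "Privoxy"


def fetch_text(url, use_proxy=True, timeout=5):
    """Return the page body as text, or "" if the request fails (hypothetical helper)."""
    # An empty proxy mapping disables proxy autodetection for this opener.
    handlers = [] if use_proxy else [urllib.request.ProxyHandler({})]
    opener = urllib.request.build_opener(*handlers)
    try:
        with opener.open(url, timeout=timeout) as page:
            return page.read().decode("utf-8", errors="replace")
    except OSError:  # urllib.error.URLError is a subclass of OSError
        return ""


def classify_body(body):
    """Map a response body to 'online', 'proxy' or 'offline' (labels are assumptions)."""
    if body == CHECK_VALUE:
        return "online"
    if PROXY_VALUE in body:
        return "proxy"
    return "offline"


def check_connection(url=CHECK_PAGE):
    state = classify_body(fetch_text(url))
    if state == "proxy":
        # Retry without any proxy, mirroring the second request in the question.
        state = classify_body(fetch_text(url, use_proxy=False))
    return state
```

Calling check_connection() performs the network requests; classify_body can be exercised on its own with canned strings.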

Multi threaded web scraper using urlretrieve on a cookie-enabled site

你。 submitted on 2019-12-20 03:17:05
Question: I am trying to write my first Python script, and with lots of Googling, I think that I am just about done. However, I will need some help getting myself across the finish line. I need to write a script that logs on to a cookie-enabled site, scrapes a bunch of links, and then spawns a few processes to download the files. I have the program running single-threaded, so I know that the code works. But when I tried to create a pool of download workers, I ran into a wall.

    #manager.py
    import Fetch
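One possible shape for the worker pool is sketched below in Python 3. It uses threads rather than processes so that every worker can share a single cookie-aware opener; that choice, and the helper names make_opener, download and download_all, are assumptions made for illustration and do not come from the asker's manager.py or Fetch module. The login step is omitted, and sharing one opener across threads is assumed to be acceptable for simple GET requests.

```python
import http.cookiejar
import shutil
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def make_opener():
    """Build an opener that stores and resends cookies (hypothetical helper)."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def download(opener, url, dest):
    """Stream one URL to a local file and return the destination path."""
    with opener.open(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def download_all(opener, jobs, workers=4):
    """Download (url, dest) pairs concurrently; results keep the order of jobs."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(download, opener, url, dest) for url, dest in jobs]
        # result() re-raises any exception that occurred in a worker thread.
        return [future.result() for future in futures]
```

In a real run the opener would first be used to log in, so that the session cookies are already in the jar before download_all is called.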

TypeError: can't use a string pattern on a bytes-like object, api

和自甴很熟 submitted on 2019-12-20 02:03:08
Question:

    import json
    import urllib.request, urllib.error, urllib.parse

    Name = 'BagFullOfHoles'  # Random player
    Platform = 'xone'  # pc, xbox, xone, ps4, ps3
    url = 'http://api.bfhstats.com/api/playerInfo?plat=' + Platform + '&name=' + Name
    json_obj = urllib.request.urlopen(url)
    data = json.load(json_obj)
    print (data)

    TypeError: can't use a string pattern on a bytes-like object

I recently ran 2to3.py on this script, and this error or others come up when I try to fix it. Anyone with any pointers?

Answer 1: The json_obj =
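The answer is cut off here, so the following is only a hedged sketch of a common fix rather than the answerer's text. In Python 3 the response object yields bytes, and older 3.x versions of the json module expected text, which is a likely source of this TypeError. Decoding explicitly sidesteps the mismatch. The helper name load_json_response and the UTF-8 default are assumptions.

```python
import io
import json


def load_json_response(response, encoding="utf-8"):
    """Read a bytes response, decode it to text, then parse it as JSON."""
    raw = response.read()
    return json.loads(raw.decode(encoding))


# Stand-in for urllib.request.urlopen(url): any object with a bytes read() works.
fake_response = io.BytesIO(b'{"player": "BagFullOfHoles", "plat": "xone"}')
data = load_json_response(fake_response)
print(data["player"])
```

With a real request the call would look like `with urllib.request.urlopen(url) as response: data = load_json_response(response)`.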

urllib

假装没事ソ submitted on 2019-12-19 23:46:34
Encoding and decoding Chinese characters in URLs with urllib

Decoding:

    from urllib import parse

    rawurl = "%e7%bd%91%e7%9b%98"
    data = parse.unquote(rawurl)
    print(data)
    >>> 网盘

Encoding:

    from urllib.parse import quote

    name = '网盘'
    data = quote(name)
    print(data)
    >>> %E7%BD%91%E7%9B%98

Source: https://www.cnblogs.com/0916m/p/11484367.html
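As a small extension of the same idea, the sketch below round-trips the string and also encodes it as a query parameter. The parameter name "q" is a made-up example; quote, unquote, urlencode and parse_qs are all standard functions in urllib.parse and use UTF-8 by default.

```python
from urllib.parse import parse_qs, quote, unquote, urlencode

name = "网盘"

encoded = quote(name)      # percent-encode the UTF-8 bytes of the string
decoded = unquote(encoded) # reverse the encoding

query = urlencode({"q": name})  # build a query string; "q" is a hypothetical key
params = parse_qs(query)        # parse it back into a dict of lists

print(encoded)  # %E7%BD%91%E7%9B%98
print(decoded)  # 网盘
print(query)    # q=%E7%BD%91%E7%9B%98
```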

Redirection url using urllib in Python 3

て烟熏妆下的殇ゞ submitted on 2019-12-19 20:47:32
Question: I need to know the final URL after following redirects with urllib in Python 3. Let's say I have some code like:

    opener = urllib.request.build_opener()
    request = urllib.request.Request(url)
    u = opener.open(request)

If my URL redirects to another website, how can I find out the new website's URL? I've found nothing useful in the documentation. Thanks for your help!

Answer 1: You can simply use u.geturl() to get the URL you were redirected to (or the original one if no redirect happened).
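The answer can be wrapped in a small helper, sketched here. The function name final_url and its optional opener parameter are hypothetical; geturl() is the method named in the answer, and recent Python versions also expose the same value as the response's url attribute.

```python
import urllib.request


def final_url(url, opener=None):
    """Open url, following redirects, and return the URL that was finally retrieved."""
    opener = opener or urllib.request.build_opener()
    with opener.open(url) as response:
        # After redirects this is the last URL; otherwise it is the original one.
        return response.geturl()
```

Comparing final_url(url) with url is a simple way to detect that a redirect took place.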

Catching http errors

十年热恋 submitted on 2019-12-19 13:39:42
Question: How can I catch 404 and 403 errors for pages in Python with urllib(2), for example? Are there any fast ways without big class wrappers? Added info (stack trace):

    Traceback (most recent call last):
      File "test.py", line 3, in <module>
        page = urllib2.urlopen("http://localhost:4444")
      File "/usr/lib/python2.6/urllib2.py", line 126, in urlopen
        return _opener.open(url, data, timeout)
      File "/usr/lib/python2.6/urllib2.py", line 391, in open
        response = self._open(req, data)
      File "/usr/lib/python2.6
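A minimal sketch of one way to do this without wrapper classes is shown below, written for Python 3, where urllib2 was split into urllib.request and urllib.error. The function name fetch and its injectable opener parameter are assumptions added so the error path can be tried without a server.

```python
import urllib.error
import urllib.request


def fetch(url, opener=urllib.request.urlopen):
    """Return (status_code, body); body is None when the server replies with an HTTP error."""
    try:
        with opener(url) as response:
            return response.getcode(), response.read()
    except urllib.error.HTTPError as err:
        # err.code holds the numeric status, for example 403 or 404.
        return err.code, None
```

Connection failures such as a refused connection raise urllib.error.URLError instead, which this sketch deliberately lets propagate.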

Remove newline in python with urllib

不羁的心 submitted on 2019-12-19 11:32:27
Question: I am using Python 3.x. While using urllib.request to download a webpage, I am getting a lot of \n in between. I have tried to remove them using the methods given in other threads on the forum, but I have not been able to. I have used the strip() function and the replace() function, but no luck! I am running this code in Eclipse. Here is my code:

    import urllib.request

    #Downloading entire Web Document
    def download_page(a):
        opener = urllib.request.FancyURLopener({})
        try:
            open_url = opener.open(a)
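The snippet is cut off, so the actual cause is not visible. One common cause, assumed here, is converting the downloaded bytes with str(), which turns each newline into a literal backslash followed by n that strip() and replace("\n", "") do not match. A sketch of decoding first and then removing line breaks follows; the helper name remove_newlines is hypothetical.

```python
def remove_newlines(raw, encoding="utf-8"):
    """Decode downloaded bytes to text, then drop carriage returns and newlines."""
    text = raw.decode(encoding, errors="replace")
    return text.replace("\r", "").replace("\n", "")


raw_page = b"<p>first</p>\n<p>second</p>\r\n"

# str() on bytes yields the repr, with a literal backslash and 'n', not a newline.
as_repr = str(raw_page)

cleaned = remove_newlines(raw_page)
print(cleaned)
```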

python read file from a web URL

耗尽温柔 submitted on 2019-12-19 10:27:41
Question: I am currently trying to read a txt file from a website. My script so far is:

    webFile = urllib.urlopen(currURL)

This way, I can work with the file. However, when I try to store the file (in webFile), I only get a link to the socket. Another solution I tried was to use read():

    webFile = urllib.urlopen(currURL).read()

However, this seems to remove the formatting (\n, \t, etc.). If I open the file like this:

    webFile = urllib.urlopen(currURL)

I can read it line by line:

    for line in
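A Python 3 sketch of reading the whole file and still working line by line is given below. The function name read_lines and the UTF-8 default are assumptions. read() returns the complete body with newlines and tabs intact, so splitting the decoded text recovers the lines.

```python
import urllib.request


def read_lines(url, encoding="utf-8"):
    """Download a text file and return its lines without the trailing newlines."""
    with urllib.request.urlopen(url) as web_file:
        text = web_file.read().decode(encoding)
    # Tabs and other characters inside each line are preserved.
    return text.splitlines()
```

Keeping the decoded text in a variable instead of the response object also avoids holding on to the open socket.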

How to log in to a website with urllib?

拥有回忆 submitted on 2019-12-19 10:08:06
Question: I am trying to log on to this website: http://www.broadinstitute.org/cmap/index.jsp. I am using Python 3.3 on Windows. I followed this answer: https://stackoverflow.com/a/2910487/651779. My code:

    import http.cookiejar
    import urllib

    url = 'http://www.broadinstitute.org/cmap/index.jsp'
    values = {'j_username' : 'username', 'j_password' : 'password'}
    data = urllib.parse.urlencode(values)
    binary_data = data.encode('ascii')
    cookies = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
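Since the question's code is truncated, the following is only a sketch of the general pattern it appears to be building: a cookie-aware opener plus a POST request carrying the form fields. The helper name build_login is hypothetical, the field names are taken from the question, and whether the site accepts a POST to this URL is not known. The submodules urllib.parse and urllib.request are imported explicitly rather than relying on a bare import urllib.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_login(url, username, password):
    """Return (opener, request, cookies) for a form login; nothing is sent yet."""
    values = {"j_username": username, "j_password": password}  # names from the question
    data = urllib.parse.urlencode(values).encode("ascii")  # request bodies must be bytes
    cookies = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cookies))
    request = urllib.request.Request(url, data=data)  # supplying data makes this a POST
    return opener, request, cookies
```

Sending the request would then be `response = opener.open(request)`, after which later calls through the same opener reuse any session cookies stored in the jar.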