urllib

Redirection URL using urllib in Python 3

别等时光非礼了梦想 · Submitted on 2019-12-01 21:05:25
I need to know what my final URL is when following redirections using urllib in Python 3. Let's say I have some code like:

    opener = urllib.request.build_opener()
    request = urllib.request.Request(url)
    u = opener.open(request)

If my URL redirects to another website, how can I find out that new website's URL? I've found nothing useful in the documentation. Thanks for your help!

Answer: You can simply use u.geturl() to get the URL you were redirected to (or the original one if no redirect happened).

Source: https://stackoverflow.com/questions/4946244/redirection-url-using-urllib-in-python-3
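The accepted answer can be sketched as a small helper; the example below uses a data: URL (handled by urllib since Python 3.4) so it runs without network access, but the same geturl() call reports the post-redirect URL for http(s) requests:

```python
import urllib.request

def resolve_final_url(url):
    # urlopen follows HTTP redirects automatically; geturl() reports
    # the URL of the response that was actually fetched.
    u = urllib.request.urlopen(url)
    return u.geturl()

# A data: URL involves no redirect, so the final URL equals the input.
print(resolve_final_url("data:text/plain,hello"))
```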

Python 3 urllib image retrieval

坚强是说给别人听的谎言 · Submitted on 2019-12-01 21:02:34
I'm writing a small Python script to grab images via Google Images. I've managed to get to the point where I have the URLs of the images I want in a handy list. Now I just need to grab them. For each image URL I do this:

    print("Retrieving:{0}".format(sFinalImageURL))
    sExt = sFinalImageURL.split('.')[-1]
    #u = urllib.request.urlopen(sFinalImageURL)
    try:
        u = urllib.request.urlopen(sFinalImageURL)
    except:
        print("error: cannot retrieve image")
        continue
    raw_data = u.read()
    print("read
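A minimal sketch of the retrieval step, reusing the question's approach: splitting on '.' breaks when the URL carries a query string, so the extension is taken from the parsed URL path instead. The URLs and names here are illustrative, not from the question:

```python
import os
import urllib.parse
import urllib.request

def image_extension(url):
    # Take the extension from the URL path, ignoring any query string.
    path = urllib.parse.urlparse(url).path
    return os.path.splitext(path)[1]

def fetch_image(url):
    # Return the raw image bytes, or None if retrieval fails.
    try:
        u = urllib.request.urlopen(url)
    except OSError:  # urllib.error.URLError is a subclass of OSError
        return None
    return u.read()

print(image_extension("http://example.com/pic.jpg?size=large"))  # .jpg
```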

Catching HTTP errors

假如想象 · Submitted on 2019-12-01 16:16:42
How can I catch the 404 and 403 errors for pages in Python with urllib(2), for example? Are there any fast ways without big class wrappers?

Added info (stack trace):

    Traceback (most recent call last):
      File "test.py", line 3, in <module>
        page = urllib2.urlopen("http://localhost:4444")
      File "/usr/lib/python2.6/urllib2.py", line 126, in urlopen
        return _opener.open(url, data, timeout)
      File "/usr/lib/python2.6/urllib2.py", line 391, in open
        response = self._open(req, data)
      File "/usr/lib/python2.6/urllib2.py", line 409, in _open
        '_open', req)
      File "/usr/lib/python2.6/urllib2.py", line 369, in _call
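No wrapper class is needed: error responses such as 404 and 403 are raised as HTTPError exceptions, which carry the status code. A minimal sketch in Python 3 terms (in Python 2 the same pattern applies with urllib2.HTTPError and urllib2.URLError):

```python
import urllib.error
import urllib.request

def fetch_status(url):
    # Returns (status_code, body); 404/403 responses arrive as
    # HTTPError exceptions rather than as normal responses.
    try:
        response = urllib.request.urlopen(url)
        return response.getcode(), response.read()
    except urllib.error.HTTPError as e:
        # The error object doubles as a response: e.code, e.read()
        return e.code, e.read()
    except urllib.error.URLError as e:
        # Network-level failure (DNS, refused connection): no status code.
        return None, str(e.reason).encode()
```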

Python urllib urllib2

巧了我就是萌 · Submitted on 2019-12-01 15:13:42
urllib2 is an extension of urllib.

Similarities and differences: the most commonly used functions, urllib.urlopen and urllib2.urlopen, are similar, but their parameters differ, for example around timeouts and proxies. urllib accepts only a URL string, while urllib2 accepts either a URL string or a Request object; a Request object lets you set headers, which urllib cannot do. On the other hand, urllib has a urlencode method for encoding parameters, which urllib2 lacks, so the two modules are often used together. Overall urllib2 offers more functionality, including the various handlers and openers. There is also the httplib module, which provides the most basic HTTP request methods, e.g. GET/POST/PUT operations.

Reference: http://blog.csdn.net/column/details/why-bug.html

The most basic usage:

    import urllib2
    response = urllib2.urlopen('http://www.baidu.com/')
    html = response.read()
    print html

Using a Request object:

    import urllib2
    req = urllib2.Request('http://www.baidu.com')
    response = urllib2.urlopen(req)
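In Python 3 the split described above disappears: urllib and urllib2 were merged, with urlencode moving to urllib.parse and Request/urlopen to urllib.request. A sketch of the combined usage:

```python
import urllib.parse
import urllib.request

# urlencode (formerly urllib.urlencode) now lives in urllib.parse.
params = urllib.parse.urlencode({"wd": "python"})

# Request objects (formerly urllib2.Request) accept custom headers.
req = urllib.request.Request(
    "http://www.baidu.com/s?" + params,
    headers={"User-Agent": "Mozilla/5.0"},
)
print(params)                         # wd=python
print(req.get_header("User-agent"))  # Mozilla/5.0
```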

How to send cookies with urllib

依然范特西╮ · Submitted on 2019-12-01 14:41:01
I'm attempting to connect to a website that requires you to have a specific cookie to access it. For the sake of this question, we'll call the cookie 'required_cookie' and the value 'required_value'. This is my code:

    import urllib
    import http.cookiejar

    cj = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
    opener.addheaders = [('required_cookie', 'required_value'), ('User-Agent', 'Mozilla/5.0')]
    urllib.request.install_opener(opener)

    req = Request('https://www.thewebsite.com/')
    webpage = urlopen(req).read()
    print(webpage)

I'm new to urllib
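The code above has two problems: 'required_cookie' is used as a header name rather than as a cookie inside a Cookie header, and Request/urlopen are never imported. A corrected sketch (thewebsite.com is the question's placeholder; the final request is commented out because it needs network access):

```python
import http.cookiejar
import urllib.request

cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
# Cookies travel in a single 'Cookie' header as name=value pairs;
# 'required_cookie' is the cookie's name, not a header name.
opener.addheaders = [
    ('Cookie', 'required_cookie=required_value'),
    ('User-Agent', 'Mozilla/5.0'),
]
urllib.request.install_opener(opener)

req = urllib.request.Request('https://www.thewebsite.com/')
# webpage = urllib.request.urlopen(req).read()  # requires network access
```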

How do I execute a Python script that is stored on the internet?

自闭症网瘾萝莉.ら · Submitted on 2019-12-01 14:33:04
I am using Python 2.4 for a program which imports scripts from the internet and executes them, so a script can be changed by the author and the user won't have to re-download it. This is the part of the program that downloads the script:

    def downloadScript(self, script):
        myfile = open('#A file path/' + script['name'] + '.txt', 'w')
        try:
            downloadedScript = urllib.urlopen(script['location']).read()
        except:
            #raise error
            return
        myfile.write(downloadedScript)
        myfile.close()

    def loadScript(self):
        if not self.scriptCurrentlyLoaded:
            script = self.scripts[self.scroller.listPos]
            if script['location'] ==

Remove newlines in Python with urllib

爷，独闯天下 · Submitted on 2019-12-01 14:00:51
I am using Python 3.x. While using urllib.request to download a webpage, I am getting a lot of \n in between. I am trying to remove them using the methods given in other threads of the forum, but I am not able to do so. I have used the strip() function and the replace() function... but no luck! I am running this code on Eclipse. Here is my code:

    import urllib.request

    #Downloading entire Web Document
    def download_page(a):
        opener = urllib.request.FancyURLopener({})
        try:
            open_url = opener.open(a)
            page = str(open_url.read())
            return page
        except:
            return ""

    raw_html = download_page("http://www.zseries
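The stray \n sequences most likely come from calling str() on the bytes returned by read(): that produces the bytes literal (b'...') in which each newline is the two characters backslash and n, so replace('\n', ...) never matches them. Decoding the bytes first makes real newlines removable. A minimal sketch of the fix, assuming UTF-8 content:

```python
def clean_page(raw_bytes):
    # Decode bytes to text, then collapse real newlines into spaces.
    text = raw_bytes.decode('utf-8', errors='replace')
    return text.replace('\r', '').replace('\n', ' ')

print(clean_page(b"first line\nsecond line"))  # first line second line
```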

Python's urllib2 doesn't work on some sites

本小妞迷上赌 · Submitted on 2019-12-01 11:29:25
I found that you can't read from some sites using Python's urllib2 (or urllib). An example:

    urllib2.urlopen("http://www.dafont.com/").read()  # Returns ''

These sites work when you visit them with a browser. I can even scrape them using PHP (I didn't try other languages). I have seen other sites with the same issue, but can't remember the URLs at the moment. My questions are: What is the cause of this issue? Are there any workarounds?

Answer: I believe it gets blocked by the User-Agent. You can change the User-Agent using the following sample code:

    USERAGENT = 'something'
    HEADERS = {'User-Agent': USERAGENT}
    req
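The answer's sample trails off; the usual completion of that pattern, shown here with Python 3's urllib.request (where urllib2's Request and urlopen now live) and a browser-like User-Agent string of my choosing, would look like this:

```python
import urllib.request

USERAGENT = 'Mozilla/5.0'  # any browser-like string; the original used 'something'
HEADERS = {'User-Agent': USERAGENT}

# Attaching the headers to the Request overrides urllib's default
# 'Python-urllib/x.y' User-Agent, which some sites reject.
req = urllib.request.Request("http://www.dafont.com/", headers=HEADERS)
# html = urllib.request.urlopen(req).read()  # requires network access
print(req.get_header("User-agent"))  # Mozilla/5.0
```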