urllib | 易学教程

python 3 Login form on webpage with urllib and cookiejar

Question: I've been trying to make a Python script log in to my reddit account, but it doesn't seem to work. Could anybody tell me what's wrong with my code? It runs fine; it just doesn't log in.

    cj = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
    opener.addheaders = [('User-agent', 'Mozilla/5.0')]
    urllib.request.install_opener(opener)
    authentication_url = 'https://ssl.reddit.com/post/login'
    payload = { 'op': 'login', 'user_name': 'username', 'user
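
The excerpt above is cut off before the request is sent, so the cause of the failed login cannot be determined from it. One detail that often matters in Python 3, though, is that POST data passed to urllib.request has to be URL-encoded and then converted to bytes. The following is a minimal sketch of that step only; the login URL and the form field names are hypothetical placeholders, not reddit's actual login API.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical endpoint and form fields, for illustration only.
LOGIN_URL = "https://example.com/login"
payload = {"user": "my_username", "passwd": "my_password"}


def build_login_request(url, fields):
    """Return a POST request whose body is URL-encoded bytes."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    # Supplying data makes urllib.request send a POST instead of a GET.
    return urllib.request.Request(url, data=body)


# The cookie jar stores whatever session cookies the server sets.
cookie_jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(cookie_jar)
)
opener.addheaders = [("User-agent", "Mozilla/5.0")]

request = build_login_request(LOGIN_URL, payload)
# response = opener.open(request)  # uncomment to actually send the request
```

Later requests made through the same opener would send back any cookies stored in cookie_jar, which is how a logged-in session is normally carried forward.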

urllib2 basic authentication oddities

Question: I'm slamming my head against the wall with this one. I've been trying every example and reading everything I can find online about basic HTTP authorization with urllib2, but I cannot figure out what is causing my specific error. Adding to the frustration, the code works for one page and yet not for another. Logging in to www.mysite.com/adm goes absolutely smoothly; it authenticates with no problem. Yet if I change the address to 'http://mysite.com/adm/items.php?n=201105&c=200' I receive
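
The excerpt stops before the error message, so the actual cause is unknown. For reference, one way to rule out problems with the challenge-and-retry handshake is to send the Basic credentials up front by setting the Authorization header directly. This is a sketch under assumptions: it uses Python 3's urllib.request (the successor to urllib2), and the URL and credentials are placeholders.

```python
import base64
import urllib.request


def basic_auth_request(url, username, password):
    """Build a request that carries HTTP Basic credentials up front."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(url)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder URL and credentials, for illustration only.
request = basic_auth_request(
    "http://example.com/adm/items.php?n=201105&c=200", "user", "secret"
)
# response = urllib.request.urlopen(request)  # uncomment to send
```

Because the header is attached to the request itself, it does not depend on the realm or URL prefix registered with a password manager.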

In Python 3.2, I can open and read an HTTPS web page with http.client, but urllib.request is failing to open the same page

Question: I want to open and read https://yande.re/ with urllib.request, but I'm getting an SSL error. I can open and read the page just fine using http.client with this code:

    import http.client
    conn = http.client.HTTPSConnection('www.yande.re')
    conn.request('GET', 'https://yande.re/')
    resp = conn.getresponse()
    data = resp.read()

However, the following code using urllib.request fails:

    import urllib.request
    opener = urllib.request.build_opener()
    resp = opener.open('https://yande.re/')
    data = resp.read(

Why I get urllib2.HTTPError with urllib2 and no errors with urllib?

Question: I have the following simple code:

    import urllib2
    import sys
    sys.path.append('../BeautifulSoup/BeautifulSoup-3.1.0.1')
    from BeautifulSoup import *
    page='http://en.wikipedia.org/wiki/Main_Page'
    c=urllib2.urlopen(page)

This code generates the following error messages:

    c=urllib2.urlopen(page)
    File "/usr/lib64/python2.4/urllib2.py", line 130, in urlopen
      return _opener.open(url, data)
    File "/usr/lib64/python2.4/urllib2.py", line 364, in open
      response = meth(req, response)
    File "/usr/lib64/python2.4

Extract Meta Keywords From Webpage?

Question: I need to extract the meta keywords from a web page using Python. I was thinking that this could be done using urllib or urllib2, but I'm not sure. Does anyone have any ideas? I am using Python 2.6 on Windows XP.

Answer 1: lxml is faster than BeautifulSoup (I think) and has much better functionality, while remaining relatively easy to use. Example:

    from urllib import urlopen
    from lxml import etree
    f = urlopen( "http://www.google.com" ).read()
    tree = etree.HTML( f )
    m = tree.xpath( "/
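
The answer's lxml example is cut off, so it is not completed here. As a separate illustration of the same task, the sketch below uses only the standard library's html.parser (a swapped-in technique, not the answer's method) and is written for Python 3. The class name, the function name, and the sample HTML are made up for the example.

```python
from html.parser import HTMLParser


class MetaKeywordsParser(HTMLParser):
    """Collect the comma-separated values of <meta name="keywords">."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # Attribute values can be None, so fall back to an empty string.
        if (attributes.get("name") or "").lower() == "keywords":
            content = attributes.get("content") or ""
            self.keywords.extend(
                part.strip() for part in content.split(",") if part.strip()
            )


def extract_meta_keywords(html_text):
    """Return the list of meta keywords found in an HTML string."""
    parser = MetaKeywordsParser()
    parser.feed(html_text)
    return parser.keywords


sample_html = (
    '<html><head><meta name="keywords" content="python, urllib, scraping">'
    "</head><body></body></html>"
)
print(extract_meta_keywords(sample_html))  # ['python', 'urllib', 'scraping']
```

For a live page, the HTML string would first be fetched and decoded, for example with urllib.request.urlopen(url).read().decode(...), using whatever encoding the page declares.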

requests response.iter_content() gets incomplete file ( 1024MB instead of 1.5GB )?

Question: Hi, I have been using this code snippet to download files from a website. So far, files smaller than 1 GB have all been fine, but I noticed that a 1.5 GB file is incomplete.

    # s is requests session object
    r = s.get(fileUrl, headers=headers, stream=True)
    start_time = time.time()
    with open(local_filename, 'wb') as f:
        count = 1
        block_size = 512
        try:
            total_size = int(r.headers.get('content-length'))
            print 'file total size :',total_size
        except TypeError:
            print 'using dummy length !!!'
            total_size = 10000000
        for
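
The loop that writes the file is missing from the excerpt, so the reason for the short download cannot be read off the code. A first diagnostic step is to count the bytes actually written and compare that with the Content-Length header. The helpers below are a hypothetical sketch in Python 3; they accept any iterable of byte chunks, so they can be exercised without a network connection.

```python
def save_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the byte count."""
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                f.write(chunk)
                written += len(chunk)
    return written


def is_complete(written, content_length_header):
    """Compare bytes written with a Content-Length header value, if any."""
    if content_length_header is None:
        return None  # the server sent no length, so completeness is unknown
    return written == int(content_length_header)
```

With requests, this might be called as `n = save_chunks(r.iter_content(chunk_size=8192), local_filename)` followed by `is_complete(n, r.headers.get('content-length'))`. If the response is compressed in transit, the two numbers can legitimately differ, because iter_content yields decoded data.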

Batch downloading text and images from URL with Python / urllib / beautifulsoup?

Question: I have been browsing through several posts here, but I just cannot get my head around batch-downloading images and text from a given URL with Python.

    import urllib,urllib2
    import urlparse
    from BeautifulSoup import BeautifulSoup
    import os, sys

    def getAllImages(url):
        query = urllib2.Request(url)
        user_agent = "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 1.0.3705)"
        query.add_header("User-Agent", user_agent)
        page = BeautifulSoup(urllib2.urlopen(query))
        for div in

Why am I getting an AttributeError when trying to print out

Question: I am learning about urllib2 by following this tutorial: http://docs.python.org/howto/urllib2.html#urlerror. Running the code below yields a different outcome from the tutorial.

    import urllib2
    req = urllib2.Request('http://www.pretend-o-server.org')
    try:
        urllib2.urlopen(req)
    except urllib2.URLError, e:
        print e.reason

The Python interpreter returns this:

    Traceback (most recent call last):
      File "urlerror.py", line 8, in <module>
        print e.reason
    AttributeError: 'HTTPError' object has no attribute
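
The traceback shows that the object caught by `except urllib2.URLError` was actually an HTTPError. HTTPError is a subclass of URLError, so a URLError handler also catches HTTP status errors such as 404. A common way to keep the two cases apart is to catch HTTPError first and read its code attribute. The sketch below shows that pattern with Python 3's urllib.error and urllib.request; the function name fetch is made up for the example.

```python
import urllib.error
import urllib.request


def fetch(url):
    """Return the response body, or None if the request failed."""
    try:
        with urllib.request.urlopen(url) as response:
            return response.read()
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status.
        # This clause must come before URLError, its parent class.
        print("server returned HTTP", exc.code)
    except urllib.error.URLError as exc:
        # No usable answer at all (DNS failure, refused connection, ...).
        print("failed to reach server:", exc.reason)
    return None
```

If the two except clauses were swapped, the URLError handler would catch every HTTPError as well and the HTTPError handler would never run.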

Saving files downloaded from urlretrieve to another folder

Question: I currently have this working, and it downloads the files correctly, but it places them in the folder the script is run from. How would I go about moving them to c:\downloads or something like that?

    urllib.urlretrieve(url, filename)

Answer 1: filename is basically your reference to the file and
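
The answer is cut off, but the second argument to urlretrieve is the path the file is written to, so passing a full path rather than a bare file name puts the download in another folder. A minimal sketch for Python 3, where the function lives in urllib.request; the helper name download_to_folder and the example URL are hypothetical.

```python
import os
import urllib.request


def download_to_folder(url, folder, filename):
    """Download url and save it as folder/filename, creating folder if needed."""
    os.makedirs(folder, exist_ok=True)
    destination = os.path.join(folder, filename)
    urllib.request.urlretrieve(url, destination)
    return destination


# Example call (hypothetical URL; the folder is the one from the question):
# download_to_folder("http://example.com/report.pdf", r"c:\downloads", "report.pdf")
```

Building the path with os.path.join avoids hand-written separators, and the raw string keeps the backslash in the Windows path from being read as an escape sequence.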