urllib2

Uploading a file using urllib2

Submitted by 折月煮酒 on 2019-12-25 04:45:00
Question: I'm new to Python and I'm writing code to upload a file using urllib2, but I can't make it work. Here's the code:

    class Get(object):
        handlers = list()

        def __init__(self, url):
            self.url = url
            self.request = urllib2.Request(url)
            self.request.add_header('User-Agent', "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13")

        def auth(self, username, password):
            pass_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
            pass_mgr.add_password(None, self.url, username, password)
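
Not an answer from the thread, but a minimal sketch of one way to send a file with urllib2: passing the file's bytes as the data argument of a Request turns the request into a POST. The endpoint URL and filename below are assumptions; a true multipart/form-data upload needs the body built by hand or a helper library such as poster.

    import urllib2

    # Minimal sketch: POST raw file bytes to a hypothetical endpoint.
    # urllib2 has no built-in multipart support, so this assumes the server
    # accepts the file contents directly in the request body.
    url = "http://example.com/upload"          # hypothetical endpoint
    with open("report.pdf", "rb") as f:
        payload = f.read()

    req = urllib2.Request(url, data=payload)
    req.add_header("Content-Type", "application/octet-stream")
    response = urllib2.urlopen(req)
    print response.getcode()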

Why am I getting this error? HTTP Error 407: Proxy Authentication Required

Submitted by 旧巷老猫 on 2019-12-25 03:00:33
Question: I am using the following code, found in the post "How to specify an authenticated proxy for a python http connection?":

    import urllib2

    def get_proxy_opener(proxyurl, proxyuser, proxypass, proxyscheme="http"):
        password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
        password_mgr.add_password(None, proxyurl, proxyuser, proxypass)
        proxy_handler = urllib2.ProxyHandler({proxyscheme: proxyurl})
        proxy_auth_handler = urllib2.ProxyBasicAuthHandler(password_mgr)
        return urllib2.build_opener(proxy_handler,
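
For reference, a common way to clear a 407 is to put the credentials directly in the proxy URL handed to ProxyHandler, provided the proxy accepts Basic authentication. This is only a sketch; the host, port, and account names are placeholders.

    import urllib2

    # Sketch: embed the proxy credentials in the proxy URL itself.
    # 'proxyuser', 'proxypass', and the host/port are hypothetical values.
    proxy = urllib2.ProxyHandler(
        {"http": "http://proxyuser:proxypass@proxy.example.com:8080"})
    opener = urllib2.build_opener(proxy)
    urllib2.install_opener(opener)

    print urllib2.urlopen("http://www.example.com/").getcode()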

urllib2.urlopen('ur') gives error

Submitted by 夙愿已清 on 2019-12-24 19:51:15
Question: I am new to Python and trying to extract the contents of a page. When I do urlopen('http://www.google.com'), I get the following error:

    File "<stdin>", line 1, in <module>
    File "/usr/lib/python2.7/urllib2.py", line 126, in urlopen
        return _opener.open(url, data, timeout)
    File "/usr/lib/python2.7/urllib2.py", line 391, in open
        response = self._open(req, data)
    File "/usr/lib/python2.7/urllib2.py", line 409, in _open
        '_open', req)
    File "/usr/lib/python2.7/urllib2.py", line 369, in _call_chain
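
The excerpt cuts off before the final exception, but wrapping the call in a try/except makes the underlying cause visible. A minimal sketch (the URL is just an example):

    import urllib2

    try:
        response = urllib2.urlopen("http://www.google.com", timeout=10)
        print response.getcode()
    except urllib2.HTTPError as e:
        # The server answered, but with an error status.
        print "HTTP error:", e.code
    except urllib2.URLError as e:
        # No response at all (DNS failure, refused connection, proxy issue, etc.).
        print "Failed to reach the server:", e.reason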

Pin down exact content location in html for web scraping urllib2 Beautiful Soup

Submitted by 梦想与她 on 2019-12-24 17:25:50
Question: I'm new to web scraping, have had little exposure to HTML document structure, and wanted to know whether there is a better, more efficient way to search for the required content in the HTML version of a web page. Currently, I want to scrape the reviews for a product here: http://www.walmart.com/ip/29701960?wmlspartner=wlpa&adid=22222222227022069601&wl0=&wl1=g&wl2=c&wl3=34297254061&wl4=&wl5=pla&wl6=62272156621&veh=sem For this, I have the following code:

    url = http://www.walmart.com/ip/29701960? wmlspartner=wlpa
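
A typical way to pin down a specific element is to inspect the page in the browser, note a distinguishing attribute, and pass it to BeautifulSoup's find/find_all. The sketch below assumes the reviews sit in the static HTML under a made-up class name; on pages like this one, reviews are often injected by JavaScript and will not appear in the urllib2 response at all.

    import urllib2
    from bs4 import BeautifulSoup

    url = "http://www.walmart.com/ip/29701960"      # shortened for the example
    html = urllib2.urlopen(url).read()
    soup = BeautifulSoup(html)

    # 'js-review-body' is a hypothetical class name; take the real one
    # from the browser's "Inspect element" view.
    for review in soup.find_all("div", class_="js-review-body"):
        print review.get_text(strip=True)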

How to programmatically retrieve access_token from client-side OAuth flow using Python?

Submitted by 自闭症网瘾萝莉.ら on 2019-12-24 11:36:16
Question: This question was posted on StackApps, but the issue may be more of a programming issue than an authentication issue, so it may deserve a better place here. I am working on a desktop inbox notifier for StackOverflow, using the API with Python. The script I am working on first logs the user in on StackExchange and then requests authorisation for the application. Assuming the application has been authorised through the user's web-browser interaction, the application should be able to make
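
In the client-side (implicit) OAuth flow the token comes back in the fragment of the final redirect URL, so once the script can observe that redirect it only has to parse the fragment. A minimal sketch; the redirect URL shown here is invented for illustration:

    from urlparse import urlparse, parse_qs

    # Example redirect captured after the user approves the app (made-up values).
    redirect = "https://stackexchange.com/oauth/login_success#access_token=abc123&expires=86400"

    fragment = urlparse(redirect).fragment          # "access_token=abc123&expires=86400"
    params = parse_qs(fragment)
    access_token = params.get("access_token", [None])[0]
    print access_token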

Python web scraping on large html webpages

Submitted by 被刻印的时光 ゝ on 2019-12-24 11:24:31
Question: I am trying to get all the historical information for a particular stock from Yahoo Finance. I am new to Python and web scraping. I want to download all the historical data into a CSV file. The problem is that the code downloads only the first 100 entries for any stock on the website. When a stock is viewed in the browser, we have to scroll to the bottom of the page for more table entries to load. I think the same thing is happening when I download using the library. Some kind of optimization
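
When a page only fills its table as you scroll, the extra rows usually come from a separate data endpoint that the browser calls in the background, so the usual workaround is to find that endpoint in the browser's network tab and request it directly, chunk by chunk. The sketch below is purely illustrative: the endpoint and its parameters are hypothetical, not Yahoo Finance's real API.

    import csv
    import urllib2

    # Hypothetical paged endpoint discovered via the browser's network tab.
    # Assumes each request returns a CSV chunk and an empty body past the end.
    BASE = "http://example.com/api/history?symbol=%s&offset=%d&count=%d"

    def fetch_all(symbol, page_size=100):
        rows, offset = [], 0
        while True:
            chunk = urllib2.urlopen(BASE % (symbol, offset, page_size)).read()
            lines = chunk.strip().splitlines()
            if not lines:
                break
            rows.extend(csv.reader(lines))
            offset += page_size
        return rows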

Beautifulsoup functionality not working properly in specific scenario

Submitted by 喜夏-厌秋 on 2019-12-24 10:59:27
Question: I am trying to read in the following URL using urllib2: http://frcwest.com/ and then search the data for the meta redirect. It reads in the following data:

    <!--?xml version="1.0" encoding="UTF-8"?--><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml"><head><title></title><meta content="0;url= Home.html" http-equiv="refresh"/></head><body></body></html>

Reading it into BeautifulSoup
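
For reference, pulling the redirect target out of a meta refresh tag with BeautifulSoup is usually a matter of matching on the http-equiv attribute and splitting the content value. A small sketch based on the markup quoted above:

    import urllib2
    from bs4 import BeautifulSoup

    html = urllib2.urlopen("http://frcwest.com/").read()
    soup = BeautifulSoup(html)

    meta = soup.find("meta", attrs={"http-equiv": "refresh"})
    if meta is not None:
        # content looks like "0;url= Home.html"; take what follows "url="
        content = meta.get("content", "")
        target = content.split("url=", 1)[-1].strip()
        print target                      # -> "Home.html"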

Timing out urllib2 urlopen operation in Python 2.4

Submitted by 假装没事ソ on 2019-12-24 10:05:08
Question: I've just inherited some Python code and need to fix a bug as soon as possible. I have very little Python knowledge so please excuse my ignorance. I am using urllib2 to extract data from web pages. Despite using socket.setdefaulttimeout(30), I am still coming across URLs that hang seemingly indefinitely. I want to time out the extraction and have got this far after much searching of the web:

    import socket
    socket.setdefaulttimeout(30)

    reqdata = urllib2.Request(urltocollect)

    def handler(reqdata): ?
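
The timeout keyword on urlopen only arrived in Python 2.6, so on 2.4 the handler sketched above is usually wired to SIGALRM: the alarm interrupts a hung read and raises an exception the caller can catch. A sketch of that idea (Unix-only, main thread only; the URL is a placeholder):

    import signal
    import urllib2

    class TimeoutError(Exception):
        pass

    def handler(signum, frame):
        # Fired by SIGALRM when the alarm expires.
        raise TimeoutError("request took too long")

    signal.signal(signal.SIGALRM, handler)

    def fetch(url, seconds=30):
        signal.alarm(seconds)              # arm the alarm
        try:
            return urllib2.urlopen(url).read()
        finally:
            signal.alarm(0)                # always disarm it

    try:
        data = fetch("http://example.com/slow-page")
    except TimeoutError:
        data = None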

urllib2 gives HTTP Error 400: Bad Request for certain urls, works for others

Submitted by 时光怂恿深爱的人放手 on 2019-12-24 09:59:24
Question: I'm trying to do a simple HTTP GET request with Python's urllib2 module. It works sometimes, but other times I get HTTP Error 400: Bad Request. I know it's not an issue with the URL, because if I use urllib and simply do urllib.urlopen(url) it works fine, but when I add headers and do urllib2.urlopen() I get Bad Request on certain sites. Here is the code that's not working:

    # -*- coding: utf-8 -*-
    import re, sys, urllib, urllib2

    url = "http://www.gamestop.com/"
    headers = {'User-Agent:':
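
The snippet breaks off at the headers dict, but the stray colon inside the key ('User-Agent:') is a likely culprit: it makes urllib2 emit a malformed header line, which some servers reject with 400. A sketch with the key written without the colon (the User-Agent string is just an example):

    # -*- coding: utf-8 -*-
    import urllib2

    url = "http://www.gamestop.com/"
    # Header names must not include the trailing colon; urllib2 adds it.
    headers = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64) Gecko/20100101 Firefox/38.0"}

    req = urllib2.Request(url, headers=headers)
    print urllib2.urlopen(req).getcode()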

Extracting a table from a website

Submitted by 假如想象 on 2019-12-24 07:24:42
Question: I've tried many times to retrieve the table at this website: http://www.whoscored.com/Players/845/History/Tomas-Rosicky (the one under "Historical Participations"):

    import urllib2
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(urllib2.urlopen('http://www.whoscored.com/Players/845/').read())

This is the Python code I am using to retrieve the table HTML, but I am getting an empty string. Help me out!

Answer 1: The desired table is formed via an asynchronous API call to the http://www.whoscored
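
Since the table is filled in by a background request rather than being in the initial HTML, the usual approach is to copy that request from the browser's network tab and call it directly, typically with an X-Requested-With header and a JSON parser instead of BeautifulSoup. The endpoint below is a placeholder; the answer's real URL is cut off above.

    import json
    import urllib2

    # Placeholder for the asynchronous endpoint mentioned in the answer;
    # take the exact URL and parameters from the browser's network tab.
    api_url = "http://www.whoscored.com/..."

    req = urllib2.Request(api_url)
    req.add_header("X-Requested-With", "XMLHttpRequest")
    req.add_header("User-Agent", "Mozilla/5.0")

    data = json.loads(urllib2.urlopen(req).read())
    print data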