urllib2

Urllib2 authentication with API key

Submitted by 自古美人都是妖i on 2020-01-14 03:13:34

Question: I am trying to connect to the radian6 API, which requires auth_appkey, auth_user and auth_pass, with the password sent as an MD5 hash. When I connect using telnet I get the response XML successfully:

telnet sandboxapi.radian6.com 80
Trying 142.166.170.31...
Connected to sandboxapi.radian6.com.
Escape character is '^]'.
GET /socialcloud/v1/auth/authenticate HTTP/1.1
host: sandboxapi.radian6.com
auth_appkey: 123456789
auth_user: xxx@xxx.com
auth_pass: 'md5encryptedpassword'

HTTP/1.1 200 OK
Server:
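A sketch of the same request in Python, assuming the header names shown in the telnet session; the appkey, user, and password are placeholders, and modern urllib.request stands in for urllib2. The request is built but not sent:

```python
import hashlib
import urllib.request

# Placeholders standing in for real credentials.
APPKEY = "123456789"
USER = "xxx@xxx.com"
PASSWORD = "secret"  # hypothetical plain-text password

# The API expects the password as an MD5 hex digest, per the telnet session.
pass_md5 = hashlib.md5(PASSWORD.encode("utf-8")).hexdigest()

req = urllib.request.Request(
    "http://sandboxapi.radian6.com/socialcloud/v1/auth/authenticate",
    headers={
        "auth_appkey": APPKEY,
        "auth_user": USER,
        "auth_pass": pass_md5,
    },
)
# urllib.request.urlopen(req) would send it; here we only inspect it.
# Note urllib capitalizes stored header keys ("auth_appkey" -> "Auth_appkey").
print(req.headers)
```

With urllib2 on Python 2 the shape is the same: build a `urllib2.Request(url, headers=...)` and pass it to `urllib2.urlopen`.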

Sending form data to aspx page

Submitted by 孤者浪人 on 2020-01-13 09:44:32

Question: There is a need to do a search on the website url = r'http://www.cpso.on.ca/docsearch/'. This is an aspx page (I'm beginning this trek as of yesterday, sorry for noob questions). Using BeautifulSoup, I can get the __VIEWSTATE and __EVENTVALIDATION like this:

viewstate = soup.find('input', {'id' : '__VIEWSTATE'})['value']
eventval = soup.find('input', {'id' : '__EVENTVALIDATION'})['value']

and the header can be set like this:

headers = {'HTTP_USER_AGENT': 'Mozilla/5.0 (Windows; U; Windows NT 5.1
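Once __VIEWSTATE and __EVENTVALIDATION are extracted, they have to be echoed back in the POST body for the ASP.NET page to accept the form. A standard-library-only sketch (html.parser instead of BeautifulSoup, with sample markup and a hypothetical txtLastName search field standing in for the real page's fields):

```python
from html.parser import HTMLParser
from urllib.parse import urlencode

# Sample markup standing in for the docsearch page; real values are much longer.
PAGE = """
<form>
<input type="hidden" id="__VIEWSTATE" name="__VIEWSTATE" value="dDwtNTQ4" />
<input type="hidden" id="__EVENTVALIDATION" name="__EVENTVALIDATION" value="abc123" />
</form>
"""

class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of hidden <input> elements."""
    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "input" and a.get("type") == "hidden":
            self.fields[a["name"]] = a.get("value", "")

parser = HiddenFieldParser()
parser.feed(PAGE)

# Echo the ASP.NET state fields back, plus the search field itself
# (the field name 'txtLastName' is an assumption, not from the real page).
form = dict(parser.fields)
form["txtLastName"] = "Smith"
body = urlencode(form)
print(body)
```

The urlencoded `body` is what gets passed as the POST data when the request is finally made.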

Using BeautifulSoup to select div blocks within HTML

Submitted by 扶醉桌前 on 2020-01-13 03:04:56

Question: I am trying to parse several div blocks with Beautiful Soup, using some HTML from a website. However, I cannot work out which function should be used to select these div blocks. I have tried the following:

import urllib2
from bs4 import BeautifulSoup

def getData():
    html = urllib2.urlopen("http://www.racingpost.com/horses2/results/home.sd?r_date=2013-09-22", timeout=10).read().decode('UTF-8')
    soup = BeautifulSoup(html)
    print(soup.title)
    print(soup.find_all('<div class="crBlock ">'))

getData()
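The likely fix: find_all does not take a literal markup string; it takes a tag name plus attributes, e.g. soup.find_all('div', class_='crBlock'). The same selection can be sketched with only the standard library, using sample markup in place of the live page:

```python
from html.parser import HTMLParser

# Stand-in markup; the live page has many more elements.
SAMPLE = """
<html><body>
<div class="crBlock "><p>first result</p></div>
<div class="other"><p>skip me</p></div>
<div class="crBlock "><p>second result</p></div>
</body></html>
"""

class DivCollector(HTMLParser):
    """Record the text inside <div> elements carrying a given class."""
    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.depth = 0          # >0 while inside a matching div
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = dict(attrs).get("class", "").split()
            if self.depth or self.wanted in classes:
                self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.texts.append(data.strip())

c = DivCollector("crBlock")
c.feed(SAMPLE)
print(c.texts)  # -> ['first result', 'second result']
```

With bs4 itself, `soup.find_all('div', class_='crBlock')` returns the same two blocks as Tag objects.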

Python Splinter (SeleniumHQ) how to take a screenshot of many webpages? [Connection refused]

Submitted by 寵の児 on 2020-01-12 10:20:21

Question: I want to take a screenshot of many webpages, I wrote this:

from splinter.browser import Browser
import urllib2
from urllib2 import URLError

urls = ['http://ubuntu.com/', 'http://xubuntu.org/']
try:
    browser = Browser('firefox')
    for i in range(0, len(urls)):
        browser.visit(urls[i])
        if browser.status_code.is_success():
            browser.driver.save_screenshot('your_screenshot' + str(i) + '.png')
        browser.quit()
except SystemError:
    print('install firefox!')
except urllib2.URLError, e:
    print(e)
    print(
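From the flattened snippet it appears that browser.quit() runs inside the loop, which would leave the second visit() talking to a browser that has already shut down and could explain the [Connection refused]. A runnable sketch of the corrected control flow, using a stand-in class since splinter and Firefox may not be installed (with splinter you would keep `from splinter.browser import Browser`):

```python
class FakeBrowser:
    """Minimal stand-in mimicking the splinter calls used above."""
    def __init__(self):
        self.alive = True
        self.shots = []

    def visit(self, url):
        if not self.alive:
            # This is what the real browser effectively does after quit().
            raise ConnectionRefusedError("browser already quit")
        self.current = url

    def save_screenshot(self, path):
        self.shots.append(path)

    def quit(self):
        self.alive = False

urls = ['http://ubuntu.com/', 'http://xubuntu.org/']
browser = FakeBrowser()
for i, url in enumerate(urls):
    browser.visit(url)
    browser.save_screenshot('your_screenshot' + str(i) + '.png')
browser.quit()   # quit once, after the loop, not per iteration
print(browser.shots)
```

The same restructuring applied to the splinter code (quit() dedented out of the for loop) lets every URL be visited before the browser is torn down.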

urllib.urlopen works but urllib2.urlopen doesn't

Submitted by 家住魔仙堡 on 2020-01-12 07:11:38

Question: I have a simple website I'm testing. It's running on localhost and I can access it in my web browser. The index page is simply the word "running". urllib.urlopen will successfully read the page but urllib2.urlopen will not. Here's a script which demonstrates the problem (this is the actual script and not a simplification of a different test script):

import urllib, urllib2
print urllib.urlopen("http://127.0.0.1").read() # prints "running"
print urllib2.urlopen("http://127.0.0.1").read() #
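One documented difference between the two modules: urllib.urlopen returned the body even for error statuses, while urllib2.urlopen (and urllib.request.urlopen in Python 3) raises HTTPError for non-2xx responses, which can make a page that "works" with one fail with the other. A self-contained sketch with a throwaway local server returning 500:

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class ErrorHandler(BaseHTTPRequestHandler):
    # Always answer 500 with a short body.
    def do_GET(self):
        body = b"boom"
        self.send_response(500)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), ErrorHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_port

try:
    urllib.request.urlopen(url)
    result = "no error"
except urllib.error.HTTPError as e:
    # The response body is still readable from the exception object.
    result = "HTTP %d: %s" % (e.code, e.read().decode())

server.shutdown()
print(result)  # -> HTTP 500: boom
```

So when urllib2.urlopen "doesn't work", catching HTTPError and reading `e.code` and `e.read()` is the first diagnostic step; the old urllib would have silently handed back that same error body.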

Unable to load ASP.NET page using Python urllib2

Submitted by 三世轮回 on 2020-01-11 14:32:10

Question: I am trying to do a POST request to https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/WellDetails/WellDetails.aspx in order to scrape data. Here is my current code:

from urllib import urlencode
import urllib2

# Configuration
uri = 'https://www.paoilandgasreporting.state.pa.us/publicreports/Modules/WellDetails/WellDetails.aspx'
headers = {
    'HTTP_USER_AGENT': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.13) Gecko/2009073022 Firefox/3.0.13',
    'HTTP_ACCEPT':
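One thing worth checking in that code: the dict uses CGI environment-variable names ('HTTP_USER_AGENT'); the header names sent on the wire should be 'User-Agent' and 'Accept'. A sketch of such a POST with modern urllib.request standing in for urllib2; the __VIEWSTATE/__EVENTVALIDATION values are placeholders, and the request is built but not sent:

```python
import urllib.request
from urllib.parse import urlencode

uri = ('https://www.paoilandgasreporting.state.pa.us/publicreports/'
       'Modules/WellDetails/WellDetails.aspx')

# ASP.NET pages need __VIEWSTATE / __EVENTVALIDATION echoed back;
# the values here are placeholders for whatever the page served.
form = {
    '__VIEWSTATE': 'placeholder',
    '__EVENTVALIDATION': 'placeholder',
}
data = urlencode(form).encode('ascii')   # POST bodies must be bytes

req = urllib.request.Request(
    uri,
    data=data,
    headers={
        'User-Agent': 'Mozilla/5.0',             # not 'HTTP_USER_AGENT'
        'Content-Type': 'application/x-www-form-urlencoded',
    },
)
print(req.get_method())  # a Request with data defaults to POST
```

In Python 2, `urllib2.Request(uri, data, headers)` behaves the same way: supplying `data` turns the request into a POST.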

Downloading a file in Python

Submitted by 試著忘記壹切 on 2020-01-11 10:59:13

Question:

import urllib2, sys

if len(sys.argv) != 3:
    print "Usage: download.py <link> <saveas>"
    sys.exit(1)

site = urllib2.urlopen(sys.argv[1])
meta = site.info()
print "Size: ", meta.getheaders("Content-Length")

f = open(sys.argv[2], 'wb')
f.write(site.read())
f.close()

I'm wondering how to display the file name and size before downloading, and how to display the download progress of the file. Any help will be appreciated.

Answer 1: Using urllib.urlretrieve:

import urllib, sys

def progress_callback(blocks,
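For the progress display, urlretrieve accepts a reporthook callback that receives the block number, block size, and total size on each chunk read. A self-contained sketch using a local file:// URL so it runs offline (the file contents are made up; a real download would pass an http:// URL instead):

```python
import os
import tempfile
import urllib.request
from pathlib import Path

# Make a small local file to "download" so the sketch needs no network.
src = tempfile.NamedTemporaryFile(delete=False, suffix=".txt")
src.write(b"x" * 10000)
src.close()
url = Path(src.name).as_uri()

progress = []
def reporthook(block_num, block_size, total_size):
    # Called once before the first block, then once per block read.
    # total_size is -1 when the server sends no Content-Length.
    done = min(block_num * block_size, total_size)
    progress.append((done, total_size))

dest = src.name + ".copy"
urllib.request.urlretrieve(url, dest, reporthook)
print("Size:", progress[-1][1], "bytes; copied:", progress[-1][0])
os.remove(src.name)
```

On Python 2 the equivalent is `urllib.urlretrieve(url, filename, reporthook)` with the same three-argument callback, which is exactly what Answer 1's `progress_callback` is for.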

does urllib2 support preemptive authentication authentication?

Submitted by 自闭症网瘾萝莉.ら on 2020-01-11 10:38:16

Question: I am trying to access a REST API. I can get it working in Curl/REST Client (the UI tool) with preemptive authentication enabled. But urllib2 doesn't seem to support this by default, and I can't find a way to turn it on. Thanks :)

Answer 1: Here's a simple preemptive HTTP basic auth handler, based on the code from urllib2.HTTPBasicAuthHandler. It can be used in exactly the same manner, except an Authorization header will be added to every request with a matching URL. Note that this handler
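Preemptive authentication just means sending the credentials on the first request instead of waiting for a 401 challenge, so it can also be done without a handler at all by setting the Authorization header yourself. A sketch with placeholder credentials and a hypothetical endpoint, using modern urllib.request in place of urllib2 (the request is built but not sent):

```python
import base64
import urllib.request

# Placeholders; substitute the real API credentials.
user, password = "api_user", "api_pass"
token = base64.b64encode(
    ("%s:%s" % (user, password)).encode("ascii")
).decode("ascii")

req = urllib.request.Request(
    "https://example.com/rest/endpoint",   # hypothetical endpoint
    headers={"Authorization": "Basic " + token},
)
print(req.headers["Authorization"])
```

The handler approach in Answer 1 is better when many URLs share credentials; the manual header is simpler for a one-off call, and it is what tools like curl do under `--user` with preemptive auth.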