Question
I'm trying to download a queried Excel file from a site. When I open the direct link, it leads to a login page, and once I've entered my username and password, the Excel file downloads automatically. I want to avoid installing any module that isn't part of the standard Python library (the script will run on a "standardized machine", and it won't work if a module is not installed).
I've tried the following, but the downloaded Excel file contains the login page instead of the data :-|
import urllib
url = "myLink_queriedResult/result.xls"
urllib.urlretrieve(url,"C:\\test.xls")
So then I looked into using urllib2 with password authentication, but now I'm stuck.
I have the following code:
import urllib2
import urllib
theurl = 'myLink_queriedResult/result.xls'
username = 'myName'
password = 'myPassword'
passman = urllib2.HTTPPasswordMgrWithDefaultRealm()
passman.add_password(None, theurl, username, password)
authhandler = urllib2.HTTPBasicAuthHandler(passman)
opener = urllib2.build_opener(authhandler)
urllib2.install_opener(opener)
pagehandle = urllib2.urlopen(theurl)
pagehandle.read()  # but this still seems to contain only the login page
Appreciate any advice in advance. :)
Answer 1:
urllib is generally eschewed these days in favor of Requests (a third-party package).
This would do what you want:
import requests
from requests.auth import HTTPBasicAuth
theurl = 'myLink_queriedResult/result.xls'
username = 'myUsername'
password = 'myPassword'
r = requests.get(theurl, auth=HTTPBasicAuth(username, password))
More information on authentication with Requests can be found in its documentation.
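Whichever client is used, it can help to check whether the bytes that came back are a spreadsheet or the login page, which is the symptom described in the question. The following is a minimal sketch using only the standard library. The helper names `looks_like_excel` and `looks_like_html` are hypothetical, and the check assumes the site serves a real binary workbook (some sites export HTML or CSV under an `.xls` name, which this would not recognise).

```python
# Legacy .xls files are OLE2 compound documents; .xlsx files are ZIP archives.
OLE2_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")
ZIP_MAGIC = b"PK\x03\x04"


def looks_like_excel(data: bytes) -> bool:
    """Return True if the payload starts with an .xls or .xlsx signature."""
    return data.startswith(OLE2_MAGIC) or data.startswith(ZIP_MAGIC)


def looks_like_html(data: bytes) -> bool:
    """Return True if the payload appears to be an HTML page (e.g. a login form)."""
    head = data[:512].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")
```

If `looks_like_html` is true for the downloaded content, the server most likely redirected to its login form, which suggests the site uses a form-based login with cookies rather than HTTP Basic authentication.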
Answer 2:
You can try it this way with Python 3:
import requests
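# import the authentication method the site requires (NTLM here)
from requests_ntlm import HttpNtlmAuth
# not called directly; pandas can use xlrd to read .xls files
from xlrd import open_workbook
import pandas as pd
from io import BytesIO
r = requests.get("http://example.website", auth=HttpNtlmAuth('acc', 'password'))
# wrap the downloaded bytes in a file-like object for pandas
xd = pd.read_excel(BytesIO(r.content))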
Ref:
https://medium.com/ibm-data-science-experience/excel-files-loading-from-object-storage-python-a54a2cbf4609
http://www.python-requests.org/en/latest/user/authentication/#basic-authentication
- Pandas read_csv from url
Answer 3:
You will need to use cookies to allow authentication.
import http.cookiejar
import urllib.parse
import urllib.request
# username, password and the_url are assumed to be defined earlier
# check the input names for the login fields by inspecting the page source
values = {'username': username, 'password': password}
data = urllib.parse.urlencode(values).encode("utf-8")
cookies = http.cookiejar.CookieJar()
# create an "opener" (OpenerDirector instance) that stores cookies
opener = urllib.request.build_opener(
    urllib.request.HTTPRedirectHandler(),
    urllib.request.HTTPHandler(debuglevel=0),
    urllib.request.HTTPSHandler(debuglevel=0),
    urllib.request.HTTPCookieProcessor(cookies))
# use the opener to fetch a URL; passing data makes this a POST
response = opener.open(the_url, data)
# Install the opener.
# Now all calls to urllib.request.urlopen use our opener.
urllib.request.install_opener(opener)
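Putting the cookie approach together with the original goal of saving the file, a standard-library-only flow might look like the sketch below. It assumes the site uses a plain HTML form login that sets a session cookie; the function names, the separate `login_url`, and the form field names are hypothetical and need to be adapted to the actual site.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def build_cookie_opener():
    """Return an opener that stores cookies and resends them on later requests."""
    jar = http.cookiejar.CookieJar()
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def download_after_login(opener, login_url, file_url, form_fields, dest_path):
    """POST the login form, then fetch file_url with the same opener and save it."""
    data = urllib.parse.urlencode(form_fields).encode("utf-8")
    # Passing data makes this a POST; any session cookie lands in the opener's jar.
    with opener.open(login_url, data) as resp:
        resp.read()
    # Same opener, so the stored cookies are sent along with this request.
    with opener.open(file_url) as resp:
        content = resp.read()
    # Write in binary mode so the spreadsheet bytes are not altered.
    with open(dest_path, "wb") as fh:
        fh.write(content)
    return content
```

A call could then look like `download_after_login(build_cookie_opener(), login_url, theurl, {'username': username, 'password': password}, 'C:\\test.xls')`. Sites that add hidden form fields such as CSRF tokens would need those values included in `form_fields` as well.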
Source: https://stackoverflow.com/questions/24835100/getting-a-file-from-an-authenticated-site-with-python-urllib-urllib2