Question
I am trying to read Excel files into pandas from the following URLs:
url1 = 'https://cib.societegenerale.com/fileadmin/indices_feeds/CTA_Historical.xls'
url2 = 'https://cib.societegenerale.com/fileadmin/indices_feeds/STTI_Historical.xls'
using the code:
pd.read_excel(url1)
However it doesn't work and I get the error:
XLRDError: Unsupported format, or corrupt file: Expected BOF record; found '2000/01/'
After searching on Google, it seems that .xls files offered through URLs are sometimes actually stored in a different format behind the scenes, such as HTML or XML.
When I manually download the file and open it in Excel, I get this error message: "The file format and extension don't match. The file could be corrupted or unsafe. Unless you trust its source, don't open it."
When I do open it, it looks just like a normal Excel file.
I came across a post online that suggested opening the file in a text editor to see whether it reveals the real file format, but I don't see any additional information when I open it in Notepad++.
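The same check can be done programmatically by looking at the first bytes of the file. The following is a minimal sketch, not a definitive detector: `sniff_format` is a hypothetical helper, and the sample bytes are made up to resemble the '2000/01/' fragment quoted in the error message. Real binary .xls files start with the OLE2 signature and .xlsx files start with a ZIP signature; anything else is probably text or markup.

```python
# Leading bytes ("magic numbers") of the two real Excel container formats.
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy binary .xls
ZIP_MAGIC = b"PK\x03\x04"                          # .xlsx (a ZIP archive)


def sniff_format(head: bytes) -> str:
    """Guess a file's real format from its first bytes (hypothetical helper)."""
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    stripped = head.lstrip().lower()
    if stripped.startswith((b"<html", b"<!doctype", b"<?xml", b"<table")):
        return "markup"
    # Neither an Excel container nor markup: most likely delimited text.
    return "text"


# Made-up sample bytes that begin like the fragment in the error message.
print(sniff_format(b"2000/01/03\t1000.0\t"))  # prints: text
```

To apply this to the real file, the first few bytes could be fetched with something like `urllib.request.urlopen(url1).read(16)` and passed to the helper.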
Could someone please help me read this "xls" file into a pandas DataFrame properly?
Answer 1:
The file appears to be tab-separated text rather than a real Excel workbook, so you can read it with read_csv:
import pandas as pd

# Parse the tab-separated text; the first column holds dates
df = pd.read_csv('https://cib.societegenerale.com/fileadmin/indices_feeds/CTA_Historical.xls',
                 sep='\t',
                 parse_dates=[0],
                 names=['a','b','c','d','e','f'])
print(df)
Then I check whether the last column, f, contains any values other than NaN:
print(df[df.f.notnull()])
Empty DataFrame
Columns: [a, b, c, d, e, f]
Index: []
So column f contains only NaN values, and you can drop it with the usecols parameter:
import pandas as pd

# Same call as before, but keep only the first five columns
df = pd.read_csv('https://cib.societegenerale.com/fileadmin/indices_feeds/CTA_Historical.xls',
                 sep='\t',
                 parse_dates=[0],
                 names=['a','b','c','d','e','f'],
                 usecols=['a','b','c','d','e'])
print(df)
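One plausible reason for the all-NaN column is a trailing tab at the end of every line, which read_csv treats as an extra empty field. The sketch below reproduces that situation offline under this assumption. The sample rows are made up and are not taken from the real feed; `df_all` and `df_trimmed` are illustrative names.

```python
import io

import pandas as pd

# Made-up sample (not the real feed): every line ends with a trailing tab.
sample = (
    "2000/01/03\t1.0\t2.0\t3.0\t4.0\t\n"
    "2000/01/04\t1.5\t2.5\t3.5\t4.5\t\n"
)
cols = ['a', 'b', 'c', 'd', 'e', 'f']

# The trailing tab yields a sixth, empty field, which pandas reads as NaN.
df_all = pd.read_csv(io.StringIO(sample), sep='\t', names=cols, parse_dates=[0])
print(df_all['f'].isnull().all())

# usecols skips the empty column while parsing.
df_trimmed = pd.read_csv(io.StringIO(sample), sep='\t', names=cols,
                         usecols=cols[:5], parse_dates=[0])
print(list(df_trimmed.columns))
```

If the number of empty columns is not known in advance, calling `dropna(axis=1, how='all')` on the full frame is an alternative way to discard them after loading.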
Answer 2:
In case this helps someone: you can read an Excel file stored on Google Drive directly by URL into pandas, without any login requirements. I tried it in Google Colab and it worked.
- Upload an Excel file to Google Drive, or use one that is already uploaded
- Share the file with "Anyone with the link" (I don't know whether view-only access works; I tried it with full access)
- Copy the link
You will get something like this.
share url: https://drive.google.com/file/d/---some--long--string/view?usp=sharing
Get the download URL by starting a download of the file and copying the URL from there.
It will look something like this (it contains the same Google file ID as above):
download url: https://drive.google.com/u/0/uc?id=---some--long--string&export=download
Now go to Google Colab and paste the following code:
import pandas as pd
fileurl = r'https://drive.google.com/file/d/---some--long--string/view?usp=sharing'
filedlurl = r'https://drive.google.com/u/0/uc?id=---some--long--string&export=download'
df = pd.read_excel(filedlurl)
df
That's it; the file is now in df.
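Since the share URL and the download URL carry the same file ID, the conversion between them can be sketched as below. `drive_download_url` is a hypothetical helper, the file ID is a placeholder, and the output simply mirrors the download URL shape shown above, which Google may change at any time.

```python
import re


def drive_download_url(share_url: str) -> str:
    """Build a direct-download URL from a Drive share link (hypothetical helper)."""
    match = re.search(r"/file/d/([^/?#]+)", share_url)
    if match is None:
        raise ValueError(f"no file id found in {share_url!r}")
    file_id = match.group(1)
    # Same URL shape as the download link shown above; not a documented API.
    return f"https://drive.google.com/u/0/uc?id={file_id}&export=download"


# Placeholder ID for illustration only.
share = "https://drive.google.com/file/d/FILE_ID_PLACEHOLDER/view?usp=sharing"
print(drive_download_url(share))
```

The resulting string can then be passed to `pd.read_excel` in the same way as `filedlurl` in the code above.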
Source: https://stackoverflow.com/questions/37243121/using-pandas-to-read-in-excel-file-from-url-xlrderror