Question
How can I import an MS Excel (.xlsx) file from Google Drive into Colaboratory?
excel_file = drive.CreateFile({'id':'some id'})
works (drive is a pydrive.drive.GoogleDrive object). But
print(excel_file.FetchContent())
returns None. And
excel_file.content()
throws:
TypeError    Traceback (most recent call last)
----> 1 excel_file.content()
TypeError: '_io.BytesIO' object is not callable
My intent is, given a valid file ID, to load the file as an in-memory io object that pandas read_excel() can read, and finally get a pandas DataFrame out of it.
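The TypeError above happens because, in PyDrive, FetchContent() returns None and instead stores the downloaded bytes on the file's content attribute as an io.BytesIO object; content is an attribute, not a method. A minimal sketch of passing that buffer straight to pandas, assuming an already authenticated PyDrive drive object as in the question. The helper name read_drive_file and its reader parameter are hypothetical, added for illustration; reader defaults to pd.read_excel, which needs an Excel engine such as openpyxl installed for .xlsx files.

```python
import pandas as pd


def read_drive_file(drive, file_id, reader=pd.read_excel, **reader_kwargs):
    """Download a Drive file into memory with PyDrive and parse it with pandas.

    Hypothetical helper: `drive` is assumed to be an authenticated
    pydrive.drive.GoogleDrive instance, and `reader` any pandas reader
    that accepts a file-like object (pd.read_excel by default).
    """
    drive_file = drive.CreateFile({'id': file_id})
    drive_file.FetchContent()  # returns None; it fills drive_file.content
    buffer = drive_file.content  # an io.BytesIO object: an attribute, not a method
    buffer.seek(0)  # rewind so the reader starts at the first byte
    return reader(buffer, **reader_kwargs)
```

In Colab this would be called as df = read_drive_file(drive, 'some id'); passing reader=pd.read_csv reuses the same download path for CSV files.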
Answer 1:
You'll want to use excel_file.GetContentFile to save the file locally. Then you can use the pandas read_excel method after running !pip install -q xlrd.
Here's a full example: https://colab.research.google.com/notebook#fileId=1SU176zTQvhflodEzuiacNrzxFQ6fWeWC
What I did in more detail:
I created a new spreadsheet in Google Sheets to be exported as an .xlsx file.
Next, I exported it as an .xlsx file and uploaded it to Drive again. The URL is: https://drive.google.com/open?id=1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM
Note the file ID. In my case it's 1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM.
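Copying the ID by hand works for one file; with many share links, a small standard-library sketch can pull the ID out of the URL instead. The function name drive_file_id is hypothetical, and it only handles the two link shapes listed in its docstring.

```python
from urllib.parse import parse_qs, urlparse


def drive_file_id(url):
    """Return the file ID from a Google Drive share URL.

    Handles the two link shapes assumed here:
    https://drive.google.com/open?id=<ID>  and  .../file/d/<ID>/view
    Raises ValueError for anything else.
    """
    parsed = urlparse(url)
    # Shape 1: the ID is passed as the "id" query parameter
    ids = parse_qs(parsed.query).get('id')
    if ids:
        return ids[0]
    # Shape 2: the ID is the path segment right after "/d/"
    parts = parsed.path.split('/')
    if 'd' in parts:
        index = parts.index('d')
        if index + 1 < len(parts) and parts[index + 1]:
            return parts[index + 1]
    raise ValueError(f'no file ID found in {url!r}')
```

For the link in this answer, drive_file_id('https://drive.google.com/open?id=1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM') gives the same ID as the one noted above.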
Then, in Colab, I tweaked the Drive download snippet to download the file. The key bits are:
file_id = '1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM'
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('exported.xlsx')
Finally, to create a Pandas DataFrame:
!pip install -q xlrd
import pandas as pd
df = pd.read_excel('exported.xlsx')
df
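The !pip install... line installs the xlrd library, which older pandas versions used to read Excel files. Since version 2.0, xlrd reads only legacy .xls files, so with current pandas install openpyxl instead (!pip install -q openpyxl) to read .xlsx files.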
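pd.read_excel takes an engine argument naming the parser library. pandas normally infers it from the file, but knowing which engine a file type needs tells you which package to pip install. A sketch of that lookup; the extension-to-engine mapping and the helper name excel_engine_for are written here for illustration and are not something pandas itself provides.

```python
from pathlib import Path

# Assumed mapping from file extension to the pandas.read_excel engine name
# (the pip package providing each engine is noted where it differs).
EXCEL_ENGINES = {
    '.xls': 'xlrd',       # legacy binary workbooks; xlrd 2.x reads only these
    '.xlsx': 'openpyxl',  # Office Open XML workbooks
    '.xlsm': 'openpyxl',  # macro-enabled Office Open XML workbooks
    '.xlsb': 'pyxlsb',    # binary Excel workbooks
    '.ods': 'odf',        # OpenDocument spreadsheets (pip package: odfpy)
}


def excel_engine_for(path):
    """Return the read_excel engine name for `path`, based on its suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return EXCEL_ENGINES[suffix]
    except KeyError:
        raise ValueError(f'unsupported spreadsheet extension: {suffix!r}') from None
```

After installing the matching package, the call becomes df = pd.read_excel('exported.xlsx', engine=excel_engine_for('exported.xlsx')), which fails early with a clear ImportError if the engine is missing.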
Answer 2:
Here is how you can import any file (.csv, .xlsx, etc.) from Google Drive into Google Colab: mount Drive, then read the file by its path.
Solution:
from google.colab import drive
drive.mount('/content/gdrive')
import pandas as pd
df = pd.read_csv('/content/gdrive/My Drive/HDPrice.csv')
df.shape
df
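With Drive mounted, an Excel file is read the same way, only with pd.read_excel instead of pd.read_csv. A minimal sketch that picks the reader from the file extension; read_table and DRIVE_ROOT are hypothetical names, and the root assumes the /content/gdrive mount point used above (newer Colab versions may show the folder as MyDrive rather than My Drive).

```python
from pathlib import Path

import pandas as pd

# Assumed root: matches drive.mount('/content/gdrive') above.
DRIVE_ROOT = Path('/content/gdrive/My Drive')

EXCEL_SUFFIXES = {'.xls', '.xlsx', '.xlsm', '.xlsb', '.ods'}


def read_table(path):
    """Load a CSV or spreadsheet file into a DataFrame, choosing the reader by suffix.

    Hypothetical helper; relative paths are resolved against DRIVE_ROOT.
    """
    path = Path(path)
    if not path.is_absolute():
        path = DRIVE_ROOT / path
    suffix = path.suffix.lower()
    if suffix == '.csv':
        return pd.read_csv(path)
    if suffix in EXCEL_SUFFIXES:
        return pd.read_excel(path)  # needs the matching engine, e.g. openpyxl
    raise ValueError(f'unsupported file type: {suffix!r}')
```

read_table('HDPrice.csv') reads the same file as the snippet above, and a workbook stored in Drive (a hypothetical 'data.xlsx', say) would be read with read_table('data.xlsx').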
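Alternatively, if the data lives in a Google Sheets spreadsheet (here one named 'SampleData'), you can read it with gspread:
!pip install --upgrade --quiet gspread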
from google.colab import auth
auth.authenticate_user()
import gspread
from google.auth import default
creds, _ = default()
gc = gspread.authorize(creds)
worksheet = gc.open('SampleData').sheet1
rows = worksheet.get_all_values()
print(rows)
import pandas as pd
pd.DataFrame.from_records(rows)
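get_all_values() returns a list of rows, each a list of strings, so pd.DataFrame.from_records(rows) keeps the sheet's header row as ordinary data with numbered columns. A small sketch that uses the first row as column names instead; rows_to_dataframe is a hypothetical helper, and it assumes the sheet really has a header row when header=True.

```python
import pandas as pd


def rows_to_dataframe(rows, header=True):
    """Convert gspread get_all_values() output (a list of row lists) to a DataFrame.

    When header is True, the first row is assumed to hold the column names.
    Values stay strings, because get_all_values() returns strings.
    """
    if not rows:
        return pd.DataFrame()
    if header:
        return pd.DataFrame.from_records(rows[1:], columns=rows[0])
    return pd.DataFrame.from_records(rows)
```

Usage: df = rows_to_dataframe(worksheet.get_all_values()); numeric columns can then be converted with pd.to_numeric if needed.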
Source: https://stackoverflow.com/questions/47430544/load-xlsx-file-from-drive-in-colaboratory