Load xlsx file from drive in colaboratory

Posted by 三世轮回 on 2020-01-03 19:03:12

Question


How can I import an MS Excel (.xlsx) file from Google Drive into Colaboratory?

excel_file = drive.CreateFile({'id':'some id'})

works (drive is a pydrive.drive.GoogleDrive object). However,

print(excel_file.FetchContent())

returns None. And

excel_file.content()

throws:

TypeError                                 Traceback (most recent call last)
in ()
----> 1 excel_file.content()

TypeError: '_io.BytesIO' object is not callable

Given a valid file 'id', I want to load the file as an io object that pandas read_excel() can read, and end up with a pandas DataFrame.


Answer 1:


You'll want to use excel_file.GetContentFile to save the file locally. Then, you can use the Pandas read_excel method after you !pip install -q xlrd.

Here's a full example: https://colab.research.google.com/notebook#fileId=1SU176zTQvhflodEzuiacNrzxFQ6fWeWC

What I did in more detail:

I created a new spreadsheet in Sheets to be exported as an .xlsx file.

Next, I exported it as an .xlsx file and uploaded it to Drive again. The URL is: https://drive.google.com/open?id=1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM

Note the file ID. In my case it's 1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM.
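
The ID can also be pulled out of a share link programmatically. This is a minimal sketch, assuming the link has either the open?id=... form shown above or the /file/d/<id>/... form that Drive also uses; extract_drive_file_id is a hypothetical helper name, not part of PyDrive.

```python
import re
from urllib.parse import urlparse, parse_qs

def extract_drive_file_id(url):
    """Return the Drive file ID found in a share URL, or None if there is none."""
    parsed = urlparse(url)
    # Form 1: https://drive.google.com/open?id=<ID>
    ids = parse_qs(parsed.query).get('id')
    if ids:
        return ids[0]
    # Form 2 (assumed): https://drive.google.com/file/d/<ID>/view
    match = re.search(r'/d/([\w-]+)', parsed.path)
    return match.group(1) if match else None

file_id = extract_drive_file_id(
    'https://drive.google.com/open?id=1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM')
print(file_id)
```

The returned string is what goes into drive.CreateFile({'id': file_id}).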

Then, in Colab, I tweaked the Drive download snippet to download the file. The key bits are:

file_id = '1Sv4ib5i7CKWhAHZkKg-uitIkS3xwxtXM'
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('exported.xlsx')
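
Before handing the file to pandas, it can help to confirm that the download really is a workbook, for example if the ID might point at something other than an uploaded .xlsx. An .xlsx file is a ZIP archive in the Office Open XML layout, so a rough check needs only the standard library. This is a sketch, not a full validator; looks_like_xlsx is a hypothetical helper name.

```python
import zipfile

def looks_like_xlsx(path):
    """Return True if `path` is a ZIP archive laid out like an .xlsx workbook."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        names = archive.namelist()
    # Office Open XML packages carry this manifest; workbook parts live under xl/.
    has_manifest = '[Content_Types].xml' in names
    has_workbook_parts = any(name.startswith('xl/') for name in names)
    return has_manifest and has_workbook_parts
```

In the notebook, looks_like_xlsx('exported.xlsx') should be True right after the GetContentFile call; a False result suggests the file is not a real .xlsx.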

Finally, to create a Pandas DataFrame:

!pip install -q xlrd
import pandas as pd
df = pd.read_excel('exported.xlsx')
df

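The !pip install... line installs the xlrd library, which pandas needed to read Excel files when this answer was written. Note that xlrd 2.0 and later read only the older .xls format; recent pandas versions read .xlsx through openpyxl instead.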
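
The question asked for an in-memory object rather than a local file. Judging from the question's own output, FetchContent() returns None because it stores the download on the file object, and content is an attribute holding an io.BytesIO, not a method. The sketch below relies on that reading; to_buffer is a hypothetical helper name, and FakeDriveFile is a hypothetical stand-in so the snippet runs without Drive credentials.

```python
import io

def to_buffer(drive_file):
    """Download a PyDrive-style file and return its content as a rewound BytesIO."""
    drive_file.FetchContent()      # fills drive_file.content; the call itself returns None
    buffer = drive_file.content    # attribute access, so no call parentheses
    buffer.seek(0)                 # make sure readers start at the first byte
    return buffer

class FakeDriveFile:
    """Hypothetical stand-in that mimics only the two members used above."""
    def __init__(self, payload):
        self._payload = payload
        self.content = None

    def FetchContent(self):
        self.content = io.BytesIO(self._payload)

buffer = to_buffer(FakeDriveFile(b'workbook bytes'))
print(buffer.read())
```

With the real object in Colab, df = pd.read_excel(to_buffer(excel_file)) should then behave like reading from a path, since read_excel accepts file-like objects.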




Answer 2:


Here is a way to import any file (.csv, .xlsx, etc.) from Google Drive into Google Colab.

Solution:

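# Mount Google Drive into the Colab filesystem (prompts for authorization).
from google.colab import drive
drive.mount('/content/gdrive')

import pandas as pd
# Read a CSV file from the mounted Drive using its absolute path.
df = pd.read_csv('/content/gdrive/My Drive/HDPrice.csv')

df.shape

df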
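
The snippet above reads a CSV, while the question is about .xlsx files. Once Drive is mounted, workbooks are ordinary files, so they can be located with the standard library and passed to pd.read_excel. This is a sketch that assumes the mount point used above; find_workbooks is a hypothetical helper name.

```python
from pathlib import Path

def find_workbooks(root):
    """Return the sorted paths of all .xlsx files under `root`, searched recursively."""
    return sorted(str(path) for path in Path(root).rglob('*.xlsx'))
```

In Colab, find_workbooks('/content/gdrive/My Drive') lists the candidates, and any one of the returned paths can be given to pd.read_excel(path).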

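# Install gspread to read a native Google Sheets spreadsheet directly.
!pip install --upgrade --quiet gspread

# Authorize this Colab session to act as the signed-in user.
from google.colab import auth
auth.authenticate_user()

import gspread
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default())

# Open the spreadsheet named 'SampleData' and take its first worksheet.
worksheet = gc.open('SampleData').sheet1

# get_all_values() returns the sheet as a list of rows.
rows = worksheet.get_all_values()
print(rows)

import pandas as pd
pd.DataFrame.from_records(rows)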
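
Built this way, the DataFrame treats the sheet's first row as data and numbers the columns 0, 1, 2, and so on. If the first row holds column names, it can be split off first. This is a sketch assuming that layout; split_header is a hypothetical helper name and the sample rows are made up.

```python
def split_header(rows):
    """Split a list of row lists into (header, data_rows)."""
    if not rows:
        return [], []
    return rows[0], rows[1:]

# Made-up sample in the list-of-rows shape that get_all_values() produces.
rows = [['name', 'price'], ['apple', '3'], ['pear', '5']]
header, data = split_header(rows)
print(header)
print(data)
```

The pieces then go to pandas as pd.DataFrame(data, columns=header). The cell values arrive as strings, so numeric columns may still need converting, for example with pd.to_numeric.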


Source: https://stackoverflow.com/questions/47430544/load-xlsx-file-from-drive-in-colaboratory
