Read CSV file to Datalab from Google Cloud Storage and convert to pandas dataframe

Submitted by 痴心易碎 on 2019-12-11 00:29:39

Question


I am trying to read a CSV file saved in GCS into a dataframe for analysis.

I have followed these steps without success:

import google.datalab.storage as storage
import pandas as pd

mybucket = storage.Bucket('bucket-name')
data_csv = mybucket.object('data.csv')
df = pd.read_csv(data_csv)

This doesn't work, since data_csv is not a path as pd.read_csv expects. I also tried:

%%gcs read --object $data_csv --variable data
#result: %gcs: error: unrecognized arguments: Cloud Storage Object gs://path/to/file.csv

How can I read my file for analysis?

Thanks


Answer 1:


You just need to use the object's uri property to get the actual gs:// path:

uri = data_csv.uri
%%gcs read --object $uri --variable data

The first part of your code doesn't work because pandas expects a local file path (or URL), but you passed it a GCS object, which lives in the cloud.




Answer 2:


%%gcs read returns a bytes object. To parse it, use BytesIO from the io module (Python 3):

from io import BytesIO

import google.datalab.storage as storage
import pandas as pd

mybucket = storage.Bucket('bucket-name')
data_csv = mybucket.object('data.csv')
uri = data_csv.uri

# Run in its own cell: stores the object's bytes in the variable `data`.
%%gcs read --object $uri --variable data

# Wrap the bytes in a file-like buffer so pandas can parse them.
df = pd.read_csv(BytesIO(data), sep=';')

If your CSV file is comma-separated, there is no need to specify sep=',', since that is the default. Read more about the io library here: Core tools for working with streams.
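To illustrate the BytesIO step in isolation, here is a minimal, self-contained sketch. The inline csv_bytes are hypothetical sample data standing in for the bytes that %%gcs read would place in data:

```python
from io import BytesIO

import pandas as pd

# Hypothetical bytes standing in for the object downloaded from GCS.
csv_bytes = b"name;value\nalpha;1\nbeta;2\n"

# BytesIO wraps the raw bytes in a file-like object, which is one of
# the buffer types pd.read_csv accepts in place of a file path.
df = pd.read_csv(BytesIO(csv_bytes), sep=";")
print(df.shape)  # prints "(2, 2)"
```

The same pattern applies regardless of how the bytes were obtained, which is why it works after the %%gcs magic populates the variable.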



Source: https://stackoverflow.com/questions/45806715/read-csv-file-to-datalab-from-google-cloud-storage-and-convert-to-pandas-datafra
