Write Python Dataframe to CSV file directly in Azure Datalake

Submitted by 不打扰是莪最后的温柔 on 2020-12-02 18:27:33

Question


I have imported an excel file into a pandas dataframe and have completed the data exploration and cleaning process.

I now want to write the cleaned DataFrame back to Azure Data Lake as a CSV file, without saving it first as a local file. I am using Python 3.

My code looks like this:

from azure.datalake.store import core, lib

token = lib.auth(tenant_id='',
                 client_secret='',
                 client_id='')

adl = core.AzureDLFileSystem(token, store_name=store_name)

with adl.open(path='Raw/Gold/Myfile.csv', mode='wb') as f:
    in_xls.to_csv(f, encoding='utf-8')  # this line raises the TypeError

I get the following error on the to_csv line:

TypeError: a bytes-like object is required, not 'str'

I also tried the following, without any luck:

with adl.open(path='Raw/Gold/Myfile.csv', mode='wb') as f:
    with io.BytesIO(in_xls) as byte_buf:
        byte_buf.to_csv(f, encoding='utf-8')
        f.close()

I am getting the below error:

TypeError: a bytes-like object is required, not 'DataFrame'

Any ideas or tips would be much appreciated.


Answer 1:


I got this working with pandas the other day on Python 3.x. This code runs on an on-premises machine and connects to the Azure Data Lake Store in the cloud.

Assuming df is a pandas dataframe you can use the following code:

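from azure.datalake.store import core

# token is the login token created by whichever ADLS login method you chose.
# Personally I use the ServiceProvider login.
adl = core.AzureDLFileSystem(token, store_name='YOUR_ADLS_STORE_NAME')

# With no target, to_csv() returns the CSV as a str; encode it before writing.
df_str = df.to_csv()
with adl.open('/path/to/file/on/adls/newfile.csv', 'wb') as f:
    f.write(df_str.encode('utf-8'))
    # No explicit f.close() needed; the with block closes the file.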

The key is converting the DataFrame to a string and then using the str.encode() function to turn that string into bytes, since the file is opened in binary mode ('wb').
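
If you want to try the idea without an ADLS account, here is a minimal sketch that uses io.BytesIO as a stand-in for the binary handle that adl.open(..., 'wb') returns. The helper name write_csv_bytes and the sample DataFrame are made up for illustration, and the real ADLS file object may behave differently in other respects, so treat this as a sketch of the str-versus-bytes point rather than a tested ADLS recipe.

```python
import io

import pandas as pd


def write_csv_bytes(df, binary_file, encoding="utf-8"):
    """Serialize df to CSV text, encode it, and write it to a binary handle.

    Hypothetical helper for illustration; binary_file is any object opened
    in binary mode that has a write() method.
    """
    csv_text = df.to_csv()  # no target given, so pandas returns a str
    binary_file.write(csv_text.encode(encoding))


# Sample data, invented for this example.
df = pd.DataFrame({"name": ["a", "b"], "value": [1, 2]})

# io.BytesIO stands in for the handle from adl.open(path, 'wb').
buf = io.BytesIO()

# Writing text straight to a binary handle fails, as in the question.
try:
    buf.write(df.to_csv())
    str_write_failed = False
except TypeError:
    str_write_failed = True

# Encoding the text first works.
write_csv_bytes(df, buf)
written = buf.getvalue()
print(written.decode("utf-8"))
```

With a real connection you would pass the handle from adl.open(...) in place of buf. If you do not want the index column in the file, call df.to_csv(index=False) inside the helper.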

Hope this helps.



Source: https://stackoverflow.com/questions/42413909/write-python-dataframe-to-csv-file-directly-in-azure-datalake
