Read a CSV file from AWS S3 using boto and pandas

一整个雨季 2021-02-04 07:14

I have already read through the answers available here and here, and they do not help.

I am trying to read a csv object from S3 bucket and have

3 Answers
  • 2021-02-04 07:57

    Here is what I have done to successfully read the df from a csv on S3.

    import pandas as pd
    import boto3
    
    bucket = "yourbucket"
    file_name = "your_file.csv"
    
    # Create an S3 client using the default credentials and region configuration.
    s3 = boto3.client('s3')
    
    # Fetch the object stored under the key file_name in the bucket.
    obj = s3.get_object(Bucket=bucket, Key=file_name)
    
    # obj['Body'] is a file-like streaming body that pandas can read directly.
    initial_df = pd.read_csv(obj['Body'])
    
  • 2021-02-04 08:09

    If the data is registered as a table in Amazon Athena, you can try pandas read_sql with pyathena:

    from pyathena import connect
    import pandas as pd
    
    conn = connect(s3_staging_dir='s3://bucket/folder', region_name='region')
    df = pd.read_sql('select * from database.table', conn)
    
  • 2021-02-04 08:15

    This worked for me.

    import pandas as pd
    import boto3
    import io
    
    s3_file_key = 'data/test.csv'
    bucket = 'data-bucket'
    
    s3 = boto3.client('s3')
    obj = s3.get_object(Bucket=bucket, Key=s3_file_key)
    
    initial_df = pd.read_csv(io.BytesIO(obj['Body'].read()))
    
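The boto3 answers above pass the bucket and key separately to get_object. If you start from a single s3:// URI instead, a small wrapper can split it first. This is only a minimal sketch: split_s3_uri and read_csv_from_s3 are hypothetical helper names, not part of boto3 or pandas, and the optional client argument is an assumption added so a stub client can be passed in for testing.

```python
import io
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Split 's3://bucket/path/to/file.csv' into (bucket, key)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    # The key is the URI path without its leading slash.
    return parsed.netloc, parsed.path.lstrip("/")


def read_csv_from_s3(uri, client=None, **read_csv_kwargs):
    """Load the CSV object at `uri` into a pandas DataFrame (hypothetical helper)."""
    # Imported lazily so split_s3_uri works without pandas or boto3 installed.
    import pandas as pd

    if client is None:
        import boto3
        client = boto3.client("s3")  # default credentials and region

    bucket, key = split_s3_uri(uri)
    obj = client.get_object(Bucket=bucket, Key=key)
    # Buffer the whole body in memory, as in the last answer above.
    return pd.read_csv(io.BytesIO(obj["Body"].read()), **read_csv_kwargs)
```

With valid AWS credentials, read_csv_from_s3("s3://data-bucket/data/test.csv") would then return the DataFrame. Note that urlparse treats ? and # as query and fragment separators, so keys containing those characters would need different handling.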