Getting Google Spreadsheet CSV into A Pandas Dataframe

死守一世寂寞 2020-12-04 09:31

I uploaded a file to Google Spreadsheets (to make a publicly accessible example IPython Notebook, with data). I was using the file in its native form could be read into a

6 Answers
  • 2020-12-04 10:10

    I have been using the following utility, and it has worked so far:

    import logging

    import pandas as pd

    log = logging.getLogger(__name__)

    def load_from_gspreadsheet(sheet_name, key):
        url = 'https://docs.google.com/spreadsheets/d/{key}/gviz/tq?tqx=out:csv&sheet={sheet_name}&headers=1'.format(
            key=key, sheet_name=sheet_name.replace(' ', '%20'))

        log.info('Loading google spreadsheet from {}'.format(url))

        df = pd.read_csv(url)
        # Drop the 'Unnamed: N' columns that pandas creates for blank headers.
        return df.drop([col for col in df.columns if col.startswith('Unnamed')], axis=1)
    

    You must specify the sheet_name and the key. The key is the string you get from the URL in the following path: https://docs.google.com/spreadsheets/d/{key}/edit/.

    You can change the value of headers if you have more than one row of column names, but I am not sure whether it still works with multi-level headers.

    It may break if Google changes its APIs.

    Also, please bear in mind that your spreadsheet must be public: anyone with the link can read it.
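
The manual `replace(' ', '%20')` in the function above only handles spaces. A minimal sketch of building the same gviz URL with the standard library's percent-encoding, assuming the endpoint and query parameters are exactly those used in the answer (the helper name `gviz_csv_url` is made up for illustration):

```python
from urllib.parse import quote


def gviz_csv_url(key, sheet_name, headers=1):
    """Build the gviz CSV URL, percent-encoding the sheet name."""
    # quote() with safe='' also encodes '&', '/', and non-ASCII characters,
    # which a plain replace(' ', '%20') would leave untouched.
    return (
        'https://docs.google.com/spreadsheets/d/{key}/gviz/tq'
        '?tqx=out:csv&sheet={sheet}&headers={headers}'
    ).format(key=key, sheet=quote(sheet_name, safe=''), headers=headers)


print(gviz_csv_url('KEY', 'Q1 & Q2'))
```

The resulting string can be passed to pd.read_csv() in place of the hand-built url.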

  • 2020-12-04 10:15

    You can use read_csv() on an in-memory buffer such as BytesIO (or StringIO for decoded text):

    from io import BytesIO

    import pandas as pd
    import requests

    r = requests.get('https://docs.google.com/spreadsheet/ccc?key=0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc&output=csv')
    data = r.content

    In [10]: df = pd.read_csv(BytesIO(data), index_col=0, parse_dates=['Quradate'])
    
    In [11]: df.head()
    Out[11]: 
              City                                            region     Res_Comm  \
    0       Dothan  South_Central-Montgomery-Auburn-Wiregrass-Dothan  Residential   
    10       Foley                              South_Mobile-Baldwin  Residential   
    12  Birmingham      North_Central-Birmingham-Tuscaloosa-Anniston   Commercial   
    38       Brent      North_Central-Birmingham-Tuscaloosa-Anniston  Residential   
    44      Athens                 North_Huntsville-Decatur-Florence  Residential   
    
              mkt_type            Quradate  National_exp  Alabama_exp  Sales_exp  \
    0            Rural 2010-01-15 00:00:00             2            2          3   
    10  Suburban_Urban 2010-01-15 00:00:00             4            4          4   
    12  Suburban_Urban 2010-01-15 00:00:00             2            2          3   
    38           Rural 2010-01-15 00:00:00             3            3          3   
    44  Suburban_Urban 2010-01-15 00:00:00             4            5          4   
    
        Inventory_exp  Price_exp  Credit_exp  
    0               2          3           3  
    10              4          4           3  
    12              2          2           3  
    38              3          3           2  
    44              4          4           4  
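
If the spreadsheet is not shared publicly, the request may come back with an HTML sign-in page instead of CSV, and read_csv() will then fail or return a meaningless frame. A small guard, as a sketch only: the helper name `ensure_csv_payload` and the HTML-sniffing heuristic are assumptions for illustration, not part of requests or pandas.

```python
def ensure_csv_payload(data):
    """Return data unchanged, or raise if it looks like an HTML page."""
    # Heuristic: inspect only the first non-whitespace bytes of the body.
    head = data.lstrip()[:100].lower()
    if head.startswith(b'<!doctype html') or head.startswith(b'<html'):
        raise ValueError('Got HTML instead of CSV; is the sheet shared publicly?')
    return data
```

With the session above, this would be used as pd.read_csv(BytesIO(ensure_csv_payload(r.content)), ...). Calling r.raise_for_status() before that also surfaces HTTP errors early.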
    
  • 2020-12-04 10:18

    It seems to work for me without the StringIO step:

    import pandas as pd

    test = pd.read_csv('https://docs.google.com/spreadsheets/d/' +
                       '0Ak1ecr7i0wotdGJmTURJRnZLYlV3M2daNTRubTdwTXc' +
                       '/export?gid=0&format=csv',
                       # Set first column as rownames in data frame
                       index_col=0,
                       # Parse column values to datetime
                       parse_dates=['Quradate']
                      )
    test.head(5)  # Same result as @TomAugspurger
    

    BTW, including the gid= parameter lets you import a different sheet; you can find the gid in that sheet's URL.

  • Open the specific sheet you want in your browser. Make sure it's at least viewable by anyone with the link. Copy and paste the URL. You'll get something like https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=NUMBER.

    sheet_url = 'https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=NUMBER'
    

    First we turn that into a CSV export URL, like https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/export?format=csv&gid=NUMBER:

    csv_export_url = sheet_url.replace('/edit#gid=', '/export?format=csv&gid=')
    

    Then we pass it to pd.read_csv, which can take a URL.

    df = pd.read_csv(csv_export_url)
    

    This will break if Google changes its API (it seems undocumented), and may give unhelpful errors if a network failure occurs.
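
The string replace above assumes the URL contains exactly `/edit#gid=`. A slightly more tolerant sketch parses the URL instead, assuming the key follows `/spreadsheets/d/` and the gid appears either in the fragment or in the query string. The helper name `to_csv_export_url` is hypothetical, and the export endpoint is the same undocumented one used above.

```python
import re
from urllib.parse import parse_qs, urlparse


def to_csv_export_url(sheet_url):
    """Turn a Google Sheets browser URL into a CSV export URL."""
    parts = urlparse(sheet_url)
    match = re.search(r'/spreadsheets/d/([^/]+)', parts.path)
    if match is None:
        raise ValueError('Not a Google Sheets URL: {}'.format(sheet_url))
    # Look for gid in the fragment first ('#gid=123'), then in the query.
    gids = parse_qs(parts.fragment).get('gid') or parse_qs(parts.query).get('gid')
    url = 'https://docs.google.com/spreadsheets/d/{}/export?format=csv'.format(match.group(1))
    if gids:
        url += '&gid={}'.format(gids[0])
    return url


print(to_csv_export_url('https://docs.google.com/spreadsheets/d/BLAHBLAHBLAH/edit#gid=NUMBER'))
```

When no gid is present, the sketch leaves it out of the export URL rather than guessing one.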

  • 2020-12-04 10:24

    If the CSV file was shared via Google Drive rather than as a spreadsheet, the following change to the URL would work:

    import pandas as pd

    # Derive the ID from the Google Drive shareable link.
    # For the file at hand, the link is:
    # <https://drive.google.com/open?id=1-tjNjMP6w0RUV4GhJWw08ql3wYwsNU69>
    file_id = '1-tjNjMP6w0RUV4GhJWw08ql3wYwsNU69'
    link = 'https://drive.google.com/uc?export=download&id={FILE_ID}'
    csv_url = link.format(FILE_ID=file_id)
    # The final URL is:
    # csv_url = 'https://drive.google.com/uc?export=download&id=1-tjNjMP6w0RUV4GhJWw08ql3wYwsNU69'
    df = pd.read_csv(csv_url)
    

    And the DataFrame would be (if you just ran the code above):

        a   b   c   d
    0   0   1   2   3
    1   4   5   6   7
    2   8   9   10  11
    3   12  13  14  15
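
Drive share links come in more than one shape. A sketch that pulls the file ID out of two of them, `open?id=...` as above and `/file/d/.../view`, and builds the same download URL. The second link form and the helper name `drive_csv_url` are assumptions beyond what the answer shows.

```python
import re
from urllib.parse import parse_qs, urlparse


def drive_csv_url(share_link):
    """Build the uc?export=download URL from a Drive share link."""
    parts = urlparse(share_link)
    match = re.search(r'/file/d/([^/]+)', parts.path)
    if match:
        file_id = match.group(1)
    else:
        # Fall back to the '?id=...' query parameter.
        ids = parse_qs(parts.query).get('id')
        if not ids:
            raise ValueError('No file ID found in: {}'.format(share_link))
        file_id = ids[0]
    return 'https://drive.google.com/uc?export=download&id={}'.format(file_id)


print(drive_csv_url('https://drive.google.com/open?id=1-tjNjMP6w0RUV4GhJWw08ql3wYwsNU69'))
```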
    


  • 2020-12-04 10:27

    My approach is a bit different. I just used pandas.DataFrame(), though of course I needed to install and import gspread first. It worked fine!

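    import gspread
    import pandas as pd

    # Assumption: gs is an authorized gspread client; the original does not show how it was created.
    gs = gspread.service_account()
    gsheet = gs.open("Name")
    sheet_name = "today"
    wsheet = gsheet.worksheet(sheet_name)
    dataframe = pd.DataFrame(wsheet.get_all_records())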
    