How do I Authenticate a Service Account to Make Queries against a GDrive Sheet Backed BigQuery Table?


My situation is as follows:

Google Account A has some data in BigQuery.

Google Account B manages Account A's BigQuery data, and has also been given editor permissions.

4 Answers
  • 2021-01-14 12:51

    For those of you trying to do this via Airflow or Google Cloud Composer, there are two main steps you'll need to complete:

    1. Grant view access on the spreadsheet to project_name@developer.gserviceaccount.com. This should be the same service account you're using to access Google BigQuery. This can be done in the Sheets GUI or programmatically.

    2. Add the following scope to your Google Cloud connection in Airflow: https://www.googleapis.com/auth/drive

    You will then be able to query external tables that reference Google Sheets, as sketched below.
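
    As a minimal sketch of what the DAG side can look like once the connection carries the extra scope, assuming Airflow 2 with the Google provider installed (the DAG id, connection id, project, dataset, and table names below are placeholders):

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator

    with DAG(
        dag_id="query_sheet_backed_table",
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,
        catchup=False,
    ) as dag:
        # Runs a query against an external table backed by a Google Sheet.
        # The connection referenced by gcp_conn_id must carry both the BigQuery
        # and Drive scopes, and the sheet must be shared with the service account.
        query_sheet_table = BigQueryInsertJobOperator(
            task_id="query_sheet_table",
            gcp_conn_id="google_cloud_default",
            configuration={
                "query": {
                    "query": "SELECT * FROM `my-project.my_dataset.sheet_backed_table` LIMIT 10",
                    "useLegacySql": False,
                },
            },
        )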

  • 2021-01-14 12:52

    You should be able to get this working with the following steps:

    First, share the sheet with the email address (the "service account ID") associated with the service account.

    Then you'll be able to access your sheet-backed table if you create a Client with both the bigquery and drive scopes. (You might need to have domain-wide delegation enabled on the service account.)

    # Requires an older google-cloud-bigquery release; run_sync_query() was
    # removed in later versions.
    from google.cloud import bigquery
    from oauth2client.service_account import ServiceAccountCredentials

    # Both scopes are needed: bigquery for the query, drive for the backing sheet.
    scopes = ['https://www.googleapis.com/auth/bigquery',
              'https://www.googleapis.com/auth/drive']

    credentials = ServiceAccountCredentials.from_json_keyfile_name(
        '<path_to_json>', scopes=scopes)

    # Instantiate a client with the scoped credentials
    client = bigquery.Client(project=PROJECT, credentials=credentials)

    # PROJECT and q (the query string) are placeholders you define yourself.
    bq_query = client.run_sync_query(q)
    bq_query.run()
    rows = bq_query.fetch_data()
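
    If you are on a newer google-cloud-bigquery release, run_sync_query() no longer exists; a rough equivalent using the current google-auth and client APIs looks like this (path, project, and table names are placeholders):

    from google.cloud import bigquery
    from google.oauth2 import service_account

    scopes = [
        "https://www.googleapis.com/auth/bigquery",
        "https://www.googleapis.com/auth/drive",
    ]

    credentials = service_account.Credentials.from_service_account_file(
        "<path_to_json>", scopes=scopes
    )
    client = bigquery.Client(project="<your-project>", credentials=credentials)

    # query() replaces run_sync_query(); result() blocks and returns the rows.
    rows = client.query(
        "SELECT * FROM `<your-project>.<dataset>.<sheet_backed_table>` LIMIT 10"
    ).result()
    for row in rows:
        print(row)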
    
  • 2021-01-14 12:57

    While Orbit's answer helped me find a solution, there are a few more things to consider, so I'd like to add my detailed solution to the problem. This solution is needed if Orbit's basic approach does not work, in particular if you use G Suite and your policies do not allow sharing sheets/docs with accounts outside your domain. In that case you cannot share a doc/sheet with the service account directly.

    Before you start:

    1. Create or select a service account in your project
    2. Enable Domain-wide Delegation (DwD) in the account settings. If not present, this generates an OAuth client ID for the service account.
    3. Make sure the delegated user@company.com has access to the sheet.
    4. Add the required scopes to your service account's OAuth client (you may need to ask a G Suite admin to do this for you):

      • https://www.googleapis.com/auth/bigquery
      • https://www.googleapis.com/auth/drive

    If the delegated user can access your drive-based table in the BigQuery UI, your service account should now also be able to access it on behalf of the delegated user.

    Here is a full code snippet that worked for me:

    #!/usr/bin/env python

    import httplib2
    from google.cloud import bigquery
    from oauth2client.service_account import ServiceAccountCredentials

    # Both scopes are required for a Drive-backed BigQuery table.
    scopes = [
        "https://www.googleapis.com/auth/drive",
        "https://www.googleapis.com/auth/bigquery",
    ]

    delegated_user = "user@example.com"
    project        = 'project-name'
    table          = 'dataset-name.table-name'
    query          = 'SELECT count(*) FROM [%s:%s]' % (project, table)

    # Load the service account key and delegate to the G Suite user
    creds = ServiceAccountCredentials.from_json_keyfile_name('secret.json', scopes=scopes)
    creds = creds.create_delegated(delegated_user)

    # Authorize an HTTP client and hand it to the BigQuery client
    http = creds.authorize(httplib2.Http())
    client = bigquery.Client(http=http)

    bq = client.run_sync_query(query)
    bq.run()
    print(bq.fetch_data())
    

    Note that I was not able to set up the delegation directly and needed to create an HTTP client using creds = creds.create_delegated(delegated_user) and http = creds.authorize(httplib2.Http()). The authorized HTTP client can then be used as the HTTP client for the BigQuery client: client = bigquery.Client(http=http).

    Also note that the service account does not need to have any predefined roles assigned in the project settings, i.e., you do not have to make it a BigQuery user or even a project owner. I suppose it acquires access primarily via the delegation.
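
    If you are on the current google-auth / google-cloud-bigquery stack (where oauth2client and run_sync_query() no longer exist), the same delegated access can be expressed roughly as follows. This is only a sketch under that assumption; the key file, delegated user, project, and table names are placeholders.

    from google.cloud import bigquery
    from google.oauth2 import service_account

    scopes = [
        "https://www.googleapis.com/auth/drive",
        "https://www.googleapis.com/auth/bigquery",
    ]

    # with_subject() is google-auth's counterpart to create_delegated()
    creds = service_account.Credentials.from_service_account_file(
        "secret.json", scopes=scopes
    ).with_subject("user@example.com")

    client = bigquery.Client(project="project-name", credentials=creds)
    rows = client.query(
        "SELECT COUNT(*) FROM `project-name.dataset_name.table_name`"
    ).result()
    for row in rows:
        print(row)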

  • 2021-01-14 13:04

    Just one step to add to Evan Kaeding's answer (the Airflow/Composer answer above): you can find the Airflow connection in the Airflow UI under "Admin" -> "Connections", then choose your connection. In my case I also needed to add the keyfile path or the keyfile JSON of the service account to the Airflow connection.

    Based on this reference: https://cloud.google.com/composer/docs/how-to/managing/connections#creating_a_connection_to_another_project
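
    As a sketch of what that connection might carry, here is the kind of JSON you could paste into the connection's Extra field in the UI. The exact key names (in particular the extra__google_cloud_platform__ prefix) vary between Airflow and Google provider versions, so treat them as assumptions to verify against your installation; the project id and key path are placeholders.

    import json

    # Hypothetical Extra payload for a google_cloud_platform connection
    extra = {
        "extra__google_cloud_platform__project": "your-project-id",
        "extra__google_cloud_platform__key_path": "/path/to/service-account.json",
        "extra__google_cloud_platform__scope": (
            "https://www.googleapis.com/auth/bigquery,"
            "https://www.googleapis.com/auth/drive"
        ),
    }
    print(json.dumps(extra, indent=2))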
