My situation is as follows:
Google Account A has some data in BigQuery.
Google Account B manages Account A's BigQuery data, and has also been given editor permissions.
For those of you trying to do this via Airflow or Google Cloud Composer, there are two main steps you'll need to complete.
Grant view access on the spreadsheet to project_name@developer.gserviceaccount.com. This should be the same service account you're using to access Google BigQuery. Sharing can be done in the Sheets GUI or programmatically (see the sketch below).
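For the programmatic route, a minimal sketch using the Drive API, assuming you have credentials for an account that is already allowed to share the spreadsheet (the spreadsheet ID, key file, and service account address are placeholders):

from google.oauth2 import service_account
from googleapiclient.discovery import build

# Placeholders -- replace with your own values
SPREADSHEET_ID = "<spreadsheet_id>"
SERVICE_ACCOUNT_EMAIL = "project_name@developer.gserviceaccount.com"

# Credentials of an account that is allowed to share the spreadsheet
creds = service_account.Credentials.from_service_account_file(
    "owner-key.json", scopes=["https://www.googleapis.com/auth/drive"])

drive = build("drive", "v3", credentials=creds)

# Grant the BigQuery service account read access to the sheet
drive.permissions().create(
    fileId=SPREADSHEET_ID,
    body={"type": "user", "role": "reader",
          "emailAddress": SERVICE_ACCOUNT_EMAIL},
).execute()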
Add the following scope to your Google Cloud Connection in Airflow:
https://www.googleapis.com/auth/drive
You will then be able to query external tables that reference Google Sheets.
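If you prefer to set the connection up programmatically instead of through the UI, a sketch along these lines should work; the connection ID and key path are placeholders, and the extra__google_cloud_platform__ key prefix applies to older Airflow versions (newer ones accept plain keys such as "scope"):

import json

from airflow import settings
from airflow.models import Connection

# Placeholders: connection ID and keyfile path
conn = Connection(
    conn_id="google_cloud_sheets",
    conn_type="google_cloud_platform",
    extra=json.dumps({
        "extra__google_cloud_platform__project": "your-project-id",
        "extra__google_cloud_platform__key_path": "/path/to/keyfile.json",
        # Both scopes, comma-separated: BigQuery plus Drive for the sheet
        "extra__google_cloud_platform__scope": ",".join([
            "https://www.googleapis.com/auth/bigquery",
            "https://www.googleapis.com/auth/drive",
        ]),
    }),
)

session = settings.Session()
session.add(conn)
session.commit()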
You should be able to get this working with the following steps:
First, share the sheet with the email address (the "service account ID") associated with the service account.
Then you'll be able to access your sheet-backed table if you create a Client with both the BigQuery and Drive scopes. (You might need domain-wide delegation enabled on the service account.)
from google.cloud import bigquery
from oauth2client.service_account import ServiceAccountCredentials

# Both scopes are needed: BigQuery for the query, Drive for the sheet behind the table
scopes = ['https://www.googleapis.com/auth/bigquery', 'https://www.googleapis.com/auth/drive']

credentials = ServiceAccountCredentials.from_json_keyfile_name(
    '<path_to_json>', scopes=scopes)

# Instantiate a client with the scoped credentials
# (PROJECT is your project ID, q is your query string)
client = bigquery.Client(project=PROJECT, credentials=credentials)

bqQuery = client.run_sync_query(q)
bqQuery.run()
bqQuery.fetch_data()
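Note that run_sync_query comes from older releases of google-cloud-bigquery and was removed in later versions. On a current version the same idea, using google-auth instead of oauth2client, looks roughly like this (the key path, project ID, and query are placeholders):

from google.cloud import bigquery
from google.oauth2 import service_account

scopes = ['https://www.googleapis.com/auth/bigquery', 'https://www.googleapis.com/auth/drive']

# Scoped credentials from the service account key file
credentials = service_account.Credentials.from_service_account_file(
    '<path_to_json>', scopes=scopes)

client = bigquery.Client(project='<your-project-id>', credentials=credentials)

# client.query() replaces run_sync_query()/run()/fetch_data()
rows = client.query('SELECT COUNT(*) FROM `your_dataset.sheet_backed_table`').result()
for row in rows:
    print(row)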
While Orbit's answer helped me find a solution to the issue, there are a few more things you need to consider, so I'd like to add my detailed solution to the problem. You need it if Orbit's basic solution does not work, in particular if you use G Suite and your policies do not allow sharing sheets/docs with accounts outside of your domain. In that case you cannot share a doc/sheet with the service account directly.
Before you start:
- Make sure the delegated user, e.g. user@company.com, has access to the sheet.
- Add the required scopes to your service account's OAuth client (you may need to ask a G Suite admin to do this for you):
  - https://www.googleapis.com/auth/bigquery
  - https://www.googleapis.com/auth/drive
If the delegated user can access your drive-based table in the BigQuery UI, your service account should now also be able to access it on behalf of the delegated user.
Here is a full code snippet that worked for me:
#!/usr/bin/env python

import httplib2

from google.cloud import bigquery
from oauth2client.service_account import ServiceAccountCredentials

scopes = [
    "https://www.googleapis.com/auth/drive",
    "https://www.googleapis.com/auth/bigquery",
]

delegated_user = "user@example.com"
project = 'project-name'
table = 'dataset-name.table-name'
query = 'SELECT count(*) FROM [%s:%s]' % (project, table)  # legacy SQL table reference

# Load the service account key, then impersonate the delegated G Suite user
creds = ServiceAccountCredentials.from_json_keyfile_name('secret.json', scopes=scopes)
creds = creds.create_delegated(delegated_user)
http = creds.authorize(httplib2.Http())

# Hand the authorized HTTP client to the BigQuery client
client = bigquery.Client(http=http)

bq = client.run_sync_query(query)
bq.run()
print(bq.fetch_data())
Note that I was not able to set up the delegation directly and needed to create an HTTP client using creds = creds.create_delegated(delegated_user) and http = creds.authorize(httplib2.Http()). The authorized HTTP client can then be passed to the BigQuery client: client = bigquery.Client(http=http).
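For what it's worth, if you are on the newer google-auth stack instead of oauth2client, the same delegation can apparently be expressed with with_subject(); a minimal sketch, with the key file, user, project, and table names as placeholders:

from google.cloud import bigquery
from google.oauth2 import service_account

scopes = [
    "https://www.googleapis.com/auth/drive",
    "https://www.googleapis.com/auth/bigquery",
]

creds = service_account.Credentials.from_service_account_file(
    "secret.json", scopes=scopes)

# with_subject() impersonates the delegated G Suite user
delegated_creds = creds.with_subject("user@example.com")

client = bigquery.Client(project="project-name", credentials=delegated_creds)
rows = client.query("SELECT COUNT(*) FROM `dataset-name.table-name`").result()
print(list(rows))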
Also note that the service account does not need any predefined roles assigned in the project settings, i.e., you do not have to make it a BigQuery user or even a project owner. I suppose it acquires its access primarily via the delegation.
Just one step to add to Evan Kaeding's answer: you can find the Airflow connection in the Airflow UI under "Admin" -> "Connections", then choose your connection. In my case I also needed to add the keyfile path or keyfile JSON of the service account to the Airflow connection,
based on this reference: https://cloud.google.com/composer/docs/how-to/managing/connections#creating_a_connection_to_another_project
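For completeness, here is a rough sketch of a DAG task that queries the sheet-backed table through such a connection. The connection ID, table name, and operator import path are assumptions, and the operator and parameter names differ across Airflow versions (this uses the Airflow 1.10-era contrib operator, where the parameter is bigquery_conn_id rather than gcp_conn_id):

from datetime import datetime

from airflow import DAG
from airflow.contrib.operators.bigquery_operator import BigQueryOperator

# Assumes a Google Cloud connection "google_cloud_sheets" that carries the
# Drive scope and the service account keyfile
with DAG(dag_id="query_sheet_backed_table",
         start_date=datetime(2023, 1, 1),
         schedule_interval=None) as dag:

    query_sheet_table = BigQueryOperator(
        task_id="query_sheet_table",
        sql="SELECT COUNT(*) FROM `your-project.your_dataset.sheet_backed_table`",
        use_legacy_sql=False,
        bigquery_conn_id="google_cloud_sheets",
    )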