Question
I have a table in Google BigQuery that I access and modify in Python using the pandas functions read_gbq and to_gbq. The problem is that appending 100,000 rows takes about 150 seconds, while appending a single row takes about 40 seconds. Rather than appending a row, I would like to update a value in the table. Is there a way to update a value in the table from Python that is fast, or at least faster than 40 seconds?
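For context, a minimal sketch of the append workflow described above, using pandas-gbq; the project id, table name, and dataframe contents are placeholders, not values from the question:

import pandas as pd

# Placeholder project and table; substitute your own.
project_id = 'my-project'
table = 'dataset.table'

# read_gbq runs a query and returns the result as a DataFrame.
df = pd.read_gbq('SELECT * FROM `dataset.table`', project_id=project_id, dialect='standard')

# to_gbq with if_exists='append' adds new rows instead of replacing the table.
new_rows = pd.DataFrame({'field_1': ['3'], 'field_2': ['1']})
new_rows.to_gbq(table, project_id=project_id, if_exists='append')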
Answer 1:
Not sure if you can do so using pandas, but you sure can using the google-cloud library. You could just install it (pip install --upgrade google-cloud) and run it like:
import uuid
import os

# Point the client at your service-account credentials file.
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'path_to_json_credentials.json'

from google.cloud.bigquery.client import Client

bq_client = Client()

# Each query job needs a unique job name/id.
job_id = str(uuid.uuid4())
query = """UPDATE `dataset.table` SET field_1 = '3' WHERE field_2 = '1'"""

# run_async_query submits the job without blocking (older google-cloud-bigquery API).
job = bq_client.run_async_query(query=query, job_name=job_id)
job.use_legacy_sql = False  # DML statements like UPDATE require standard SQL
job.begin()  # actually starts the job on BigQuery
Here this operation takes about 2 seconds on average.
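In newer releases of the google-cloud-bigquery library the job-submission API changed; a roughly equivalent sketch using the current client.query interface, assuming the same credentials setup and the same placeholder table and fields, would be:

from google.cloud import bigquery

# Credentials are still picked up from GOOGLE_APPLICATION_CREDENTIALS.
client = bigquery.Client()

query = """UPDATE `dataset.table` SET field_1 = '3' WHERE field_2 = '1'"""

# client.query submits the job; standard SQL is the default dialect here.
job = client.query(query)
job.result()  # block until the DML statement has finished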
As a side note, it's important to keep in mind the quotas related to DML operations in BigQuery, so it's worth knowing when they're appropriate and whether they fit your needs well.
Source: https://stackoverflow.com/questions/45003276/python-how-to-update-a-value-in-google-bigquery-in-less-than-40-seconds