Question:
I need to access a parameter passed by BigQueryOperator in a SQL file, but I am getting this error: ERROR - queryParameters argument must have a type <class 'dict'> not <class 'list'>
I am using the code below:
t2 = bigquery_operator.BigQueryOperator(
    task_id='bq_from_source_to_clean',
    sql='prepare.sql',
    use_legacy_sql=False,
    allow_large_results=True,
    query_params=[{
        'name': 'threshold_date',
        'parameterType': {'type': 'STRING'},
        'parameterValue': {'value': '2020-01-01'},
    }],
    destination_dataset_table="{}.{}.{}".format('xxxx', 'xxxx', 'temp_airflow_test'),
    create_disposition="CREATE_IF_NEEDED",
    write_disposition="WRITE_TRUNCATE",
    dag=dag
)
SQL (prepare.sql):
select cast(DATE_ADD(a.dt_2, interval 7 day) as DATE) as dt_1
,a.dt_2
,cast('2010-01-01' as DATE) as dt_3
from (select cast(@threshold_date as date) as dt_2) a
I am using Google Cloud Composer version composer-1.7.0-airflow-1.10.2.
Thanks in advance.
Answer 1:
After diving into the source code, it appears that BigQueryHook had a bug that was fixed in Airflow 1.10.3.
The way you defined query_params is correct for newer versions of Airflow, and it should indeed be a list according to the BigQuery API: see https://cloud.google.com/bigquery/docs/parameterized-queries#bigquery_query_params_named-python.
However, you are getting this error because in Airflow 1.10.2, query_params is declared as a dict, see https://github.com/apache/airflow/blob/1.10.2/airflow/contrib/hooks/bigquery_hook.py#L678:
query_param_list = [
    ...
    (query_params, 'queryParameters', None, dict),
    ...
]
This causes the internal _validate_value function to throw a TypeError: https://github.com/apache/airflow/blob/1.10.2/airflow/contrib/hooks/bigquery_hook.py#L1954
def _validate_value(key, value, expected_type):
    """ function to check expected type and raise
    error if type is not correct """
    if not isinstance(value, expected_type):
        raise TypeError("{} argument must have a type {} not {}".format(
            key, expected_type, type(value)))
I did not find any example of query_params in Airflow 1.10.2 (or any unit tests...), but I think that is just because it was not usable.
These bugs have been fixed by these commits:
- https://github.com/apache/airflow/commit/0c797a830e3370bd6e39f5fcfc128a8fd776912e#diff-ee06f8fcbc476ea65446a30160c2a2b2R784 : change dict to list
- https://github.com/apache/airflow/pull/4876 : update documentation
These changes are included in Airflow 1.10.3, but, as of now, Airflow 1.10.3 is not available in Composer (https://cloud.google.com/composer/docs/concepts/versioning/composer-versions#new_environments): the latest version was released May 16, 2019 and embeds Airflow 1.10.2.
While waiting for this new version, I see 2 ways to fix your problem:
- copy/paste the fixed versions of BigQueryOperator and BigQueryHook, embed them in your sources and use them, or extend the existing BigQueryHook and override the buggy methods. I'm not sure you can patch BigQueryHook directly (there is no access to those files in a Composer environment)
- templatize your SQL query yourself (and do not use query_params)
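A minimal sketch of the templating workaround. In Airflow, the operator's sql field is Jinja-templated, and values passed via the standard params argument of BaseOperator are available in the template as {{ params.x }}. The render() helper below is only a stand-in that mimics that substitution so the example runs without Airflow installed; in a real DAG, Airflow's template engine does this for you.

```python
# prepare.sql rewritten to use a Jinja placeholder instead of @threshold_date.
PREPARE_SQL = """\
select cast(DATE_ADD(a.dt_2, interval 7 day) as DATE) as dt_1
      ,a.dt_2
      ,cast('2010-01-01' as DATE) as dt_3
from (select cast('{{ params.threshold_date }}' as date) as dt_2) a
"""

def render(sql, params):
    """Minimal stand-in for Airflow's Jinja rendering of {{ params.x }}."""
    for key, value in params.items():
        sql = sql.replace("{{ params.%s }}" % key, str(value))
    return sql

# In the operator you would pass params={'threshold_date': '2020-01-01'}
# and Airflow would produce the same rendered query at runtime.
rendered = render(PREPARE_SQL, {"threshold_date": "2020-01-01"})
print(rendered)
```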
Answer 2:
This is definitely a bug with Composer (Airflow 1.10.2). We fixed it by pulling down the Airflow files from GitHub, patching the bigquery_hook.py file, and then referencing the fixed file in bigquery_operator.py (both uploaded to a lib folder). The fixes are:
bigquery_operator.py (line 21):
from lib.bigquery_hook import BigQueryHook
bigquery_hook.py (line 678):
(query_params, 'queryParameters', None, list),
bigquery_hook.py (line 731):
if 'useLegacySql' in configuration['query'] and configuration['query']['useLegacySql'] and \
Then, in your DAG, reference the uploaded BigQuery operator: "from lib.bigquery_operator import BigQueryOperator"
Answer 3:
Sharing two ways to pass query params in the BigQuery operator:
- Jinja templating - in the query below, '{{ (execution_date - macros.timedelta(hours=1)).strftime('%Y-%m-%d %H:00:00') }}' is a Jinja template which gets resolved at runtime.
SELECT owner_display_name, title, view_count
FROM bigquery-public-data.stackoverflow.posts_questions
WHERE creation_date > CAST('{{ (execution_date - macros.timedelta(hours=1)).strftime('%Y-%m-%d %H:00:00') }}' AS TIMESTAMP)
ORDER BY view_count DESC LIMIT 100
- query_params - for an IN clause, the parameter type should be ARRAY, and its arrayType should be the type of the column in BigQuery.
query_params=[
    {
        'name': 'DATE_IN_CLAUSE',
        'parameterType': {'type': 'ARRAY', 'arrayType': {'type': 'TIMESTAMP'}},
        'parameterValue': {'arrayValues': [
            {'value': datetime.utcnow().strftime('%Y-%m-%d %H:00:00')},
            {'value': (datetime.utcnow() - timedelta(hours=1)).strftime('%Y-%m-%d %H:00:00')},
        ]},
    },
    {
        'name': 'COUNT',
        'parameterType': {'type': 'INTEGER'},
        'parameterValue': {'value': 1},
    },
]
SELECT owner_display_name, title, view_count
FROM bigquery-public-data.stackoverflow.posts_questions
WHERE creation_date in UNNEST(@DATE_IN_CLAUSE) and view_count > @COUNT
ORDER BY view_count DESC LIMIT 100
Note - the queries and params above may not give you results, but they will succeed without any error. These examples are just to demonstrate how to pass params.
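For readability, the parameter dictionaries above can be built with small helpers. scalar_param and array_param below are hypothetical names, not part of Airflow's or BigQuery's API; they simply emit the same shapes shown in this answer.

```python
from datetime import datetime, timedelta

def scalar_param(name, bq_type, value):
    # Shape of a named scalar parameter, as in the COUNT example above.
    return {
        "name": name,
        "parameterType": {"type": bq_type},
        "parameterValue": {"value": value},
    }

def array_param(name, element_type, values):
    # Shape of a named ARRAY parameter, as in the DATE_IN_CLAUSE example.
    return {
        "name": name,
        "parameterType": {"type": "ARRAY", "arrayType": {"type": element_type}},
        "parameterValue": {"arrayValues": [{"value": v} for v in values]},
    }

now = datetime.utcnow()
query_params = [
    array_param("DATE_IN_CLAUSE", "TIMESTAMP",
                [now.strftime("%Y-%m-%d %H:00:00"),
                 (now - timedelta(hours=1)).strftime("%Y-%m-%d %H:00:00")]),
    scalar_param("COUNT", "INTEGER", 1),
]
```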
Source: https://stackoverflow.com/questions/56287061/how-to-pass-query-parameter-to-sql-file-using-bigquery-operator