Question
I'm using the Python SDK for Apache Beam. The values for the table name and the schema are in the PCollection. This is the message I read from Pub/Sub:
{"DEVICE":"rms005_m1","DATESTAMP":"2020-05-29 20:54:26.733 UTC","SINUMERIK__x_position":69.54199981689453,"SINUMERIK__y_position":104.31400299072266,"SINUMERIK__z_position":139.0850067138672}
I then want to write it to BigQuery, deriving the table name from the values in the JSON message with a lambda function and the schema with this function:
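def set_schema(data):
    # Build a 'name:TYPE,name:TYPE,...' schema string from the field names.
    # (Names chosen so the built-ins list and type are not shadowed.)
    fields = []
    for name in data:
        if name == 'STATUS' or name == 'DEVICE':
            field_type = 'STRING'
        elif name == 'DATESTAMP':
            field_type = 'TIMESTAMP'
        else:
            field_type = 'FLOAT'
        fields.append(name + ':' + field_type)
    schema = ",".join(fields)
    return schema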
data = (p
    | "Read from PubSub" >> beam.io.ReadFromPubSub(topic=topic)
    | "Parse json" >> beam.Map(json_parse)
    | "Write to BQ" >> beam.io.WriteToBigQuery(
        table='project:dataset.{datatable}__opdata'.format(datatable = lambda element: element["DEVICE"]),
        schema=set_schema,
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND
    )
)
When I execute it I get this error:
ValueError: Expected a table reference (PROJECT:DATASET.TABLE or DATASET.TABLE) instead of project:dataset.<function <lambda> at 0x7fa0dc378710>__opdata
How can I use the values of the PCollection as variables in the PTransform?
Answer 1:
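You have to pass a function as the table argument. In your code, .format() runs once, when the pipeline is constructed, so the lambda object itself is formatted into the string; that is the <function <lambda> at 0x...> in the error message. Try this instead:
    | "Write to BQ" >> beam.io.WriteToBigQuery(
        table=lambda element: 'project:dataset.{datatable}__opdata'.format(datatable = element["DEVICE"]),
        schema=set_schema,
        write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND
    )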
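To see the difference without running a pipeline, here is a minimal sketch in plain Python. It uses a shortened copy of the message from the question. The helper name table_for is made up for illustration, and the table string assumes the 'project:dataset.' prefix shown in the error message.

```python
import json

# Shortened copy of the Pub/Sub message from the question.
message = json.loads(
    '{"DEVICE":"rms005_m1",'
    '"DATESTAMP":"2020-05-29 20:54:26.733 UTC",'
    '"SINUMERIK__x_position":69.54199981689453}'
)

# Eager: .format() is evaluated immediately, so the lambda object itself
# (not its result) is rendered into the string.
eager = 'project:dataset.{datatable}__opdata'.format(
    datatable=lambda element: element["DEVICE"])

# Deferred: a function that builds the table name from one element.
# table_for is a hypothetical name; a lambda works equally well.
def table_for(element):
    return 'project:dataset.{datatable}__opdata'.format(
        datatable=element["DEVICE"])

print(eager)               # contains '<function <lambda> at 0x...>'
print(table_for(message))  # project:dataset.rms005_m1__opdata
```

Passing table=table_for to WriteToBigQuery should behave like the lambda above, since Beam calls the function once per element. One thing to check separately: as far as I can tell from the Beam documentation, a callable passed as schema is called with the destination table name rather than with the element, so set_schema may need to get its field names some other way.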
Source: https://stackoverflow.com/questions/62133280/apache-beam-write-to-bigquery-table-and-schema-as-params