Apache Beam: write to BigQuery with table and schema as parameters

Question

I'm using the Python SDK for Apache Beam. Both the table name and the schema depend on values in the PCollection. This is the message I read from Pub/Sub:

{"DEVICE":"rms005_m1","DATESTAMP":"2020-05-29 20:54:26.733 UTC","SINUMERIK__x_position":69.54199981689453,"SINUMERIK__y_position":104.31400299072266,"SINUMERIK__z_position":139.0850067138672}

Then I want to write it to BigQuery using the values in the JSON message, with a lambda function for the table name and this function for the schema:

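def set_schema(data):
    # Build a 'name:TYPE,name:TYPE,...' schema string from the message keys.
    # Variables are named fields/field_type to avoid shadowing the builtins list and type.
    fields = []
    for name in data:
        if name == 'STATUS' or name == 'DEVICE':
            field_type = 'STRING'
        elif name == 'DATESTAMP':
            field_type = 'TIMESTAMP'
        else:
            field_type = 'FLOAT'
        fields.append(name + ':' + field_type)
    schema = ",".join(fields)
    return schema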

data = (p
        | "Read from PubSub" >> beam.io.ReadFromPubSub(topic=topic)
        | "Parse json" >> beam.Map(json_parse)
        | "Write to BQ" >> beam.io.WriteToBigQuery(
            table='project:dataset.{datatable}__opdata'.format(datatable = lambda element: element["DEVICE"]),
            schema=set_schema,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND
        )
       )
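
For reference, this is the schema string that set_schema returns when called directly on the sample message. It is a plain-Python sketch that runs outside the pipeline, and it assumes json_parse is essentially json.loads, which the post does not show. The function is restated only so the snippet runs on its own.

```python
import json

def set_schema(data):
    # Same logic as the function above, restated so this snippet is self-contained.
    fields = []
    for name in data:
        if name in ('STATUS', 'DEVICE'):
            field_type = 'STRING'
        elif name == 'DATESTAMP':
            field_type = 'TIMESTAMP'
        else:
            field_type = 'FLOAT'
        fields.append(name + ':' + field_type)
    return ",".join(fields)

# The Pub/Sub payload from the question, parsed the way json_parse is assumed to.
raw = ('{"DEVICE":"rms005_m1","DATESTAMP":"2020-05-29 20:54:26.733 UTC",'
       '"SINUMERIK__x_position":69.54199981689453,'
       '"SINUMERIK__y_position":104.31400299072266,'
       '"SINUMERIK__z_position":139.0850067138672}')
message = json.loads(raw)

schema = set_schema(message)
print(schema)
# DEVICE:STRING,DATESTAMP:TIMESTAMP,SINUMERIK__x_position:FLOAT,SINUMERIK__y_position:FLOAT,SINUMERIK__z_position:FLOAT
```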

When I run the pipeline, I get this error:

ValueError: Expected a table reference (PROJECT:DATASET.TABLE or DATASET.TABLE) instead of project:dataset.<function <lambda> at 0x7fa0dc378710>__opdata

How can I use the values of the PCollection as variables in the PTransform?

Answer 1:

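You have to pass a function as the table argument. In your code, str.format() runs once while the pipeline is being built and formats the lambda object itself into the string, which is why the error shows <function <lambda> ...> in the table name. Try this instead: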

| "Write to BQ" >> beam.io.WriteToBigQuery(
            table=lambda element: 'project:dataset.{datatable}__opdata'.format(datatable = element["DEVICE"]),
            schema=set_schema,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND
        )
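
The difference between the two versions can be checked in plain Python, without running a pipeline. The sketch below is only an illustration: table_for_element is a hypothetical helper name, and the element is a trimmed copy of the sample message from the question.

```python
# A parsed element, trimmed from the sample message in the question.
element = {
    "DEVICE": "rms005_m1",
    "DATESTAMP": "2020-05-29 20:54:26.733 UTC",
    "SINUMERIK__x_position": 69.54199981689453,
}

# Original attempt: format() is evaluated immediately, so the lambda object
# itself (not its result) is formatted into the table name.
broken = 'project:dataset.{datatable}__opdata'.format(
    datatable=lambda element: element["DEVICE"])

# Fix: a function of the element that returns the full table reference.
# table_for_element is a hypothetical name used only for this illustration.
def table_for_element(element):
    return 'project:dataset.{datatable}__opdata'.format(
        datatable=element["DEVICE"])

resolved = table_for_element(element)

print(broken)    # something like project:dataset.<function <lambda> at 0x...>__opdata
print(resolved)  # project:dataset.rms005_m1__opdata
```

One caveat worth verifying: as far as the WriteToBigQuery documentation describes it, a callable passed as schema receives the destination (the table string) rather than the element, so set_schema may not work unchanged when it iterates over its argument as if it were the message.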

Source: https://stackoverflow.com/questions/62133280/apache-beam-write-to-bigquery-table-and-schema-as-params

Tags

python

google-bigquery

apache-beam