Get TableSchema from BigQuery result PCollection

星月不相逢 2020-12-21 09:16

When I run a query in BigQuery Web UI, the results are displayed in a table where both name and type of each field are known (even when a field is a result of COUNT(), AVG()

2 Answers
  • 2020-12-21 09:55

    Unfortunately, the Dataflow SDK doesn't expose the schema that BigQuery returns for a query through its BigQueryIO API. There's no "good" workaround within the Dataflow API alone.

    Defining a schema manually is one workaround.

    Alternatively, you could issue a separate query to BigQuery directly via the jobs.query API method at pipeline construction time, and pass the schema from its response to the BigQueryIO.Write transform. This may incur additional cost, but that can probably be mitigated by altering the query slightly to reduce the amount of data processed. The correctness of that query's output doesn't matter, since you'd only be keeping the schema.

  • 2020-12-21 10:13

    Conceptually, you should write a function that iterates through all cells of a given TableRow, reads the name and type of each one, and builds the corresponding TableSchema as it goes.
    For simple schemas, I would expect this to be relatively easy.
    For schemas with records, repeated fields, etc., it could be more complex.
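
The iteration described above can be sketched roughly as follows. This is a dependency-free illustration, not the SDK's own API: a row is treated as a plain Map<String, Object> (which is how a TableRow can be read), and Field is a hypothetical stand-in for TableFieldSchema, so the class and method names here are assumptions. Inference of this kind only sees Java runtime types, so a null cell or a value that already arrived as a string falls back to STRING.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: Field is a hypothetical stand-in for TableFieldSchema.
class SchemaSketch {

    static final class Field {
        final String name;
        final String type;        // BigQuery type name, e.g. "STRING", "RECORD"
        final String mode;        // "NULLABLE" or "REPEATED"
        final List<Field> fields; // nested fields, non-empty only for RECORD

        Field(String name, String type, String mode, List<Field> fields) {
            this.name = name;
            this.type = type;
            this.mode = mode;
            this.fields = fields;
        }

        @Override
        public String toString() {
            String s = name + ":" + type + ":" + mode;
            return fields.isEmpty() ? s : s + fields;
        }
    }

    // Walk every cell of the row and infer one field per cell.
    static List<Field> inferSchema(Map<String, Object> row) {
        List<Field> schema = new ArrayList<>();
        for (Map.Entry<String, Object> cell : row.entrySet()) {
            schema.add(inferField(cell.getKey(), cell.getValue()));
        }
        return schema;
    }

    static Field inferField(String name, Object value) {
        String mode = "NULLABLE";
        if (value instanceof List) {
            // Repeated field: infer the element type from the first element.
            List<?> items = (List<?>) value;
            mode = "REPEATED";
            value = items.isEmpty() ? null : items.get(0);
        }
        if (value instanceof Map) {
            // Nested record: recurse into its cells.
            List<Field> nested = new ArrayList<>();
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                nested.add(inferField(String.valueOf(e.getKey()), e.getValue()));
            }
            return new Field(name, "RECORD", mode, nested);
        }
        return new Field(name, scalarType(value), mode, new ArrayList<Field>());
    }

    static String scalarType(Object value) {
        if (value instanceof Boolean) return "BOOLEAN";
        if (value instanceof Integer || value instanceof Long) return "INTEGER";
        if (value instanceof Float || value instanceof Double) return "FLOAT";
        return "STRING"; // fallback, also used when the value is null
    }

    public static void main(String[] args) {
        Map<String, Object> address = new LinkedHashMap<>();
        address.put("city", "Oslo");

        Map<String, Object> row = new LinkedHashMap<>();
        row.put("name", "alice");
        row.put("cnt", 3L);
        row.put("avg", 2.5);
        row.put("tags", Arrays.asList("a", "b"));
        row.put("address", address);

        System.out.println(inferSchema(row));
    }
}
```

Because a single row may contain nulls or empty repeated fields, a real pipeline would likely need to merge the schemas inferred from several rows, or fall back to a manually defined schema where the types are ambiguous.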
