Reading BigQuery federated table as source in Dataflow throws an error

Posted by 这一生的挚爱 on 2021-01-27 07:07:46

Question


I have a federated source in BigQuery which is pointing to some CSV files in GCS.

When I try to read the federated BigQuery table as a source in a Dataflow pipeline, it throws the following error:

    1226 [main] ERROR com.google.cloud.dataflow.sdk.util.BigQueryTableRowIterator  - Error reading from BigQuery table Federated_test_dataflow of dataset CPT_7414_PLAYGROUND : 400 Bad Request
    {
      "code" : 400,
      "errors" : [ {
        "domain" : "global",
        "message" : "Cannot list a table of type EXTERNAL.",
        "reason" : "invalid"
      } ],
      "message" : "Cannot list a table of type EXTERNAL."
    }

Does Dataflow not support federated sources in BigQuery, or am I doing something wrong? I do know that I could read the files from GCS directly into my pipeline, but I'd prefer to work with BigQuery TableRow objects instead due to the design of the application.

    PCollection<TableRow> results = pipeline
        .apply("fed-test", BigQueryIO.Read.from("<project_id>:CPT_7414_PLAYGROUND.Federated_test_dataflow"))
        .apply(ParDo.of(new DoFn<TableRow, TableRow>() {
            @Override
            public void processElement(ProcessContext c) throws Exception {
                // Print each row read from the table.
                System.out.println(c.element());
            }
        }));

Answer 1:


As Michael says, BigQuery does not support reading directly from EXTERNAL (federated) tables or VIEWs: reading them effectively requires running a query.

To read from these tables in Dataflow, you can instead use

    BigQueryIO.Read.fromQuery("SELECT * FROM table_or_view_name")

which issues the query, saves the result to a temporary table, and then reads from that table. This incurs BigQuery query costs, so if you read from the same VIEW or EXTERNAL table repeatedly, you may want to materialize it into a regular table yourself.
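
As a rough illustration of the query-based workaround, the sketch below builds the query string for a table spec like the one in the question. It assumes the legacy SQL bracket syntax `[project:dataset.table]`. The class `FederatedQuery`, the method `selectAllFrom`, and the project id `my-project` are hypothetical names for illustration, not part of the Dataflow SDK.

```java
public class FederatedQuery {

    /**
     * Builds a "SELECT *" query over a table spec such as
     * "project:dataset.table" or "dataset.table".
     * Assumption: legacy SQL, where table names are wrapped in square brackets.
     */
    static String selectAllFrom(String tableSpec) {
        if (tableSpec == null || tableSpec.trim().isEmpty()) {
            throw new IllegalArgumentException("tableSpec must not be empty");
        }
        String spec = tableSpec.trim();
        // Reject brackets so the spec cannot break out of the [ ... ] quoting.
        if (spec.contains("[") || spec.contains("]")) {
            throw new IllegalArgumentException("tableSpec must not contain brackets: " + spec);
        }
        return "SELECT * FROM [" + spec + "]";
    }

    public static void main(String[] args) {
        // "my-project" is a placeholder project id.
        String query = selectAllFrom("my-project:CPT_7414_PLAYGROUND.Federated_test_dataflow");
        System.out.println(query);
        // In the pipeline, this string would replace the direct table read:
        //   pipeline.apply("fed-test", BigQueryIO.Read.fromQuery(query))
    }
}
```

The rest of the pipeline from the question can stay unchanged, since the query-based read still yields `TableRow` elements.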




Answer 2:


The Dataflow BigQuery source was designed to read BigQuery managed tables of type "TABLE". (The type definition can be found at https://cloud.google.com/bigquery/docs/reference/v2/tables#type.) EXTERNAL and VIEW tables are not supported.

The BigQuery "federated table" feature allows BigQuery to directly query data in places like Google Cloud Storage. Dataflow can also read files from Google Cloud Storage, so you should be able to point your Dataflow computation directly at the sources you want to read.
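
If you read the CSV files directly but still want row-shaped records downstream, each line can be converted into a keyed row. The sketch below uses a plain `Map<String, Object>` as a stand-in for `TableRow` (which is itself map-like) so that it runs without the SDK. The class `CsvRowMapper`, the column names, and the naive comma split (no support for quoted fields) are assumptions for illustration only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CsvRowMapper {

    /**
     * Maps one CSV line onto the given column names.
     * Assumption: fields never contain quoted or escaped commas.
     */
    static Map<String, Object> toRow(String[] columns, String line) {
        // A limit of -1 keeps trailing empty fields instead of dropping them.
        String[] values = line.split(",", -1);
        if (values.length != columns.length) {
            throw new IllegalArgumentException(
                "Expected " + columns.length + " fields but got " + values.length);
        }
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < columns.length; i++) {
            row.put(columns[i], values[i].trim());
        }
        return row;
    }

    public static void main(String[] args) {
        // Hypothetical schema and input line.
        String[] columns = {"id", "name"};
        System.out.println(toRow(columns, "1,alice")); // prints {id=1, name=alice}
    }
}
```

In a pipeline, the same logic would sit inside a `DoFn<String, TableRow>` applied to the lines read from GCS, populating a `TableRow` with `set(column, value)` instead of a map.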



Source: https://stackoverflow.com/questions/36193519/reading-bigquery-federated-table-as-source-in-dataflow-throws-an-error
