apache beam.io.BigQuerySource use_standard_sql not working when running as dataflow runner

Asked by 一生所求, 2021-01-28 23:38

I have a Dataflow job that first reads from a BigQuery query (written in Standard SQL). It works perfectly in direct runner mode. However, when I try to run the same job with the Dataflow runner, the use_standard_sql option does not take effect.

1 Answer
  • Answered 2021-01-29 00:37

    Try the following code, which reads data from BigQuery and writes it back to BigQuery. It is an Apache Beam pipeline intended for the Dataflow runner:

    #------------Import Lib-----------------------#
    from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions
    import apache_beam as beam, os, sys, argparse, logging
    from apache_beam.options.pipeline_options import SetupOptions
    
    #------------Set up BQ parameters-----------------------#
    # Replace with Project Id
    project = 'xxxxx'
    
    def run(argv=None, save_main_session=True):
        parser = argparse.ArgumentParser()
        parser.add_argument(
              '--cur_suffix',
              dest='cur_suffix',
              help='Input table suffix to process.')
        known_args, pipeline_args = parser.parse_known_args(argv)
    
    
        pipeline_options = PipelineOptions(pipeline_args)
        pipeline_options.view_as(SetupOptions).save_main_session = save_main_session
        p1 = beam.Pipeline(options=pipeline_options)
    
    
        logging.info('***********')
        logging.info(known_args.cur_suffix)
        data_loading = (
            p1
            | 'ReadFromBQ' >> beam.io.Read(beam.io.BigQuerySource(
                query='''SELECT SUBSTR(_time, 1, 19) as _time, dest FROM `project.dataset.table`''',
                use_standard_sql=True))
        )
    
        project_id = "xxxxxxx"
        dataset_id = 'AAAAAAAA'
        table_schema_Audit = ('_time:DATETIME, dest:STRING')
    
    #---------------------Type = audit----------------------------------------------------------------------------------------------------------------------
        result = (
        data_loading
            | 'Write-Audit' >> beam.io.WriteToBigQuery(
                                                        table='YYYYYYY',
                                                        dataset=dataset_id,
                                                        project=project_id,
                                                        schema=table_schema_Audit,
                                                        create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
                                                        write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE
                                                        ))
    
    
    
        result = p1.run()
        result.wait_until_finish()
    
    
    if __name__ == '__main__':
      path_service_account = 'ABGFDfc927.json'
      os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = path_service_account
      run()
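
    Separately from the answer above: a known workaround when the legacy beam.io.BigQuerySource silently drops use_standard_sql on the Dataflow runner is to force the dialect inside the query itself with BigQuery's #standardSQL directive, and/or to switch to the newer beam.io.ReadFromBigQuery transform. A minimal sketch, assuming a helper whose name (as_standard_sql) is made up for illustration:

    ```python
    # Hypothetical helper (name is illustrative): force a query to run in
    # Standard SQL by prefixing BigQuery's "#standardSQL" dialect directive,
    # which takes effect even if the use_standard_sql flag is ignored.
    def as_standard_sql(query: str) -> str:
        stripped = query.lstrip()
        # Avoid double-prefixing if the directive is already present.
        if stripped.lower().startswith("#standardsql"):
            return query
        return "#standardSQL\n" + query


    # Inside the pipeline it would be used like this (ReadFromBigQuery is
    # the maintained replacement for the legacy BigQuerySource):
    #   p1 | 'ReadFromBQ' >> beam.io.ReadFromBigQuery(
    #       query=as_standard_sql(
    #           'SELECT SUBSTR(_time, 1, 19) as _time, dest '
    #           'FROM `project.dataset.table`'),
    #       use_standard_sql=True)
    ```

    The directive-based approach has the advantage of being runner-independent: BigQuery itself picks the dialect from the query text, so the behavior no longer depends on how the runner serializes the source options.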
    
    