Question
How do I load data from AWS RDS to Google BigQuery in streaming mode? Description: I have data in RDS (SQL Server) and want to load this data into Google BigQuery in real time.
Answer 1:
There is no direct way to insert changes from Amazon RDS into Google BigQuery. It can be done with a pipeline like this:
Amazon RDS ----Lambda/DMS----> Kinesis Data Streams -----Lambda----> BigQuery
- Read changes from Amazon RDS into Kinesis Data Streams using Lambda, or use AWS DMS (Database Migration Service). You can also push the records to Kinesis Data Firehose to aggregate/batch them.
- Use Lambda to read from the Kinesis stream/Firehose and insert the records into BigQuery with tabledata.insertAll (the BigQuery streaming API). The code will look something like the sketch below.
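A minimal sketch of the consumer Lambda, assuming each Kinesis record carries a JSON document describing one changed row, and that the google-cloud-bigquery client library (whose insert_rows_json wraps tabledata.insertAll) plus GCP service-account credentials are packaged with the function. The table name and environment variable are hypothetical.

```python
# Consumer Lambda: decode Kinesis records and stream them into BigQuery
# via insert_rows_json (the tabledata.insertAll streaming API).
import base64
import json
import os

from google.cloud import bigquery

# Hypothetical destination table: project.dataset.table
TABLE_ID = os.environ.get("BQ_TABLE_ID", "my-project.my_dataset.rds_changes")

# Requires GCP credentials, e.g. GOOGLE_APPLICATION_CREDENTIALS bundled with the Lambda.
bq_client = bigquery.Client()


def lambda_handler(event, context):
    """Triggered by a Kinesis Data Streams (or Firehose) event."""
    rows = []
    for record in event["Records"]:
        # Kinesis payloads are base64-encoded; assume each one is a JSON
        # document representing a single changed row from RDS.
        payload = base64.b64decode(record["kinesis"]["data"])
        rows.append(json.loads(payload))

    if rows:
        # Streaming insert into BigQuery; returns a list of per-row errors.
        errors = bq_client.insert_rows_json(TABLE_ID, rows)
        if errors:
            # Raising makes Lambda retry the batch (Kinesis retains the records).
            raise RuntimeError(f"BigQuery insert errors: {errors}")

    return {"inserted": len(rows)}
```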
Answer 2:
You can use the BigQuery Data Transfer Service, which manages and schedules load jobs into BigQuery. This is the recommended migration method for this use case. First you need to export the data from AWS RDS to CSV files and move them to S3. Amazon S3 transfers are a two-step process:
- The Transfer Service is used to bring the data from S3 into Google Cloud Storage.
- A BigQuery load job is used to load the data into BigQuery (a minimal sketch of this step follows the list).
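To show what the second step boils down to, here is a minimal sketch of an equivalent load job run manually with the google-cloud-bigquery client; the bucket, dataset, and table names are hypothetical, and the managed transfer performs this load for you on its own schedule.

```python
# Load CSV files from Cloud Storage into a BigQuery table.
from google.cloud import bigquery

client = bigquery.Client()

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,      # assume the RDS export wrote a header row
    autodetect=True,          # infer the schema from the CSV files
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/rds-export/*.csv",   # hypothetical GCS path
    "my-project.my_dataset.rds_table",   # hypothetical destination table
    job_config=job_config,
)
load_job.result()  # block until the load job completes
print(f"Loaded {load_job.output_rows} rows.")
```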
Another interesting solution that I found is to use AWS Data Pipeline to export data from MySQL and feed it to BigQuery.
Moreover, you can use one of the ETL tools (see here) that integrate with both Amazon RDS and BigQuery to transfer the data. One of the best is Fivetran.
I hope this helps.
Source: https://stackoverflow.com/questions/60287594/how-to-load-data-from-aws-rds-to-google-bigquery-in-streaming-mode