问题
People, the company where I work has some MySQL databases on AWS (Amazon RDS). We are making a POC with BigQuery and what I am researching now is how to replicate the bases to BigQuery (the existing registers and the new ones in the future). My doubts are:
- How to replicate the MySQL tables and rows to BigQuery. Is there any tool to do that (I am reading about Amazon Database Migration Service)? Should I replicate to Google Cloud SQL and than export to BigQuery?
- How to replicate the future registers? Is possible to create a job inside MySQL to send the new registers after a predefined number? For example, after 1,000 new rows are inserted (or a time is passed), some event is "triggered" and the new registers are copied to Cloud SQL/BigQuery?
My initial idea is to dump the original base, load it to the other and use a script to listen to new registers and send them to the new base.
Have I explained it properly? Is it understandable?
回答1:
You will need to use one of the ETL tools which have integration with both mySQL and BigQuery to perform initial transfer of the data and copy subsequent changes to BigQuery. Take a look on the list of available tools [1]
You can also implement your own tool by developing a process which will extract the data from mySQL to a CSV file and then load that file into BigQuery using data import [2]
[1] https://cloud.google.com/bigquery/third-party-tools
[2] https://cloud.google.com/bigquery/loading-data-into-bigquery
回答2:
In addition to what Vadim said, you can try:
- mysqldump to CSV files to s3 (I believe RDS allows that)
- run "gsutil" Google Cloud Storage utility to copy data from s3 to GCS
- run "bq load file.csv" to load the file to BigQuery
I'm interested in hearing your experience, so feel free to ping me in private.
来源:https://stackoverflow.com/questions/35413459/synchronize-amazon-rds-with-google-bigquery