How to write an ETL job to transfer a MySQL database table to another MySQL RDS database

Submitted by 孤者浪人 on 2021-01-29 16:23:42

Question


I am new to AWS. I want to write an ETL script using AWS Glue to transfer data from one MySQL database to another RDS MySQL database.

Please suggest how to do this using AWS Glue.

Thanks


Answer 1:


You can use pymysql or mysql.connector as a separate zip file added to the Glue job. We have used pymysql for all our production jobs running in AWS Glue/Aurora RDS.

Use these connectors to connect to both RDS MySQL instances. Read the data from the source RDS database into a dataframe, perform the transformations, and finally write the transformed data to the target RDS database tables.
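A minimal sketch of that read/transform/write flow, assuming hypothetical src_emp and tgt_emp tables; the hosts, credentials, and column names are all placeholders:

import mysql.connector

# Placeholder connection details -- replace with your own endpoints/credentials.
src = mysql.connector.connect(host="source-host", user="user", password="pwd", database="sourcedb")
tgt = mysql.connector.connect(host="target-host", user="user", password="pwd", database="targetdb")
src_cur = src.cursor()
tgt_cur = tgt.cursor()

# Read from the (hypothetical) source table.
src_cur.execute("SELECT id, emp_name, dept FROM src_emp")
rows = src_cur.fetchall()

# Example transformation: upper-case the department column.
transformed = [(rid, name, dept.upper() if dept else dept) for (rid, name, dept) in rows]

# Write to the (hypothetical) target table and commit.
tgt_cur.executemany("INSERT INTO tgt_emp (id, emp_name, dept) VALUES (%s, %s, %s)", transformed)
tgt.commit()

src.close()
tgt.close()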

Here is a sample script that uses mysql.connector and loads data from S3 into a staging table before loading it into the target database.

import mysql.connector

# Connect to the source RDS MySQL instance.
conn1 = mysql.connector.connect(host=url1, user=uname1, password=pwd1, database=sourcedbase)
cur1 = conn1.cursor()

# Connect to the target RDS MySQL instance, where the staging table lives.
conn2 = mysql.connector.connect(host=url2, user=uname2, password=pwd2, database=targetdbase)
cur2 = conn2.cursor()

# Recreate the staging table, then bulk-load it from S3.
createStgTable1 = "DROP TABLE IF EXISTS mydb.STG_TABLE;"
createStgTable2 = "CREATE TABLE mydb.STG_TABLE(COL1 VARCHAR(50) NOT NULL, COL2 VARCHAR(50), COL3 VARCHAR(50), COL4 CHAR(1) NOT NULL);"
loadQry = "LOAD DATA FROM S3 PREFIX 's3://<bucketname>/folder' REPLACE INTO TABLE mydb.STG_TABLE FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' IGNORE 1 LINES (@var1, @var2, @var3, @var4) SET col1= @var1, col2= @var2, col3= @var3, col4= @var4;"

cur2.execute(createStgTable1)
cur2.execute(createStgTable2)
cur2.execute(loadQry)
conn2.commit()

"Load data....." is from Aurora, to load data from S3 directly into a mysql table.

Insert query from the staging table into the target table on the RDS instance:

conn = mysql.connector.connect(host=url, user=uname, password=pwd, database=dbase)
cur = conn.cursor()

# Upsert from the staging table into the target table; MAX(ID)+1 generates the next id.
insertQry = ("INSERT INTO emp (id, emp_name, dept, designation, address1, city, state, active_start_date, is_active) "
             "SELECT (SELECT COALESCE(MAX(id), 0) + 1 FROM atlas.emp) id, tmp.emp_name, tmp.dept, tmp.designation, "
             "tmp.address1, tmp.city, tmp.state, tmp.active_start_date, tmp.is_active FROM EMP_STG tmp "
             "ON DUPLICATE KEY UPDATE dept=tmp.dept, designation=tmp.designation, address1=tmp.address1, "
             "city=tmp.city, state=tmp.state, active_start_date=tmp.active_start_date, is_active=tmp.is_active;")

cur.execute(insertQry)
conn.commit()  # cursor.execute() returns None, so report progress via rowcount instead
print("Rows affected:", cur.rowcount)
conn.close()



Answer 2:


High-level steps for the Glue job and scripts:

1) Create a zip file containing pymysql or mysql.connector; search for the steps involved in packaging a Python library for Glue.

2) Upload your ETL Python script, which reads from and writes to RDS, to an S3 location. AWS Glue also provides its own code generator, which you can use if it suits the transformation you are looking at.

3) Create an AWS Glue job and configure it to point to your uploaded ETL script, the MySQL jar files, etc., as shown in the sketch after this list. The rest can be left at the defaults.

4) You also need an IAM role so that Glue can run the Python script on your behalf.

Please refer to the AWS documentation on Glue jobs for more details on the configuration.
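To tie steps 1–4 together, here is a hedged boto3 sketch; the bucket, key paths, job name, and role name are all placeholders:

import boto3

s3 = boto3.client("s3")
glue = boto3.client("glue")

# Step 2: upload the ETL script and the library zip to S3 (placeholder bucket/keys).
s3.upload_file("etl_script.py", "my-bucket", "scripts/etl_script.py")
s3.upload_file("pymysql.zip", "my-bucket", "libs/pymysql.zip")

# Steps 3-4: create the Glue job, pointing at the script, the pymysql zip,
# and an IAM role that Glue can assume (the role must already exist).
glue.create_job(
    Name="mysql-to-mysql-etl",      # placeholder job name
    Role="MyGlueServiceRole",       # placeholder IAM role name
    Command={
        "Name": "glueetl",
        "ScriptLocation": "s3://my-bucket/scripts/etl_script.py",
        "PythonVersion": "3",
    },
    DefaultArguments={
        "--extra-py-files": "s3://my-bucket/libs/pymysql.zip",
    },
)

The role passed to create_job typically carries the AWSGlueServiceRole managed policy plus S3 access to the script and zip locations.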



Source: https://stackoverflow.com/questions/59563663/how-to-write-the-etl-job-to-transfer-the-mysql-database-table-to-another-mysql-r
