Achieving read and write query availability in AWS Multi-AZ RDS

Question

I have configured a Multi-AZ RDS MySQL instance with no read replicas in a development environment, and I am testing Multi-AZ RDS failover by rebooting the DB instance.

Here is what I observed: during the RDS failover, the client application does not lose its connection, but it cannot access the database either; once the failover completes, the client can access the database again.

Update 1: The observation above is wrong. What I have just observed is that after the failover completes I get the error below, and the connection is terminated.

    ERROR 2003 (HY000): Can't connect to MySQL server on 'rds-test.czswqpewzqas.---------.amazonaws.com' (110)

In short, my queries fail while the Multi-AZ MySQL instance reboots. Does anyone have an idea what I am missing here?

Update - Achieving read availability: I have now created a Read Replica for the Multi-AZ MySQL instance, and when I get the error mentioned above I redirect SELECT queries to the Read Replica instance.

So, using a Read Replica I am able to achieve read availability. Is this the right way? I would like to know if there is any other way to do it.

Also, how can I achieve write availability in Multi-AZ RDS?

Answer 1:

Your observations are correct. During a failover, existing TCP connections are lost, and the database stays unreachable for the time it takes to fail over to the secondary database and to switch over the IP addresses in DNS.

It is up to the application to

a/ try to reconnect using exponential backoff. Reconnection will be possible within minutes.

b/ decide how to behave during the failover.
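As a minimal sketch of point a/, the loop below retries a connection attempt with exponentially growing, jittered delays. The helper name `connect_with_backoff`, its parameters, and the default delays are assumptions made for illustration; `connect` stands for whatever callable opens your connection (for example a wrapper around your MySQL driver), and `retry_on` should list the exception types that driver raises on a failed connect.

```python
import random
import time


def connect_with_backoff(connect, max_attempts=8, base_delay=0.5,
                         max_delay=30.0, retry_on=(OSError,),
                         sleep=time.sleep):
    """Call connect() until it succeeds, waiting longer after each failure.

    All names and defaults here are hypothetical; tune them so that the
    total waiting time covers the failover window you observe.
    """
    for attempt in range(max_attempts):
        try:
            return connect()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay cap doubles each attempt, bounded by max_delay.
            cap = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter: sleep a random time in [0, cap] so that many
            # clients do not all reconnect at the same instant.
            sleep(random.uniform(0, cap))
```

The `sleep` parameter is injectable only so the function can be exercised without real waiting; in production the default `time.sleep` is what you want.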

Read transactions (SELECT) can be handed off to a read replica. Modern JDBC and ODBC drivers are able to handle read replicas by themselves: just give them the list of IP addresses / DNS names of your replicas. The driver will apply the load balancing automatically. No code change is required.
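If your driver does not do this for you, the fallback described in the question can be sketched at the application level. This is only an illustration: `run_select`, `run_on_primary`, and `run_on_replica` are hypothetical names, the two callables stand for "execute this query on that endpoint", and `retry_on` must match the connection errors your driver actually raises.

```python
def run_select(query, run_on_primary, run_on_replica, retry_on=(OSError,)):
    """Run a SELECT on the primary, falling back to a read replica.

    Hypothetical helper: the two callables take the SQL text and return
    rows. Anything other than a SELECT is rejected, because a read
    replica cannot accept writes.
    """
    if not query.lstrip().lower().startswith("select"):
        raise ValueError("only SELECT statements may be sent to a replica")
    try:
        return run_on_primary(query)
    except retry_on:
        # Primary unreachable (for example during failover): serve the
        # read from the replica, accepting possible replication lag.
        return run_on_replica(query)
```

Keep in mind that a replica may lag behind the primary, so reads served this way can be slightly stale.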

Write transactions are more complex to handle and there is no single answer for all applications. Correct answer will depend on your application & business requirements.

Some customers decide to block all write operations and return an error message to end users, asking them to try again a few minutes later.

Some customers queue write transactions in an SQS queue and develop a queue reader application that flushes the pending transactions when the master database is available again (depending on the workload, S3 or DynamoDB can be used for this as well). Of course, your data will not be consistent during the failover and for a short period right after it, the time required to flush all pending writes.
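The queue-and-replay idea can be sketched as follows. To keep the example self-contained it uses an in-memory `deque` as a stand-in for SQS, which is an assumption of this sketch and would not survive a process restart; the class name `WriteBuffer` and the `execute` callable (which runs one write statement against the master) are likewise hypothetical.

```python
from collections import deque


class WriteBuffer:
    """Apply writes directly when possible, queue them when the DB is down.

    Hypothetical sketch: a real deployment would replace the deque with a
    durable queue such as SQS and run flush() from a separate reader.
    """

    def __init__(self, execute, retry_on=(OSError,)):
        self._execute = execute      # callable that runs one statement
        self._retry_on = retry_on    # errors that mean "DB unreachable"
        self._pending = deque()

    def write(self, statement):
        """Return True if applied immediately, False if queued."""
        if self._pending:
            # Earlier writes are still waiting: queue to preserve order.
            self._pending.append(statement)
            return False
        try:
            self._execute(statement)
            return True
        except self._retry_on:
            self._pending.append(statement)
            return False

    def flush(self):
        """Replay queued writes in order; stop at the first failure."""
        applied = 0
        while self._pending:
            try:
                self._execute(self._pending[0])
            except self._retry_on:
                break  # still down: keep the rest for the next flush
            self._pending.popleft()
            applied += 1
        return applied
```

A statement is removed from the queue only after it has executed successfully, so a failed flush can simply be retried later. If a statement can fail after being partially applied, the writes need to be idempotent or wrapped in transactions.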

Please feel free to comment about other strategies used in real world scenarios.

Source: https://stackoverflow.com/questions/36643782/achieving-read-and-write-query-availability-in-aws-multi-az-rds

Tags

amazon-web-services

amazon-rds

failover