Table1: Everything including the kitchen sink. Dates in the wrong format (year last so you cannot sort on that column), Numbers stored as VARCHAR, complete addresses in the \'st
I am not a MySQL person, so this is coming out of left field.
But I think the log files might be the answer.
Thankfully, you really only need to know 2 things from the log.
You need the record/rowid, and you need the operation.
In most DB's, and I assume MySQL, there's an implicit column on each row, like a rowid or recordid, or whatever. It's the internal row number used by the database. This is your "free" primary key.
Next, you need the operation. Notably whether it's an insert, update, or delete operation on the row.
You consolidate all of this information, in time order, and then run through it.
For each insert/update, you select the row from your original DB, and insert/update that row in your destination DB. If it's a delete, then you delete the row.
You don't care about field values, they're just not important. Do the whole row.
You hopefully shouldn't have to "parse" binary log files, MySQL already must have routines to do that, you just need to find and figure out how to use them (there may even be some handy "dump log" utility you could use).
This lets you keep the system pretty simple, and it should only depend on your actual activity during the day, rather than the total DB size. Finally, you could later optimize it by making it "smarter". For example, perhaps they insert a row, then update it, then delete it. You would know you can just ignore that row completely in your replay.
Obviously this takes a bit of arcane knowledge in order actually read the log files, but the rest should be straightforward. I would like to think that the log files are timestamped as well, so you can know to work on rows "from today", or whatever date range you want.