Question
I have installed Hadoop and HBase for real-time analytics. The problem I face is migrating data online from MySQL to HBase.
The Sqoop tool is useful for bulk data migrations. Is there any way the data from MySQL can be transferred to HBase online, that is, at the moment an insert/update/delete happens, so that real-time analytics (not near-real-time) can be achieved?
Please help me with this.
Answer 1:
I think what you are facing is the task of setting up replication between two different DBMSs. This is a case where the native replication mechanism cannot be used directly.
The simplest solution would be to create a set of triggers on the tables you want to replicate, and have them write the data to be replicated into an additional table. Then you can set up a process that monitors this table and applies the changes to HBase.
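As a rough illustration of the "monitor the table and apply changes" half of this idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: the change-log row layout (`id`, `op`, `row_key`, `payload` as JSON), the `d` column family, and the `apply_changes` helper are all hypothetical. The `table` argument only needs `put(row_key, data)` and `delete(row_key)` methods, which is the shape an HBase client table object such as the one in happybase offers; an in-memory stand-in is used here so the sketch runs without a cluster.

```python
import json


def apply_changes(changes, table):
    """Replay change-log rows (already ordered by id) onto an HBase-like table.

    Each change is a dict with the hypothetical keys: id, op, row_key, payload.
    Returns the id of the last change applied, or None if there were none.
    """
    last_id = None
    for change in changes:
        row_key = change["row_key"].encode("utf-8")
        if change["op"] in ("INSERT", "UPDATE"):
            columns = json.loads(change["payload"])
            # Store every column as a cell in the assumed column family "d".
            data = {
                f"d:{name}".encode("utf-8"): str(value).encode("utf-8")
                for name, value in columns.items()
            }
            table.put(row_key, data)
        elif change["op"] == "DELETE":
            table.delete(row_key)
        else:
            raise ValueError(f"unknown operation: {change['op']!r}")
        last_id = change["id"]
    return last_id


class InMemoryTable:
    """Stand-in for a real HBase table, for trying the logic locally."""

    def __init__(self):
        self.rows = {}

    def put(self, row_key, data):
        # Like an HBase put, this overwrites only the cells that are given.
        self.rows.setdefault(row_key, {}).update(data)

    def delete(self, row_key):
        self.rows.pop(row_key, None)
```

A poller would repeatedly select rows from the change-log table with an id greater than the last one applied, pass them to `apply_changes`, and persist the returned id. Note that this gives you low latency at best, bounded by the polling interval, not strictly synchronous replication.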
A more robust, but more complicated, solution would be to parse the MySQL binary log, which MySQL uses for its own native replication, and apply the changes to HBase.
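To make the log-based variant a bit more concrete, here is a small Python sketch of the translation step only: turning one decoded row change into HBase-style mutations. It assumes something else has already parsed the binary log into row events (a binlog reader library such as python-mysql-replication is one option, and the `values` / `before_values` / `after_values` dict shape below is modelled on what I believe it emits). The function name, the `d` column family, and the `key_column` parameter are hypothetical.

```python
def mutations_for_event(kind, row, key_column="id"):
    """Translate one decoded row change into a list of mutations.

    kind is "insert", "update" or "delete". Each mutation is a tuple
    ("put", row_key, cells) or ("delete", row_key, None).
    """

    def key(values):
        return str(values[key_column]).encode("utf-8")

    def cells(values):
        # NULL columns are skipped; this sketch does not remove a cell
        # that an UPDATE has set to NULL.
        return {
            f"d:{name}".encode("utf-8"): str(value).encode("utf-8")
            for name, value in values.items()
            if value is not None
        }

    if kind == "insert":
        return [("put", key(row["values"]), cells(row["values"]))]
    if kind == "delete":
        return [("delete", key(row["values"]), None)]
    if kind == "update":
        before, after = row["before_values"], row["after_values"]
        mutations = []
        if key(before) != key(after):
            # The primary key changed, so the old HBase row must go away.
            mutations.append(("delete", key(before), None))
        mutations.append(("put", key(after), cells(after)))
        return mutations
    raise ValueError(f"unknown event kind: {kind!r}")
```

The advantage over triggers is that nothing has to be added to the MySQL schema and the write path is not slowed down; the cost is that you depend on row-based binary logging being enabled and have to track your position in the log yourself.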
At the same time, it is not clear to me how HBase will give you real-time analytics. I wrote a bit about this issue here:
Group by In HBase
Answer 2:
To add more information about where to use Hive in your project: there are multiple setups in which you can integrate Hive and HBase to work together. For instance, if you use AWS, you can install HBase and Hive on the same Hadoop cluster and run join queries across a Hive table and an HBase table. Or you can separate HBase and Hive into two different clusters and reference HBase data from your Hive queries. If you use the Cloudera distribution, you can do the same thing.
References:
- http://aws.typepad.com/aws/2012/06/apache-hbase-on-emr.html
- http://www.cloudera.com/blog/2010/06/integrating-hive-and-hbase
Source: https://stackoverflow.com/questions/9919638/continuous-data-migration-from-mysql-to-hbase