How to join tables in HBase

一个人的身影 2020-12-29 13:03

I have to join tables in HBase.

I integrated Hive and HBase and that is working well. I can query using Hive.

But can somebody help me how to join tables in

1 Answer
  • 2020-12-29 14:00

    That is certainly an approach, but if you are doing two random reads per scanned row then your speed will plummet. If your filter removes most of the rows, or the dataset in A is small, that may not be an issue.

    Sort-merge Join

    However, the best approach, which will be available in HBase 0.96, is the MultipleTableInput method. This means that it will scan table A and write its output with a unique key that will allow table B to match up.

    E.g. Table A emits (b_id, a_info) and Table B emits (b_id, b_info); records that share a b_id are merged together in the reducer.

    This is an example of a sort-merge join.
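
A minimal sketch of that reduce-side merge in plain Python, with no HBase or Hadoop calls. The function names, the row layouts and the "A"/"B" tags are assumptions made for illustration; the in-memory sort stands in for the shuffle phase that groups records by b_id before they reach the reducer.

```python
from itertools import groupby
from operator import itemgetter


def map_table_a(rows):
    # Hypothetical layout for table A rows: (a_row_key, b_id, a_info).
    for _a_key, b_id, a_info in rows:
        yield b_id, ("A", a_info)


def map_table_b(rows):
    # Hypothetical layout for table B rows: (b_id, b_info).
    for b_id, b_info in rows:
        yield b_id, ("B", b_info)


def reduce_side_join(table_a, table_b):
    # Collect what both mappers emit, then sort by join key.
    # The sort plays the role of the MapReduce shuffle.
    emitted = list(map_table_a(table_a)) + list(map_table_b(table_b))
    emitted.sort(key=itemgetter(0))

    joined = []
    for b_id, group in groupby(emitted, key=itemgetter(0)):
        a_infos, b_infos = [], []
        for _key, (tag, info) in group:
            (a_infos if tag == "A" else b_infos).append(info)
        # Inner join: pair every A record with every B record for this key.
        for a_info in a_infos:
            for b_info in b_infos:
                joined.append((b_id, a_info, b_info))
    return joined


if __name__ == "__main__":
    table_a = [("a1", "b1", "x"), ("a2", "b2", "y"), ("a3", "b1", "z")]
    table_b = [("b1", "B-one"), ("b3", "B-three")]
    for row in reduce_side_join(table_a, table_b):
        print(row)
```

Tagging each value with its source table is what lets the reducer tell the two sides apart once they arrive under the same key.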

    Nested-Loop Join

    If you are joining on the row key or the joining attribute is sorted in line with table B, you can have an instance of a scanner in each task which sequentially reads from table B until it finds what it's looking for.

    E.g. Table A row key = "companyId" and Table B row key = "companyId_employeeId". Then for each Company in Table A you can get all the employees using the nested-loop algorithm.

    Pseudocode:

    for company in TableA:
        for employee in TableB:
            if employee.company_id == company.id:
                emit(company.id, employee)
    

    This is an example of a nested-loop join.
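
Because both tables are sorted by row key, the inner loop does not have to restart for every company: one scanner over table B can simply move forward. The sketch below shows this in plain Python under loudly stated assumptions: company IDs are fixed-width (so "companyId_" prefixes sort in the same order as the company IDs), both inputs are already sorted, and the function name and row layouts are hypothetical. An iterator stands in for the HBase scanner.

```python
def join_companies_employees(company_ids, employee_rows):
    """Join sorted company IDs against sorted (row_key, info) employee rows.

    Assumes employee row keys look like "companyId_employeeId" and that
    company IDs are fixed-width, so both inputs share one sort order.
    """
    results = []
    scanner = iter(employee_rows)      # stands in for a scanner over table B
    current = next(scanner, None)

    for company_id in company_ids:
        prefix = company_id + "_"
        # Skip employee rows that sort before this company's key range.
        while current is not None and current[0] < prefix:
            current = next(scanner, None)
        # Emit every employee whose row key starts with the company prefix.
        while current is not None and current[0].startswith(prefix):
            results.append((company_id, current))
            current = next(scanner, None)
    return results


if __name__ == "__main__":
    companies = ["c01", "c02", "c03"]
    employees = [("c01_e1", "Ann"), ("c01_e2", "Bob"), ("c03_e1", "Cy")]
    for row in join_companies_employees(companies, employees):
        print(row)
```

Each employee row is read at most once, so the cost is one pass over each table rather than one pass over table B per company.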

    More detailed join algorithms are here:

    • http://en.wikipedia.org/wiki/Nested_loop_join
    • http://en.wikipedia.org/wiki/Hash_join
    • http://en.wikipedia.org/wiki/Sort-merge_join