
Join of two datasets in MapReduce/Hadoop


温柔的废话 2021-02-06 00:20

Does anyone know how to implement the Natural-Join operation between two datasets in Hadoop?

More specifically, here's exactly what I need to do:

I am having t

2 Answers

臣服心动 (OP)

2021-02-06 00:49

Basically, you have two options here: a reduce-side join or a map-side join.

For a reduce-side join, your group key is "tile". A single reducer receives all the point pairs and line pairs for a given tile, but you will have to cache either the point pairs or the line pairs in an array. If both sides are so large that neither fits in memory for a single group key (each unique tile), this method will not work for you. Remember that you don't have to hold both sides for a single group key ("tile") in memory; one is sufficient.
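To make the reduce-side idea concrete, here is a minimal pure-Python sketch that only simulates the map, shuffle and reduce phases; it does not use the Hadoop API. The sample records, the numeric source tags (0 for points, 1 for lines) and all function names are assumptions made up for illustration. Sorting by (tile, tag) stands in for a secondary sort, so each tile's points reach the reducer before its lines and only the points need to be cached.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample inputs: (tile, payload) records from each dataset.
points = [("t1", "p1"), ("t2", "p2"), ("t1", "p3")]
lines = [("t1", "l1"), ("t3", "l2"), ("t1", "l4")]

def map_tagged(records, tag):
    """Map phase: emit (tile, tag, payload) so the reducer can tell the sources apart."""
    for tile, payload in records:
        yield (tile, tag, payload)

def reduce_join(tile, tagged_values):
    """Reduce phase for one tile: cache the points (tag 0), stream the lines (tag 1)."""
    cached_points = []
    for tag, payload in tagged_values:
        if tag == 0:
            cached_points.append(payload)
        else:
            # Points arrived first, so every line can be joined against the cache.
            for point in cached_points:
                yield (tile, point, payload)

def reduce_side_join(points, lines):
    tagged = list(map_tagged(points, 0)) + list(map_tagged(lines, 1))
    # Simulated shuffle: sort by (tile, tag) so each tile's points precede its lines.
    shuffled = sorted(tagged, key=itemgetter(0, 1))
    for tile, group in groupby(shuffled, key=itemgetter(0)):
        yield from reduce_join(tile, ((tag, payload) for _, tag, payload in group))

result = list(reduce_side_join(points, lines))
```

Only `cached_points` is held in memory per tile; the lines are streamed through once, which is why caching one side is enough.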

If both sides are large for a single group key, you will have to try a map-side join, but it has some particular requirements: both inputs must be sorted by the join key and partitioned identically. You can meet those requirements by pre-processing your data with map/reduce jobs that run an equal number of reducers for both datasets.
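The pre-processing and map-side merge can be sketched the same way, again as a pure-Python simulation and not Hadoop code. `partition_and_sort` is a hypothetical stand-in for the pre-processing job (same partitioner and same partition count for both datasets, output sorted by tile), and `NUM_PARTITIONS`, the CRC32-based partitioner and the sample records are assumptions. Because partition i of one dataset can only share tiles with partition i of the other, each pair of partitions is merged independently.

```python
import zlib
from itertools import groupby
from operator import itemgetter

NUM_PARTITIONS = 2  # assumed value; must be identical for both datasets

# Hypothetical sample inputs: (tile, payload) records from each dataset.
points = [("t1", "p1"), ("t2", "p2"), ("t1", "p3")]
lines = [("t1", "l1"), ("t3", "l2"), ("t1", "l4")]

def partition_and_sort(records, num_partitions=NUM_PARTITIONS):
    """Stand-in for the pre-processing job: same partitioner, same count, sorted by tile."""
    parts = [[] for _ in range(num_partitions)]
    for tile, payload in records:
        parts[zlib.crc32(tile.encode()) % num_partitions].append((tile, payload))
    return [sorted(part, key=itemgetter(0)) for part in parts]

def merge_join(left_sorted, right_sorted):
    """Map-side step: merge two partitions that are both sorted by tile."""
    left = groupby(left_sorted, key=itemgetter(0))
    right = groupby(right_sorted, key=itemgetter(0))
    l, r = next(left, None), next(right, None)
    while l is not None and r is not None:
        if l[0] < r[0]:
            l = next(left, None)
        elif l[0] > r[0]:
            r = next(right, None)
        else:
            # Matching tile: hold one side's records for this tile, stream the other.
            right_values = [payload for _, payload in r[1]]
            for _, left_payload in l[1]:
                for right_payload in right_values:
                    yield (l[0], left_payload, right_payload)
            l, r = next(left, None), next(right, None)

def map_side_join(points, lines):
    # Partition i of one dataset is joined only with partition i of the other.
    for p_part, l_part in zip(partition_and_sort(points), partition_and_sort(lines)):
        yield from merge_join(p_part, l_part)

joined = sorted(map_side_join(points, lines))
```

No shuffle of the joined data is needed here: the sort order and identical partitioning established up front are what let each mapper merge its pair of partitions in a single pass.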
