Question
I have a crowdsourcing application. Data from users is collected, processed, and then published for everyone to see. Data collection happens in near real time, and the processing load grows as the number of users (data nodes) grows. I need to scale this.
Looking at scaling for graph-based models, MapReduce seems to be the most popular approach. Is there a benchmarking paper comparing it to other techniques? Pregel also looks impressive. Please point me to any leads on partitioning in Pregel, i.e., how a graph can be partitioned intelligently so that workers do not lag behind one another.
Answer 1:
The problem of partitioning a graph "intelligently" in order to minimize execution time is an interesting one. However, it is not simple, and the best strategy depends on your data and your algorithm. You may also find that, in practice, it is not necessary and a random partitioning is good enough.
For example, if you are interested in exploring Pregel-like approaches, you can have a look at Apache Giraph and experiment with different partitioning techniques.
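For reference, the Pregel paper's default assignment is exactly this random/hash scheme: vertex v goes to partition hash(v) mod N, where N is the number of partitions. Below is a minimal, self-contained Java sketch of that default; the class and method names here are illustrative and are not Giraph's actual partitioning API.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Minimal sketch of Pregel's default partitioning scheme:
 * a vertex with id v is assigned to worker hash(v) mod numWorkers.
 * Class and method names are illustrative, not a real Giraph API.
 */
public class HashPartitioner {
    private final int numWorkers;

    public HashPartitioner(int numWorkers) {
        this.numWorkers = numWorkers;
    }

    /** Worker index for a vertex id; Math.floorMod avoids negative results. */
    public int assign(long vertexId) {
        return Math.floorMod(Long.hashCode(vertexId), numWorkers);
    }

    /** Group vertex ids into per-worker buckets. */
    public List<List<Long>> partition(long[] vertexIds) {
        List<List<Long>> buckets = new ArrayList<>(numWorkers);
        for (int i = 0; i < numWorkers; i++) {
            buckets.add(new ArrayList<>());
        }
        for (long v : vertexIds) {
            buckets.get(assign(v)).add(v);
        }
        return buckets;
    }

    public static void main(String[] args) {
        HashPartitioner p = new HashPartitioner(4);
        List<List<Long>> buckets = p.partition(new long[]{1, 2, 3, 42, 100, 7});
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("worker " + i + ": " + buckets.get(i));
        }
    }
}
```

A "smarter" partitioner would replace assign() with something locality-aware, e.g. trying to keep densely connected vertices on the same worker to reduce cross-worker messages, but as noted above, whether that beats random hashing depends on your data and your algorithm.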
Source: https://stackoverflow.com/questions/9583296/how-to-partition-graph-for-pregel-to-maximize-processing-speed