How to use Cassandra's MapReduce with or without Pig?

抹茶落季 2021-02-13 05:31

Can someone explain how MapReduce works with Cassandra 0.6? I've read through the word count example, but I don't quite follow what's happening on the Cassandra end vs. the "

3 answers
  •  南旧 (original poster)
    2021-02-13 06:11

    The main advantage of using a direct InputFormat from Cassandra is that it streams the data efficiently. Each input split covers a range of tokens and is read off the disk at full bandwidth: no seeking, no complex querying. I don't think it knows about locality yet, though. That is, it does not make each TaskTracker prefer input splits served by a Cassandra process on the same node.
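To make the idea of "each input split covers a range of tokens" concrete, here is a minimal sketch in Python. It is not how Cassandra's InputFormat is implemented; it only assumes a token ring of size 2**127 (as with the RandomPartitioner) and cuts it into equal, contiguous ranges. The function name `token_range_splits` and the equal-width strategy are illustrative assumptions.

```python
from typing import List, Tuple

# Assumption: a RandomPartitioner-style token space of [0, 2**127).
RING_SIZE = 2 ** 127


def token_range_splits(num_splits: int, ring_size: int = RING_SIZE) -> List[Tuple[int, int]]:
    """Cut the token ring into num_splits contiguous (start, end) ranges.

    Hypothetical helper: each returned pair stands for one input split
    that a single map task would scan sequentially.
    """
    if num_splits <= 0:
        raise ValueError("num_splits must be positive")
    # Integer arithmetic keeps the boundaries exact for very large tokens.
    bounds = [ring_size * i // num_splits for i in range(num_splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


splits = token_range_splits(4)
for start, end in splits:
    print(f"split covers tokens {start} .. {end}")
```

Because every range is contiguous, a map task can read its slice in one sequential pass, which is where the "no seeking" benefit described above comes from.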

    You can try using Pig with its STREAM operator as a hack until more direct Hadoop Streaming support is in place.
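As a rough illustration of the STREAM approach, the sketch below is the kind of external script Pig could pipe tuples through (for example via `STREAM rows THROUGH` a script, here hypothetically named `tokenize.py`). It assumes Pig hands each tuple to the script as one tab-delimited line on stdin and that the last field holds the text to count; how the rows get from Cassandra into Pig in the first place is not covered here.

```python
import sys
from typing import IO, Iterable, Iterator


def tokenize_lines(lines: Iterable[str]) -> Iterator[str]:
    """Turn tab-delimited input lines into 'word<TAB>1' output lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        text = fields[-1]  # assumption: the last field is the text column
        for word in text.split():
            yield f"{word.lower()}\t1"


def main(stdin: IO[str] = sys.stdin, stdout: IO[str] = sys.stdout) -> None:
    """Read tuples from stdin and write one (word, 1) pair per line."""
    for out in tokenize_lines(stdin):
        stdout.write(out + "\n")
```

To use it as a standalone streaming script, call `main()` at the bottom of the file. The (word, 1) pairs it emits can then be grouped and summed back in Pig, which mirrors the word count example mentioned in the question.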
