Pig vs Hive vs Native Map Reduce

无人及你 2020-12-14 01:55

I have a basic understanding of what the Pig and Hive abstractions are. But I don't have a clear idea of the scenarios that require Hive, Pig, or native MapReduce.

I went thr

7 answers
  •  时光说笑
    2020-12-14 02:35

    Scenarios where Hadoop MapReduce is preferred over Hive or Pig:

    1. When you need explicit control over the driver program
    2. Whenever the job requires implementing a custom Partitioner
    3. If a predefined library of Java Mappers or Reducers already exists for the job
    4. If you require a high degree of testability when combining many large data sets
    5. If legacy code requirements dictate the physical structure of the application
    6. If the job requires optimization at a particular stage of processing, using tricks such as in-mapper combining
    7. If the job makes tricky use of the distributed cache (replicated joins), cross products, groupings, or joins
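    Two of the tuning tricks listed above (a custom Partitioner and in-mapper combining) can be illustrated without a cluster. The following is a minimal, framework-free Java sketch, not real Hadoop code: its `Partitioner` interface only mimics the contract of Hadoop's `Partitioner.getPartition(key, value, numPartitions)`, and it simulates the map and shuffle steps of a word count in memory. The class names `InMapperCombiningSketch` and `FirstLetterPartitioner`, and the first-letter routing rule, are hypothetical choices made for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Framework-free simulation (no Hadoop dependency) of two MapReduce tricks:
 * in-mapper combining and a custom partitioner. All names are hypothetical.
 */
public class InMapperCombiningSketch {

    /** Mimics the contract of Hadoop's Partitioner.getPartition(key, value, numPartitions). */
    interface Partitioner<K, V> {
        int getPartition(K key, V value, int numPartitions);
    }

    /** Hypothetical custom partitioner: routes words by first letter, so each reducer gets a contiguous a..z range. */
    static final class FirstLetterPartitioner implements Partitioner<String, Integer> {
        @Override
        public int getPartition(String key, Integer value, int numPartitions) {
            if (key.isEmpty()) return 0;
            char c = Character.toLowerCase(key.charAt(0));
            if (c < 'a' || c > 'z') return 0; // non-letters all go to reducer 0
            // Split a..z into numPartitions roughly equal ranges; result is always < numPartitions.
            return (c - 'a') * numPartitions / 26;
        }
    }

    /**
     * In-mapper combining: instead of emitting (word, 1) for every token, the mapper
     * aggregates counts in a local map and emits each distinct word once at the end
     * of its input split (in Hadoop this emission would happen in Mapper.cleanup()).
     */
    static Map<String, Integer> mapWithInMapperCombining(List<String> lines) {
        Map<String, Integer> localCounts = new HashMap<>();
        for (String line : lines) {
            for (String token : line.toLowerCase().split("\\s+")) {
                if (!token.isEmpty()) {
                    localCounts.merge(token, 1, Integer::sum);
                }
            }
        }
        return localCounts; // one record per distinct word, not one per token
    }

    /** Simulated shuffle: assigns each (word, count) pair to a reducer bucket via the partitioner. */
    static List<Map<String, Integer>> shuffle(Map<String, Integer> mapOutput,
                                              Partitioner<String, Integer> partitioner,
                                              int numReducers) {
        List<Map<String, Integer>> buckets = new ArrayList<>();
        for (int i = 0; i < numReducers; i++) {
            buckets.add(new HashMap<>());
        }
        for (Map.Entry<String, Integer> e : mapOutput.entrySet()) {
            int p = partitioner.getPartition(e.getKey(), e.getValue(), numReducers);
            buckets.get(p).merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> split = List.of("the quick brown fox", "the lazy dog", "The end");
        Map<String, Integer> mapOut = mapWithInMapperCombining(split);
        List<Map<String, Integer>> reducers = shuffle(mapOut, new FirstLetterPartitioner(), 2);
        for (int i = 0; i < reducers.size(); i++) {
            System.out.println("reducer " + i + ": " + reducers.get(i));
        }
    }
}
```

    With the sample input, words starting with a to m land in reducer 0 and words starting with n to z in reducer 1, and "the" reaches its reducer as a single pre-summed record rather than three separate (word, 1) pairs. In an actual Hadoop job, the same logic would live in a `Mapper` subclass (accumulating in `map()` and emitting in `cleanup()`) and a `Partitioner` subclass registered with `Job.setPartitionerClass`.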

    Pros and cons of Pig/Hive:

    1. Hadoop MapReduce requires more development effort than Pig and Hive.
    2. Pig and Hive jobs run slower than a fully tuned Hadoop MapReduce program.
    3. When using Pig and Hive to execute jobs, Hadoop developers need not worry about version mismatches.
    4. There is very little room for the developer to introduce Java-level bugs when coding in Pig or Hive.

    Have a look at this post for a Pig vs. Hive comparison.
