Which Hadoop version should I choose among 1.x, 2.2 and 0.23?

Submitted by 放荡痞女 on 2019-12-18 13:48:16

Question


Hello, I am new to Hadoop and confused by the version names. Which one should I use: 1.x (great support and learning resources), 2.2, or 0.23?

I have read that Hadoop is moving to YARN completely from v0.23 (link1).
But at the same time, it is all over the web that Hadoop v2.0 is moving to YARN (link2), and I can see the YARN configuration files in Hadoop 2.2 itself.

  • Since 0.23 seems to be the latest version to me, does 2.2 also support YARN? (See link1, which says Hadoop will support YARN from v0.23.)
  • As a beginner, which version should I go for, 1.x or 2.x, for the purpose of learning Hadoop?
  • Are other technologies that work with Hadoop, such as Pig and Hive, available for the latest version of Hadoop?

Thanks.

UPDATE
Thank you all for replying. I ended up using Hadoop 2.2. Most of the well-known tutorials and resources are outdated, but I found one good book to get started with v2.2:

"Hadoop: The Definitive Guide, Third Edition" by Tom White

It covers Hadoop v2.2.

The source code is available on GitHub: https://github.com/tomwhite/hadoop-book

As noted on GitHub, the book's code has been tested against the following versions:

This version of the code has been tested with:
 * Hadoop 1.2.1/0.22.0/0.23.x/2.2.0
 * Avro 1.5.4
 * Pig 0.9.1
 * Hive 0.8.0
 * HBase 0.90.4/0.94.15
 * ZooKeeper 3.4.2
 * Sqoop 1.4.0-incubating
 * MRUnit 0.8.0-incubating

Hope it helps!


Answer 1:


There are a few active release series. The 1.x release series is a continuation of the 0.20 release series. A few weeks after 0.23 was released, the 0.20 branch, which had reached 0.20.205, was renumbered 1.0. There is next to no functional difference between 0.20.205 and 1.0; it is just a renumbering.

The 0.23 release series includes several major new features, among them a new MapReduce runtime called MapReduce 2. It is implemented on a new system called YARN (Yet Another Resource Negotiator), a general resource management system for running distributed applications. The 2.x release series is, in turn, a continuation of the 0.23 release series, so 2.2 also supports YARN.

According to the Hadoop 2.2 release notes:

  • 1.2.X - current stable version, 1.2 release

  • 2.2.X - current stable 2.x version

  • 0.23.X - similar to 2.X.X but missing NameNode high availability (NN HA).
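
To make the lineage above concrete, here is a minimal Python sketch that maps a version string to its release line and says whether that line runs on YARN. It only encodes what is described in this answer (0.20 continued as 1.x, 0.23 continued as 2.x); the function name `release_line` and the labels it returns are made up for illustration and are not part of Hadoop.

```python
from typing import Tuple


def release_line(version: str) -> Tuple[str, bool]:
    """Return (release line, runs on YARN) for a numeric Hadoop version string.

    Illustrative only: covers just the release lines discussed in this thread.
    """
    fields = version.split(".")
    major, minor = int(fields[0]), int(fields[1])
    # 0.20.x was renumbered to 1.0, so both belong to the classic MapReduce line.
    if (major, minor) == (0, 20) or major == 1:
        return ("0.20 / 1.x", False)
    # 2.x continues 0.23, which introduced MapReduce 2 on YARN.
    if (major, minor) == (0, 23) or major == 2:
        return ("0.23 / 2.x", True)
    raise ValueError("release line not covered by this sketch: " + version)


if __name__ == "__main__":
    for v in ("0.20.205", "1.2.1", "0.23.10", "2.2.0"):
        line, yarn = release_line(v)
        print(v, "->", line, "| YARN:", yarn)
```

Anything outside these four lines raises `ValueError` rather than guessing, since the thread does not say where other releases belong.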

I would suggest starting with the Cloudera distribution, since you are just beginning to learn. CDH 4.5 includes the YARN feature you are looking for. You can also try the Hortonworks distribution. The advantage of going with these vendors is that you do not need to worry about which versions of components such as Hive and Pig work with your Hadoop installation.




Answer 2:


I recommend starting with hadoop-2.2.0, which will give you a good grounding. Industry prefers YARN, and only 2.x is used in production.



Source: https://stackoverflow.com/questions/21858784/which-hadoop-version-should-i-choose-among-1-x-2-2-and-0-23
