Hello, I am new to Hadoop and pretty confused by the version names. Which one should I use: 1.x (great support and learning resources), 2.2, or 0.23?
I hav
I recommend starting with hadoop-2.2.0, which will give you a solid grounding. As far as I can tell, industry prefers YARN, and production clusters largely run 2.x.
There are a few active release series. The 1.x release series is a continuation of the 0.20 release series. A few weeks after 0.23 was released, the 0.20 branch, then at version 0.20.205, was renumbered 1.0. There is next to no functional difference between 0.20.205 and 1.0; it is just a renumbering.
The 0.23 series includes several major new features, including a new MapReduce runtime called MapReduce 2. It is implemented on a new system called YARN (Yet Another Resource Negotiator), a general resource management system for running distributed applications. Similarly, the 2.x release series is a continuation of the 0.23 release series, so 2.2 also supports YARN.
According to the Hadoop 2.2 release notes:
1.2.X - current stable version, 1.2 release
2.2.X - current stable 2.x version
0.23.X - similar to 2.X.X but missing NN HA (NameNode high availability).
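If it helps to see the lineage in one place, here is a minimal Python sketch that maps a version string to its release series and says whether that series has YARN. The function name `describe_hadoop_version` and the returned dictionary keys are made up for illustration and are not part of any Hadoop tool. The NameNode HA value for 1.x is left as `None` because it is not covered above.

```python
def describe_hadoop_version(version):
    """Classify a Hadoop version string by release series (hypothetical helper)."""
    parts = version.split(".")
    if len(parts) < 2:
        raise ValueError("expected at least major.minor, got %r" % version)
    major, minor = int(parts[0]), int(parts[1])

    if (major, minor) == (0, 20) or major == 1:
        # 1.x is the 0.20 line under a new number (0.20.205 became 1.0).
        # It uses the original MapReduce runtime, not YARN.
        return {"series": "1.x / 0.20", "yarn": False, "namenode_ha": None}
    if (major, minor) == (0, 23):
        # 0.23 introduced MapReduce 2 on YARN but lacks NameNode HA.
        return {"series": "0.23", "yarn": True, "namenode_ha": False}
    if major == 2:
        # 2.x continues the 0.23 line: YARN plus NameNode HA.
        return {"series": "2.x", "yarn": True, "namenode_ha": True}
    raise ValueError("series not covered in this thread: %r" % version)


if __name__ == "__main__":
    for v in ("1.2.1", "0.20.205", "0.23.9", "2.2.0"):
        print(v, describe_hadoop_version(v))
```

The only point of the sketch is that 0.20.x and 1.x end up in the same bucket, while 0.23.x and 2.x share YARN and differ on NameNode HA.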
I would suggest starting with the Cloudera distribution since you are just starting to learn. CDH 4.5 includes the YARN feature you are looking for. You can also try the Hortonworks distribution. The advantage of going with these vendors is that you do not need to worry about which versions of components such as Hive and Pig work with your Hadoop installation.