Hadoop release version confusing

我与影子孤独终老i submitted on 2019-12-11 03:16:21

Question


I am trying to figure out the different versions of Hadoop, and I got confused after reading this page.

Download
1.2.X - current stable version, 1.2 release
2.2.X - current stable 2.x version
2.3.X - current 2.x version
0.23.X - similar to 2.X.X but missing NN HA.
Releases may be downloaded from Apache mirrors.

Question:

  1. I think any release starting with 0.xx is an alpha version and should not be used in production. Is that the case?
  2. What is the difference between 0.23.X and 2.3.X? The page says they are similar, but that 0.23.X is missing NameNode high availability. Is there any correlation between 0.23 and 2.3? Is it because, while developing the code, the PMC said "man! it is so immature that it should start with 0, and since they are the same product, I will keep the digits the same"?
  3. When I look at the source code of the new Hadoop, I see that the JobTracker class has turned into a dummy class. I envision that the JobTracker and TaskTracker, i.e. MapReduce 1, will slowly fade away on the Hadoop roadmap: the interface for MapReduce jobs might stay the same, but the second generation of Hadoop (YARN) will completely replace the JobTracker and TaskTracker with the ResourceManager, etc.

Sorry that this question is a bit disorganized; I got really confused by the version numbers. I will revise the question once I have figured it out.


Answer 1:


First of all: there's a major difference between Hadoop v1 and v2 (aka YARN). In v2, v1's JobTracker and TaskTracker are replaced by the new ResourceManager and NodeManager for better scalability. That's why both will disappear later on in the development.

Second: a 0.X version number is not necessarily a hint of an alpha release. OpenSSL stayed at 0.9.x for over ten years (en.wikipedia.org/wiki/OpenSSL#Major_version_releases) even though it was considered a de facto standard or reference implementation, and many Fortune 500 companies trusted it.

And that's true for Hadoop as well. The 0.23 version refers to Hadoop v1's architecture with v2 implementations (except High Availability, as the NameNode is still v1's). So 0.23 and 2.3 are about the same and continue to age in parallel. They named it 0.X because 1.X was already in use. They just didn't want 1.X to keep aging, to indicate that 2.X is the way to go. You may use 0.X only if you rely on 1.X's architecture but still want to receive minor improvements from the current development in 2.X.
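To make the point about numbering concrete, here is a small Python sketch. It only encodes the four release lines from the download page quoted in the question; the helper names (`RELEASE_LINES`, `release_line`, `describe`) and the sample version strings are my own illustrative choices, not anything from Hadoop itself.

```python
# Descriptions copied from the download page quoted in the question.
# Keys are (major, minor) release lines; the names here are hypothetical.
RELEASE_LINES = {
    (1, 2): "current stable version, 1.2 release",
    (2, 2): "current stable 2.x version",
    (2, 3): "current 2.x version",
    (0, 23): "similar to 2.X.X but missing NN HA",
}


def release_line(version):
    """Return the first two components of a dotted version as ints."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def describe(version):
    """Look up the quoted description for the release line of `version`."""
    return RELEASE_LINES.get(release_line(version), "not listed on the quoted page")


# Illustrative inputs only.
for v in ["0.23.10", "1.2.1", "2.3.0"]:
    print(v, "->", describe(v))

# Numeric sorting puts 0.23 first, even though the page groups it with 2.x.
print(sorted(RELEASE_LINES))
```

So the leading number alone does not say where a release sits in the lineage; you have to read the description of each release line.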

The bottom part of this page tries to explain this, but is a bit helter-skelter as well: http://wiki.apache.org/hadoop/Roadmap. The top part of this page does it a bit better: http://hadoop.apache.org/releases.html

Hope this was helpful...




Answer 2:


From the image below you can see that Hadoop 2.6.2 was released after 2.7.1.

Reasoning: 2.6 to 2.6.2 is a MINOR API update and IS backward compatible.

2.6 to 2.7 is a MAJOR API update, i.e. it IS NOT backward compatible. Some APIs may now be obsolete.
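As a rough illustration of why release order and version order can differ, the Python sketch below parses dotted version strings into tuples and checks whether two versions belong to the same x.y line. The function names are hypothetical, and treating "same first two components" as "same release line" is my assumption about how the answer groups 2.6 and 2.6.2.

```python
def parse_version(text):
    """Split a dotted version string such as '2.6.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def same_release_line(a, b):
    """True when both versions share their first two components (e.g. 2.6.x)."""
    return parse_version(a)[:2] == parse_version(b)[:2]


# Release order as described in the answer: 2.7.1 came out before 2.6.2.
by_release_date = ["2.7.1", "2.6.2"]

# Sorting by parsed version gives the version order instead.
by_version = sorted(by_release_date, key=parse_version)

print(by_version)
print(same_release_line("2.6", "2.6.2"))
print(same_release_line("2.6", "2.7"))
```

A maintenance release on an older line (2.6.x) can therefore appear after a release on a newer line (2.7.x) without the numbering being inconsistent.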

Ref: Hadoop Roadmap



Source: https://stackoverflow.com/questions/23787587/hadoop-release-version-confusing
