Question
Hi, I am working on a project for which I have created a cluster of 5 VMs. It works fine in the development environment, but I am unsure whether a VM cluster is good enough or whether I need to go with a cluster of physical machines.
Answer 1:
Hadoop was developed for physical systems, but it will work in virtual environments with varying degrees of success; how well it works depends on the specific environment.
This is actually quite a common question on the Hadoop mailing lists, and the Hadoop developers addressed it specifically in the Hadoop Wiki article "Virtual Hadoop". The article covers the strengths and weaknesses of each approach and discusses cloud deployments as well. Read it, work out which deployment scenario you fall into, and assess what issues you may run into with your VM setup.
Answer 2:
If you are going to use virtual machines for HDFS, be careful with replication. By default, HDFS stores data in 64 MB blocks (128 MB since Hadoop 2.x) and replicates every block to 3 different nodes. In addition, at least one replica should be located on a different physical rack (see the "rack awareness" feature). If all 3 replicas of your virtualized data happen to sit on a single physical host or disk, one hardware failure on that machine can take out every copy of the data.
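One way to guard against this in a VM cluster, sketched here under stated assumptions, is to point Hadoop's rack awareness at a topology script that reports each VM's physical host as its "rack". HDFS then treats all VMs on one hypervisor as a single rack and, as long as more than one rack exists, its default placement puts at least one replica of each block on a different physical host. The property name net.topology.script.file.name is Hadoop's; the hostnames, IPs, and /host-* rack names below are hypothetical placeholders for your own inventory.

```python
#!/usr/bin/env python3
"""Sketch of an HDFS rack-topology script for a virtualized cluster.

Hadoop runs the script named by net.topology.script.file.name with one or
more DataNode IPs/hostnames as arguments and reads one rack path per
argument from stdout (whitespace-separated). Here the "rack" reported for
each VM is the physical host it runs on, so rack-aware placement keeps
replicas from all landing on one physical machine.
"""
import sys

# Hypothetical inventory: VM hostname or IP -> physical host used as its rack.
# Hadoop often passes IP addresses, so list both forms.
VM_TO_PHYSICAL_HOST = {
    "vm1": "/host-a", "10.0.0.11": "/host-a",
    "vm2": "/host-a", "10.0.0.12": "/host-a",
    "vm3": "/host-b", "10.0.0.13": "/host-b",
    "vm4": "/host-b", "10.0.0.14": "/host-b",
    "vm5": "/host-c", "10.0.0.15": "/host-c",
}

# Hadoop's own fallback rack name for nodes with no topology information.
DEFAULT_RACK = "/default-rack"


def resolve(nodes):
    """Return one rack path per node, in the same order as the input."""
    return [VM_TO_PHYSICAL_HOST.get(node, DEFAULT_RACK) for node in nodes]


if __name__ == "__main__":
    # Exactly one output token per argument, as Hadoop expects.
    print(" ".join(resolve(sys.argv[1:])))
```

To use it, make the script executable on the NameNode host and set net.topology.script.file.name in core-site.xml to its path. Mapping racks to physical hosts is a design choice for small virtualized clusters; if your VMs also span real racks, you would encode both levels in the inventory instead.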
Another potential caveat is I/O performance. If your VMs use file-based disk images rather than direct disk access, the I/O overhead may be considerable. Also, in many cases virtual machines cannot take advantage of the filesystem cache on the physical host.
On the other hand, VM management and provisioning should be easier.
Otherwise, refer to this answer: https://stackoverflow.com/a/44355754/1421254
Source: https://stackoverflow.com/questions/44344195/which-one-is-best-apache-ambari-cluster-on-physical-system-with-5-machine-or-in