hdfs put/moveFromLocal not distributing data across data nodes?

Submitted by 跟風遠走 on 2020-01-16 08:06:24

Question


I found a similar question, "Hadoop HDFS is not distributing blocks of data evenly",

but my question is about the case where the replication factor is 1.

I still want to understand why HDFS does not distribute a file's blocks evenly across the cluster nodes. This results in data skew from the start when I load such files and run DataFrame operations on them. Am I missing something?


Answer 1:


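Even with a replication factor of 1, files are still split into blocks of at most the HDFS block size and stored block by block. As far as I know, block placement is best effort, not strictly balanced. Under the default placement policy, the first replica goes to the local DataNode if the client runs on one (otherwise to a randomly chosen node), the second to a node on a different rack, and the third to a different node on that same remote rack. With a replication factor of 1 only the first rule applies, so running put or moveFromLocal from a DataNode leaves every block on that node.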
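To make the block arithmetic and the effect of the client's location concrete, here is a rough Python sketch. It is a toy model, not HDFS code: the 128 MiB block size is the usual `dfs.blocksize` default but your cluster may differ, the node names are made up, and `place_blocks` only assumes that a client on a DataNode writes the single replica locally while an off-cluster client gets a random node per block. The real policy also weighs node load and free space.

```python
import random
from collections import Counter

# Assumed block size: 128 MiB, the usual dfs.blocksize default.
BLOCK_SIZE = 128 * 1024 * 1024


def block_count(file_size, block_size=BLOCK_SIZE):
    """Number of blocks needed for a file of file_size bytes (ceiling division)."""
    return (file_size + block_size - 1) // block_size


def place_blocks(n_blocks, datanodes, client_host=None, seed=0):
    """Toy model of placement with replication factor 1.

    If the client runs on one of the DataNodes, every block lands there;
    otherwise each block goes to a randomly chosen DataNode.
    """
    if client_host in datanodes:
        return Counter({client_host: n_blocks})
    rng = random.Random(seed)
    return Counter(rng.choice(datanodes) for _ in range(n_blocks))


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3"]  # hypothetical DataNode names
    n = block_count(1024 ** 3)     # a 1 GiB file
    print(n, "blocks")
    print("put run on dn1:     ", dict(place_blocks(n, nodes, client_host="dn1")))
    print("put run off-cluster:", dict(place_blocks(n, nodes)))
```

If this model matches what you see, running the upload from an edge node that is not a DataNode, or running the HDFS balancer afterwards, should spread the blocks out.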

You'll need to clarify how large your files are and where you are looking to determine whether the data is being split.
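One way to see where the blocks of a file actually live is `hdfs fsck` with the `-files -blocks -locations` options. The small Python wrapper below is only a sketch: the function names and the example path are hypothetical, and it assumes the `hdfs` CLI is on the PATH of the machine where you run it.

```python
import subprocess


def build_fsck_command(hdfs_path):
    """Build the argv for an fsck report that lists files, blocks and block locations."""
    return ["hdfs", "fsck", hdfs_path, "-files", "-blocks", "-locations"]


def run_fsck(hdfs_path):
    """Run fsck and return its text report (requires the hdfs CLI on PATH)."""
    result = subprocess.run(
        build_fsck_command(hdfs_path),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical path; replace it with the file you uploaded.
    print(" ".join(build_fsck_command("/data/example.csv")))
```

The report lists each block together with the DataNodes holding it, which shows directly whether all blocks ended up on a single node.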

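Note: not all file formats are splittable. A gzip-compressed file, for example, is still stored as multiple HDFS blocks but has to be read as a single split.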



Source: https://stackoverflow.com/questions/59363801/hdfs-put-movefromlocal-not-distributing-data-across-data-nodes
