hdfs put/moveFromLocal not distributing data across data nodes?

跟風遠走 submitted on 2020-01-16 08:06:24
Question: I found a similar question, "Hadoop HDFS is not distributing blocks of data evenly", but my case uses replication factor = 1. I still want to understand why HDFS does not distribute file blocks evenly across the cluster nodes. This causes data skew from the start when I load such files and run DataFrame operations on them. Am I missing something?

Answer 1: Even if the replication factor is one, files are still split and stored in multiples of the HDFS block size. Block placement is best effort, AFAIK, not purely