Question: I found a similar question, "Hadoop HDFS is not distributing blocks of data evenly", but my case uses a replication factor of 1. I still want to understand why HDFS does not distribute file blocks evenly across the cluster nodes. This results in data skew from the start when I load such files or run DataFrame operations on them. Am I missing something?

Answer 1: Even if the replication factor is one, files are still split and stored in multiples of the HDFS block size. Block placement is best-effort, AFAIK, not purely
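To illustrate the splitting behavior the answer describes, here is a minimal sketch of how many blocks a file occupies. It assumes the common default block size of 128 MiB (`dfs.blocksize`); your cluster may be configured differently, and the helper name is hypothetical:

```python
import math

# Assumed default HDFS block size (dfs.blocksize): 128 MiB.
BLOCK_SIZE = 128 * 1024 * 1024

def num_blocks(file_size_bytes: int) -> int:
    """Number of HDFS blocks a file occupies (the last block may be partial).

    With replication factor 1, each of these blocks lives on exactly
    one DataNode, so uneven placement directly skews the data.
    """
    return max(1, math.ceil(file_size_bytes / BLOCK_SIZE))

# A 1 GiB file splits into 8 blocks of 128 MiB each.
print(num_blocks(1024 * 1024 * 1024))  # → 8
```

If all 8 blocks of such a file land on one or two DataNodes, any job reading the file will concentrate its work there, which is the skew the question observes.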