Is it possible to append to HDFS file from multiple clients in parallel?


I don't think this is possible with HDFS. Even though you don't care about the order of the records, you do care about the order of the bytes in the file. You don't want writer A to write a partial record that then gets corrupted by writer B. This is a hard problem for HDFS to solve on its own, so it doesn't: HDFS enforces a single writer per file via a lease, and a second client's append is rejected rather than interleaved.
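A minimal sketch of that behavior, assuming a reachable cluster configured via `fs.defaultFS` and an existing file at the hypothetical path `/logs/events.log`; the exact exception surfaced to the second client can vary by version, but the append is rejected by the NameNode while the first lease is held:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SingleWriterDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path file = new Path("/logs/events.log"); // hypothetical path

        // First client acquires the write lease and may append.
        FileSystem fs = FileSystem.get(conf);
        FSDataOutputStream out = fs.append(file);
        out.write("record from writer A\n".getBytes("UTF-8"));

        // A second append to the same path while the lease is held is
        // rejected on the NameNode (typically surfacing as a RemoteException,
        // e.g. AlreadyBeingCreatedException) instead of interleaving bytes.
        try {
            FileSystem fs2 = FileSystem.newInstance(conf); // bypass the FS cache
            fs2.append(file);
        } catch (java.io.IOException expected) {
            System.err.println("Second writer rejected: " + expected.getMessage());
        }

        out.close(); // closing releases the lease
    }
}
```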

Create a file per writer. Pass all the files to any MapReduce worker that needs to read this data. This is much simpler and fits the design of HDFS and Hadoop. If non-MapReduce code needs to read this data as one stream then either stream each file sequentially or write a very quick MapReduce job to consolidate the files.
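A minimal sketch of the file-per-writer pattern, assuming each writer is given a unique id (the `writerId` argument and the `/logs/events` directory below are hypothetical). Writers never contend for a lease because each owns its own file, and a MapReduce job can take the whole directory as input:

```java
import java.util.UUID;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PerWriterFile {
    public static void main(String[] args) throws Exception {
        // One unique file per writer; no coordination needed between writers.
        String writerId = args.length > 0 ? args[0] : UUID.randomUUID().toString();
        FileSystem fs = FileSystem.get(new Configuration());

        Path out = new Path("/logs/events/" + writerId + ".log");
        try (FSDataOutputStream stream = fs.create(out, false /* no overwrite */)) {
            stream.write(("record from " + writerId + "\n").getBytes("UTF-8"));
        }
        // Readers then point a job at the directory, e.g.
        // FileInputFormat.addInputPath(job, new Path("/logs/events"));
    }
}
```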

Just FYI: this will probably be fully supported in Hadoop 2.6.x, according to the JIRA item on the official site: https://issues.apache.org/jira/browse/HDFS-7203
