How can I concatenate two files in hadoop into one using Hadoop FS shell?

盖世英雄少女心 2021-01-12 16:14

I am working with Hadoop 0.20.2 and would like to concatenate two files into one using the -cat shell command, if possible (source: http://hadoop.apache.org/common/docs/r0.19).

2 Answers
  •  野趣味 (OP)
     2021-01-12 16:27

    To concatenate all files in the folder to an output file:

    hadoop fs -cat myfolder/* | hadoop fs -put - myfolder/output.txt
    

    If you have multiple folders on HDFS and you want to concatenate the files in each of those folders, you can use a shell script to do this. (Note: this is not very efficient and can be slow.)

    Syntax :

    for i in `hadoop fs -ls | cut -d' ' -f19`; do hadoop fs -cat $i/* | hadoop fs -put - $i/output-file; done
    

    eg:

    for i in `hadoop fs -ls my-job-folder | cut -d' ' -f19`; do hadoop fs -cat $i/* | hadoop fs -put - $i/output.csv; done
    

    Explanation: you loop over all the folders and cat each folder's contents into a single output file on HDFS.
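    The shape of that loop can be illustrated on a local filesystem, with plain cat standing in for hadoop fs -cat and output redirection standing in for hadoop fs -put - (the demo directory and file names below are made up for the sketch, not taken from the question):

    ```shell
    # Set up two "job output" folders with part files, as HDFS jobs typically produce.
    mkdir -p demo/folder1 demo/folder2
    printf 'a\n' > demo/folder1/part-00000
    printf 'b\n' > demo/folder1/part-00001
    printf 'c\n' > demo/folder2/part-00000

    # Same per-folder concatenation pattern as the HDFS loop above:
    # cat every part file in the folder into one output file inside it.
    for d in demo/*/ ; do
        cat "$d"part-* > "$d"output.txt
    done
    ```

    On HDFS the redirection becomes `| hadoop fs -put - <dest>`, which is what makes the one-liner stream each folder's contents through the client into the new file.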
