Merging multiple files into one within Hadoop

遇见更好的自我 2020-12-01 02:18

Multiple small files arrive in my input directory, and I want to merge them into a single file without using the local file system or writing MapReduce jobs. Is there a way I could do

8 Answers
  • 2020-12-01 02:54

    You can use the HDFSConcat tool, introduced in HDFS 0.21, to merge the files without incurring the cost of a copy.
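
    As a rough illustration of how the tool might be invoked, here is a minimal Java sketch that only builds and launches the command line. It assumes the `hadoop` launcher is on the PATH, that the tool class is `org.apache.hadoop.hdfs.tools.HDFSConcat` on the launcher's classpath, and that it takes the target file followed by the source files. The `ConcatLauncher` class and its `buildCommand` helper are hypothetical names made up for this example. The NameNode restricts which files can be concatenated, and the restrictions vary by Hadoop version, so check the documentation for your release before relying on this.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Hadoop): launches the HDFSConcat tool
// through the `hadoop` command, which is assumed to be on the PATH.
public class ConcatLauncher {
    // Assumed fully qualified name of the tool class shipped with HDFS.
    static final String TOOL_CLASS = "org.apache.hadoop.hdfs.tools.HDFSConcat";

    // Builds: hadoop <tool class> <target> <src1> <src2> ...
    static List<String> buildCommand(String target, List<String> sources) {
        if (sources.isEmpty()) {
            throw new IllegalArgumentException("need at least one source file");
        }
        List<String> cmd = new ArrayList<>();
        cmd.add("hadoop");
        cmd.add(TOOL_CLASS);
        cmd.add(target);
        cmd.addAll(sources);
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 2) {
            System.err.println("Usage: ConcatLauncher <target> <src>...");
            System.exit(1);
        }
        List<String> sources = Arrays.asList(args).subList(1, args.length);
        // Run the tool and forward its output and exit code to the caller.
        Process p = new ProcessBuilder(buildCommand(args[0], sources))
                .inheritIO()
                .start();
        System.exit(p.waitFor());
    }
}
```

    With made-up paths, `java ConcatLauncher /input/merged /input/small-1 /input/small-2` would run the tool with `/input/merged` as the target. Keeping the command construction in its own method makes it easy to print or inspect the command before running it against a real cluster.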

  • 2020-12-01 02:58

    Addressing this from an Apache Pig perspective: to merge two files with an identical schema in Pig, you can use the UNION operator.

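     -- Load both inputs with the same tab-delimited schema (schema elided in the original).
     A = LOAD 'tmp/file1' USING PigStorage('\t') AS ....(schema1);
     B = LOAD 'tmp/file2' USING PigStorage('\t') AS ....(schema1);
     -- Combine the two relations; UNION does not preserve tuple order.
     C = UNION A, B;
     STORE C INTO 'tmp/fileoutput' USING PigStorage('\t');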
    