I get multiple small files in my input directory, which I want to merge into a single file without using the local file system or writing MapReduce jobs. Is there a way I could do
You can use the tool HDFSConcat, new in HDFS 0.21, to perform this operation without incurring the cost of a copy.
Addressing this from an Apache Pig perspective:
To merge two files with an identical schema in Pig, you can use the UNION operator:
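-- ....(schema1) stands for the schema shared by both input files
A = LOAD 'tmp/file1' USING PigStorage('\t') AS ....(schema1);
B = LOAD 'tmp/file2' USING PigStorage('\t') AS ....(schema1);
C = UNION A, B;
-- Pig writes the result as part files under the output directory
STORE C INTO 'tmp/fileoutput' USING PigStorage('\t');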
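When there are many small files rather than two, a glob in a single LOAD may be simpler than chaining UNIONs, since LOAD accepts Hadoop glob patterns. The following is only a sketch: the pattern 'tmp/file*' and the output path 'tmp/merged_output' are hypothetical, and it assumes every matching file has the same tab-separated layout.

```pig
-- Assumption: all files matching tmp/file* share one tab-separated layout.
-- Both paths are hypothetical; the output directory must not exist yet.
A = LOAD 'tmp/file*' USING PigStorage('\t');
STORE A INTO 'tmp/merged_output' USING PigStorage('\t');
```

Note that the result is still a directory that may contain more than one part file, depending on how Pig splits the input, so check the output before treating it as a single file.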