I have two HDFS setups and want to copy (not migrate or move) some tables from HDFS1 to HDFS2. How can I copy data from one HDFS to another? Is it possible via Sqoop or other c
distcp copies data to and from Hadoop filesystems in parallel. It is similar to the generic hadoop fs -cp command, but under the hood distcp runs as a MapReduce job that uses only mappers (no reducers), so the copy work is spread across the cluster.
Usage:
copy one file to another
% hadoop distcp file1 file2
copy directories from one location to another
% hadoop distcp dir1 dir2
If dir2 doesn't exist, it is created and the contents of dir1 are copied into it. If dir2 already exists, dir1 is copied under it, giving dir2/dir1. The -overwrite option keeps the same directory structure instead and forces existing files at the destination to be overwritten. The -update option also copies directly into dir2, but transfers only the files that have changed.
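The layout rules above are easy to mix up, so here is a minimal sketch that only predicts where the contents of the source directory land. The function distcp_target and the /data and /backup paths are hypothetical and not part of Hadoop; the function does not run distcp, it just encodes the rules described above for a single source directory.

```shell
#!/bin/sh
# distcp_target SRC_DIR DST_DIR DST_EXISTS MODE
# Hypothetical helper (not part of Hadoop): prints the directory under which
# the contents of SRC_DIR end up, following the rules described above.
#   DST_EXISTS: yes | no
#   MODE:       default | update | overwrite
distcp_target() {
    src=$1
    dst=$2
    dst_exists=$3
    mode=$4
    if [ "$mode" = "default" ] && [ "$dst_exists" = "yes" ]; then
        # Existing target and no flags: the source directory itself is nested.
        printf '%s/%s\n' "${dst%/}" "$(basename "$src")"
    else
        # New target, or -update / -overwrite: contents go directly into DST_DIR.
        printf '%s\n' "${dst%/}"
    fi
}

distcp_target /data/dir1 /backup/dir2 no  default   # prints /backup/dir2
distcp_target /data/dir1 /backup/dir2 yes default   # prints /backup/dir2/dir1
distcp_target /data/dir1 /backup/dir2 yes update    # prints /backup/dir2
```

If a repeated copy unexpectedly produces dir2/dir1, the second case is the usual cause; adding -update or -overwrite avoids the nesting.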
transferring data between two HDFS clusters
% hadoop distcp -update -delete hdfs://nn1/dir1 hdfs://nn2/dir2
The -delete option removes files or directories from the destination that are not present in the source. It only takes effect together with -update or -overwrite.
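Because -delete removes data on the destination cluster, it can help to review the exact command before running it. The following is a minimal sketch of a dry-run wrapper; the function name sync_dir and the SRC_NN, DST_NN and RUN variables are assumptions made for illustration, while nn1 and nn2 are the NameNode hosts from the example above.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of Hadoop): prints the distcp command by
# default; set RUN=1 in the environment to actually execute it.
SRC_NN=${SRC_NN:-nn1}   # source NameNode host (assumed default)
DST_NN=${DST_NN:-nn2}   # destination NameNode host (assumed default)

sync_dir() {
    src_path=$1
    dst_path=$2
    # Build the command as positional parameters so arguments stay intact.
    set -- hadoop distcp -update -delete \
        "hdfs://${SRC_NN}${src_path}" "hdfs://${DST_NN}${dst_path}"
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$@"
    fi
}

sync_dir /dir1 /dir2
```

Running the script as-is only prints the command; invoking it with RUN=1 on a host that has the hadoop client configured for both clusters would perform the copy.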