-put and -copyFromLocal are documented as identical, while most examples use the verbose variant -copyFromLocal. Why?
copyFromLocal is restricted to copying from the local filesystem, while put can take a file from any filesystem (another HDFS instance, the local filesystem, ...). So, basically, you can do with put everything that you can do with copyFromLocal, but not vice versa.
Similarly, copyToLocal is restricted to copying to the local filesystem, while get can write to any filesystem. Hence, you can use get instead of copyToLocal, but not the other way round.
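Under that older reading of the docs, the asymmetry might be sketched like this (the paths here are hypothetical, and the commands assume a running HDFS):

```shell
# copyFromLocal: the source must be on the local filesystem
hdfs dfs -copyFromLocal /tmp/data.txt /user/alice/data.txt

# put: the same local source works...
hdfs dfs -put /tmp/data.txt /user/alice/data.txt

# ...and put additionally accepts "-" as the source,
# reading the file contents from stdin:
echo "hello" | hdfs dfs -put - /user/alice/hello.txt
```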
Reference: Hadoop's documentation.
Update: For the latest as of Oct 2015, please see this answer below.
Despite what is claimed by the documentation, as of now (Oct. 2015), both -copyFromLocal and -put are the same.
From the online help:
[cloudera@quickstart ~]$ hdfs dfs -help copyFromLocal
-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst> :
Identical to the -put command.
And this is confirmed by looking at the sources, where you can see that the CopyFromLocal class extends the Put class, but without adding any new behavior:
public static class CopyFromLocal extends Put {
    public static final String NAME = "copyFromLocal";
    public static final String USAGE = Put.USAGE;
    public static final String DESCRIPTION = "Identical to the -put command.";
}

public static class CopyToLocal extends Get {
    public static final String NAME = "copyToLocal";
    public static final String USAGE = Get.USAGE;
    public static final String DESCRIPTION = "Identical to the -get command.";
}
As you may notice, it is exactly the same for get/copyToLocal.
Both the 'put' and 'copyFromLocal' commands work exactly the same. You cannot use the 'put' command to copy files from one HDFS directory to another. Let's see this with an example: say your root has two directories, named 'test1' and 'test2'. If 'test1' contains a file 'customer.txt' and you try copying it to the 'test2' directory:
$ hadoop fs -put /test1/customer.txt /test2
It will result in a 'no such file or directory' error, since 'put' will look for the file in the local file system, not in HDFS.
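For a copy within HDFS, the right tool is -cp instead. A quick sketch using the same example paths (assumes a running HDFS containing those directories):

```shell
# Copy within HDFS: both source and destination are HDFS paths
hadoop fs -cp /test1/customer.txt /test2

# put, by contrast, reads its source from the local disk,
# which is why the failing command above went looking for
# /test1/customer.txt on the local filesystem.
```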
They are both meant to copy files (or directories) from the local file system to HDFS only.
Let's take an example:
If your HDFS contains the path /tmp/dir/abc.txt, and your local disk also contains this path, then the HDFS API won't know which one you mean unless you specify a scheme like file:// or hdfs://. It might pick the path you did not want to copy.
Therefore you have -copyFromLocal, which prevents you from accidentally copying the wrong file by limiting the parameter you give to the local filesystem. put is for more advanced users who know which scheme to put in front.
It is always a bit confusing to new Hadoop users which filesystem they are currently in and where their files actually are.
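To make the scheme point concrete, here is a sketch (reusing the /tmp/dir/abc.txt path from the example above; the commands assume a configured cluster) of how scheme-qualified URIs remove the ambiguity:

```shell
# With an explicit scheme there is no doubt which file is meant:
hadoop fs -put file:///tmp/dir/abc.txt /dest/   # source on the local disk
hadoop fs -cp  hdfs:///tmp/dir/abc.txt /dest/   # source already in HDFS

# Note: an unqualified source given to put is resolved against the
# local filesystem, while most other shell commands resolve
# unqualified paths against fs.defaultFS.
```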