I'm having trouble appending data to an existing file in HDFS. If the file exists, I want to append a line; if not, I want to create a new file with the given name.
HDFS does not allow append operations. One way to implement the same functionality as appending is:
Solved!
Append is supported in HDFS.
You just need a small configuration change and some simple code, as shown below:
Step 1: Set dfs.support.append to true in hdfs-site.xml:
```xml
<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>
```
Then stop all daemons with stop-all.sh and start them again with start-all.sh.
Step 2 (optional): Only if you have a single-node cluster, set the replication factor to 1. A single DataNode cannot satisfy the default replication factor of 3, and appends can fail as a result.
From the command line:
```shell
./hdfs dfs -setrep -R 1 filepath/directory
```
Or do the same at run time in Java with FileSystem.setReplication:
```java
fileSystem.setReplication(new Path(filePath), (short) 1);
```
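There is no single Java call matching the recursive -R flag. A minimal sketch, assuming Hadoop 2.x or later (for FileSystem.listFiles), walks the tree and calls setReplication on each file; SetReplication and setReplicationRecursive are hypothetical names, not part of Hadoop:

```java
import java.io.IOException;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.LocatedFileStatus;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.RemoteIterator;

// Hypothetical helper: a rough Java counterpart of "hdfs dfs -setrep -R".
public class SetReplication {
    /** Sets the replication of every file under root; returns how many calls succeeded. */
    public static int setReplicationRecursive(FileSystem fs, Path root, short replication)
            throws IOException {
        int updated = 0;
        // listFiles(root, true) walks subdirectories and yields files only
        RemoteIterator<LocatedFileStatus> files = fs.listFiles(root, true);
        while (files.hasNext()) {
            Path file = files.next().getPath();
            // setReplication returns false if the change could not be applied
            if (fs.setReplication(file, replication)) {
                updated++;
            }
        }
        return updated;
    }
}
```

Only files carry a replication factor, which is why directories are skipped. Unlike -setrep -w, this sketch does not wait for re-replication to finish.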
Step 3: Code for creating the file or appending data to it:
```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsAppendExample {
    // NameNode URI, e.g. "hdfs://localhost:9000"
    private final String hdfsuri;

    public HdfsAppendExample(String hdfsuri) {
        this.hdfsuri = hdfsuri;
    }

    public void createAppendHDFS() throws IOException {
        Configuration hadoopConfig = new Configuration();
        hadoopConfig.set("fs.defaultFS", hdfsuri);
        Path hdfsPath = new Path("/test/doc.txt");

        // try-with-resources closes the output stream before the FileSystem it belongs to
        try (FileSystem fileSystem = FileSystem.get(hadoopConfig)) {
            if (fileSystem.exists(hdfsPath)) {
                // Step 2 (optional): only needed on a single-node cluster
                fileSystem.setReplication(hdfsPath, (short) 1);
                try (FSDataOutputStream out = fileSystem.append(hdfsPath)) {
                    out.writeBytes("appending into file.\n");
                }
            } else {
                try (FSDataOutputStream out = fileSystem.create(hdfsPath)) {
                    out.writeBytes("creating and writing into file\n");
                }
            }
        }
    }
}
```
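If you would rather not call exists() first, one alternative (a sketch, not the code above) is to try create(path, false), which refuses to overwrite an existing file, and fall back to append() when it throws FileAlreadyExistsException. This assumes the FileSystem reports an existing file that way, which HDFS is expected to do; AppendOrCreate and appendLine are hypothetical names. It does not make concurrent writers safe, since HDFS still allows only one writer per file at a time.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileAlreadyExistsException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Hypothetical helper: append a line, creating the file if it does not exist yet.
public class AppendOrCreate {
    /** Returns true if the file was created, false if the line was appended. */
    public static boolean appendLine(FileSystem fs, Path file, String line) throws IOException {
        byte[] bytes = (line + "\n").getBytes(StandardCharsets.UTF_8);
        // overwrite=false: create() fails instead of truncating an existing file
        try (FSDataOutputStream out = fs.create(file, false)) {
            out.write(bytes);
            return true;
        } catch (FileAlreadyExistsException e) {
            // The file is already there, so append to it instead
            try (FSDataOutputStream out = fs.append(file)) {
                out.write(bytes);
            }
            return false;
        }
    }
}
```

For example, calling appendLine(fileSystem, new Path("/test/doc.txt"), "one") on a missing file creates it; a second call with "two" appends, leaving "one\ntwo\n".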
Let me know if you need any other help.
Cheers!
Actually, you can append to an HDFS file:
From the client's perspective, an append first calls DistributedFileSystem's append method, which returns an FSDataOutputStream (call it out). The client then writes data with out.write and finishes with out.close.
I checked the HDFS sources; DistributedFileSystem has an append method:
```java
FSDataOutputStream append(Path f, final int bufferSize, final Progressable progress) throws IOException
```
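The append, write, close sequence with this overload might look like the sketch below. It goes through the generic FileSystem type, which declares the same append(Path, int, Progressable) method; the buffer size of 4096 is an arbitrary choice, and AppendWithProgress is a hypothetical name. Progressable has a single progress() method, so a lambda can serve as the callback; how often it is invoked depends on the FileSystem implementation.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicInteger;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.Progressable;

// Hypothetical helper illustrating append(Path, int, Progressable).
public class AppendWithProgress {
    /** Appends text to an existing file and returns the file length afterwards. */
    public static long append(FileSystem fs, Path file, String text, AtomicInteger progressCalls)
            throws IOException {
        int bufferSize = 4096; // arbitrary choice for this sketch
        // Counts progress callbacks; the FileSystem decides when (or whether) to call it
        Progressable progress = () -> progressCalls.incrementAndGet();
        FSDataOutputStream out = fs.append(file, bufferSize, progress);
        try {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } finally {
            out.close(); // flushes buffered data and finishes the append
        }
        return fs.getFileStatus(file).getLen();
    }
}
```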
For details, see the presentation.
You can also append from the command line:
```shell
hdfs dfs -appendToFile <localsrc> ... <dst>
```
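The same shell command can be run from Java by handing the arguments to Hadoop's FsShell through ToolRunner. This is a minimal sketch; AppendToFileFromJava is a hypothetical name, and the return value is the shell's exit code (0 on success).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FsShell;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical wrapper around "hdfs dfs -appendToFile <localsrc> <dst>".
public class AppendToFileFromJava {
    /** Runs -appendToFile through FsShell and returns its exit code. */
    public static int appendToFile(Configuration conf, String localSrc, String dst)
            throws Exception {
        // FsShell is the Tool behind "hdfs dfs"; ToolRunner parses generic options first
        return ToolRunner.run(conf, new FsShell(conf),
                new String[] {"-appendToFile", localSrc, dst});
    }
}
```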
Add lines directly from stdin:
```shell
echo "Line-to-add" | hdfs dfs -appendToFile - <dst>
```
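In Java, the stdin variant amounts to copying an InputStream (such as System.in) into the stream returned by append(). A minimal sketch using Hadoop's IOUtils.copyBytes; AppendFromStream and appendStream are hypothetical names, and this sketch expects dst to exist already.

```java
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

// Hypothetical helper: append everything read from an InputStream to an existing file.
public class AppendFromStream {
    public static void appendStream(FileSystem fs, InputStream in, Path dst) throws IOException {
        try (FSDataOutputStream out = fs.append(dst)) {
            // close=false: copyBytes leaves both streams open; try-with-resources closes out
            IOUtils.copyBytes(in, out, 4096, false);
        }
    }
}
```

Passing System.in as in gives roughly the behavior of piping into -appendToFile -.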