Append data to existing file in HDFS Java

隐瞒了意图╮ 2020-12-05 02:52

I'm having trouble appending data to an existing file in HDFS. If the file exists, I want to append a line to it; if not, I want to create a new file with the given name.

Her

3 Answers
  • 2020-12-05 03:46

    HDFS does not allow append operations. One way to implement the same functionality as appending is:

    • Check whether the file exists.
    • If the file doesn't exist, create a new file and write to it.
    • If the file exists, create a temporary file.
    • Read each line from the original file and write it to the temporary file (don't forget the newline).
    • Write the lines you want to append to the temporary file.
    • Finally, delete the original file and move (rename) the temporary file to the original file's path.
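
The steps above can be sketched roughly as follows. This is a minimal illustration only: it assumes the local filesystem and `java.nio.file` so that it runs without a cluster, and the class and method names (`RewriteAppend`, `appendByRewrite`) are hypothetical. On HDFS the corresponding calls would presumably be `FileSystem.exists`, `create`, `open`, `delete`, and `rename`.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.List;

// Hypothetical helper; local-filesystem stand-in for the HDFS steps.
class RewriteAppend {
    /** Appends lines by rewriting the file through a temporary copy. */
    static void appendByRewrite(Path target, List<String> newLines) throws IOException {
        if (!Files.exists(target)) {
            // File doesn't exist: create it and write the new lines directly.
            Files.write(target, newLines, StandardCharsets.UTF_8);
            return;
        }
        // File exists: create a temporary file next to it.
        Path temp = target.resolveSibling(target.getFileName() + ".tmp");
        try (BufferedReader reader = Files.newBufferedReader(target, StandardCharsets.UTF_8);
             BufferedWriter writer = Files.newBufferedWriter(temp, StandardCharsets.UTF_8)) {
            // Copy every original line, re-adding the newline that readLine() strips.
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.newLine();
            }
            // Write the lines to append.
            for (String extra : newLines) {
                writer.write(extra);
                writer.newLine();
            }
        }
        // Replace the original with the temporary file (delete + rename in one call here).
        Files.move(temp, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Note that this rewrites the whole file on every append, so the cost grows with the file size.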
  • 2020-12-05 03:47

    Solved..!!

    Append is supported in HDFS.

    You just need a small configuration change and some simple code, as shown below:

    Step 1: Set dfs.support.append to true in hdfs-site.xml:

    <property>
       <name>dfs.support.append</name>
       <value>true</value>
    </property>
    

    Stop all your daemons with stop-all.sh, then restart them with start-all.sh.

    Step 2 (optional): If you have a single-node cluster, set the replication factor to 1, as below.

    From the command line:

    ./hdfs dfs -setrep -R 1 filepath/directory
    

    Or you can do the same at run time from Java code:

    fileSystem.setReplication(new Path(filePath), (short) 1);
    

    Step 3: Code for creating the file or appending data to it:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public void createAppendHDFS() throws IOException {
        Configuration hadoopConfig = new Configuration();
        // hdfsuri: the NameNode URI, assumed to be defined elsewhere in the class.
        hadoopConfig.set("fs.defaultFS", hdfsuri);
        Path hdfsPath = new Path("/test/doc.txt");
        // try-with-resources closes the stream first, then the file system.
        try (FileSystem fileSystem = FileSystem.get(hadoopConfig)) {
            if (fileSystem.exists(hdfsPath)) {
                // Step 2: only needed on a single-node cluster.
                fileSystem.setReplication(hdfsPath, (short) 1);
                try (FSDataOutputStream out = fileSystem.append(hdfsPath)) {
                    out.writeBytes("appending into file. \n");
                }
            } else {
                try (FSDataOutputStream out = fileSystem.create(hdfsPath)) {
                    out.writeBytes("creating and writing into file\n");
                }
            }
        }
    }
    

    Let me know if you need any other help.

    Cheers.!!

  • 2020-12-05 03:53

    Actually, you can append to an HDFS file:

    From the client's perspective, an append operation first calls append on DistributedFileSystem, which returns a stream object, FSDataOutputStream out. To append data to the file, the client calls out.write to write and out.close to close.

    I checked the HDFS sources, and there is a DistributedFileSystem#append method:

     FSDataOutputStream append(Path f, final int bufferSize, final Progressable progress) throws IOException
    

    For details, see presentation.

    You can also append from the command line:

    hdfs dfs -appendToFile <localsrc> ... <dst>
    

    Add lines directly from stdin:

    echo "Line-to-add" | hdfs dfs -appendToFile - <dst>
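
If you would rather drive this command from Java than call the FileSystem API, one possible sketch is to launch it with `ProcessBuilder` and write the line to its stdin. This assumes the `hdfs` executable is on the PATH of the Java process; the class and method names (`CliAppend`, `buildCommand`, `appendLine`) are hypothetical.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the CLI command shown above.
class CliAppend {
    /** Builds the argument list for: hdfs dfs -appendToFile - <dst> */
    static List<String> buildCommand(String dst) {
        return Arrays.asList("hdfs", "dfs", "-appendToFile", "-", dst);
    }

    /** Pipes one line into the command's stdin and returns its exit code. */
    static int appendLine(String dst, String line) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(dst))
                .redirectErrorStream(true)                       // merge stderr into stdout
                .redirectOutput(ProcessBuilder.Redirect.INHERIT) // show CLI output on our console
                .start();
        // Closing stdin signals end of input, as the end of the echo pipe does.
        try (OutputStream stdin = process.getOutputStream()) {
            stdin.write((line + "\n").getBytes(StandardCharsets.UTF_8));
        }
        return process.waitFor();
    }
}
```

Starting a process for every line is slow, so this only makes sense for occasional appends.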
    