How to run an external program within a mapper or reducer, giving it HDFS files as input and storing its output files in HDFS?

Backend · Unresolved · 2 answers · 1783 views
Asked by 南旧, 2021-01-24 12:10

I have an external program that takes a file as input and produces an output file:

     // for example
     input file:  IN_FILE
     output file: OUT_FILE

     // Run Exte


        
2 Answers
  •  佛祖请我去吃肉
     2021-01-24 12:53

    Assuming that your external program doesn't know how to read from HDFS, what you will want to do is open the file from Java and pipe its contents directly into the program's standard input:

    // Open the input file on HDFS
    Path path = new Path("hdfs/path/to/input/file");
    FileSystem fs = FileSystem.get(configuration);
    FSDataInputStream fin = fs.open(path);

    // Launch the external program
    ProcessBuilder pb = new ProcessBuilder("SHELL_SCRIPT");
    Process p = pb.start();

    // Pipe the HDFS file into the process's stdin
    OutputStream os = p.getOutputStream();
    BufferedReader br = new BufferedReader(new InputStreamReader(fin));
    BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(os));

    String line;
    while ((line = br.readLine()) != null) {
        writer.write(line);
        writer.newLine();   // readLine() strips the line break, so restore it
    }
    writer.close();         // flush and signal EOF so the program can finish
    br.close();
    

    The output can be handled in the reverse manner: get the InputStream from the process, and write its contents to an FSDataOutputStream on HDFS.
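    A minimal sketch of that reverse direction, using only java.io so it runs outside Hadoop as well. In a real mapper or reducer, the destination stream would be an FSDataOutputStream obtained from `FileSystem.create(...)`; here `echo` is just a stand-in for the external program, and the `pipe` helper name is my own:

    ```java
    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.OutputStream;

    public class PipeProcessOutput {

        // Copy everything the process writes to stdout into any OutputStream.
        // In a Hadoop job, `dst` would be an FSDataOutputStream on HDFS.
        static void pipe(InputStream src, OutputStream dst) throws IOException {
            byte[] buf = new byte[8192];
            int n;
            while ((n = src.read(buf)) != -1) {
                dst.write(buf, 0, n);
            }
            dst.flush();
        }

        public static void main(String[] args) throws Exception {
            // "echo" stands in for the real external program
            Process p = new ProcessBuilder("echo", "OUT_FILE contents").start();
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            pipe(p.getInputStream(), sink);
            p.waitFor();
            System.out.print(sink.toString());
        }
    }
    ```

    The same `pipe` loop works for the input side too if you prefer raw byte copying over line-by-line reading.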

    With these two pieces, your program essentially becomes an adapter: it streams data out of HDFS into the external program, and streams the program's output back into HDFS.
