How to prevent a Hadoop job from failing on a corrupted input file

滥情空心 2021-02-09 02:31

I'm running a Hadoop job over many input files, but if one of the files is corrupted, the whole job fails.

How can I make the job ignore the corrupted file? Maybe writ…

3 Answers
  •  死守一世寂寞
    2021-02-09 03:11

    This is what Failure Traps are used for in Cascading:

    Whenever an operation fails and throws an exception, if there is an associated trap, the offending Tuple is saved to the resource specified by the trap Tap. This allows the job to continue processing without any data loss.

    This will essentially let your job continue, and you can inspect the trapped (corrupt) records later.

    If you are somewhat familiar with Cascading, in your flow definition statement:

        flowDef.addTrap( branchName, trap ); // signature: addTrap( String branchName, Tap trap )
    

    Failure Traps
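
    For context, here is a minimal sketch of wiring a trap into a complete flow. The HDFS paths, the pipe name "parse", and the TextLine scheme are assumptions for illustration only, not from the answer above:

        import java.util.Properties;

        import cascading.flow.FlowDef;
        import cascading.flow.hadoop.HadoopFlowConnector;
        import cascading.pipe.Pipe;
        import cascading.scheme.hadoop.TextLine;
        import cascading.tap.SinkMode;
        import cascading.tap.Tap;
        import cascading.tap.hadoop.Hfs;

        public class TrapExample
          {
          public static void main( String[] args )
            {
            // hypothetical HDFS paths -- substitute your own
            Tap source = new Hfs( new TextLine(), "hdfs:///data/input" );
            Tap sink = new Hfs( new TextLine(), "hdfs:///data/output", SinkMode.REPLACE );
            Tap trap = new Hfs( new TextLine(), "hdfs:///data/traps", SinkMode.REPLACE );

            // attach your parsing/processing operations to this pipe;
            // any operation that throws will divert the offending tuple to the trap
            Pipe pipe = new Pipe( "parse" );

            FlowDef flowDef = FlowDef.flowDef()
              .addSource( pipe, source )
              .addTailSink( pipe, sink )
              .addTrap( pipe, trap ); // offending tuples land here instead of failing the job

            new HadoopFlowConnector( new Properties() ).connect( flowDef ).complete();
            }
          }

    After the run, anything written under the trap path (here hdfs:///data/traps) is the set of records that would otherwise have killed the job, so you can inspect the corrupt input at your leisure.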
