Observing duplicates using sqoop with Oozie

Backend · unresolved · 1 answer · 1029 views

Asked by 情话喂你 on 2021-01-26 03:00

I've built a Sqoop program to import data from MySQL to HDFS using a pre-built Sqoop job:

    sqoop job -fs $driver_path -D mapreduce.map.ja
1 Answer
  •  迷失自我
     2021-01-26 03:30

    Ask yourself a question: where does Sqoop store that "last value" information?

    The answer is: for Sqoop1, by default, in a file on the local filesystem of whichever machine ran the job. But Oozie launches your Sqoop action on an arbitrary worker node each time, so successive executions do not see each other's state.
    And Sqoop2 (which has a proper Metastore database) is more or less in limbo; at the very least, it is not supported by Oozie.
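
    To make that concrete, here is an illustrative sketch (assuming default Sqoop1 settings; the exact file names may vary by version): saved-job state, including the incremental "last value", lives in per-user HSQLDB files in the home directory of the user who ran the job, on that one machine only.

```shell
# With default settings, Sqoop1 keeps saved-job metadata in local,
# per-user HSQLDB files -- so two Oozie worker nodes each have their
# own, independent copy of the "last value":
ls ~/.sqoop/
# typically contains files such as metastore.db.properties
# and metastore.db.script
```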

    The solution is to start a shared HSQLDB database service to store the "last value" information for all Sqoop1 jobs, whatever machine they are running on.
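
    A minimal sketch of that setup (host name and database/table names below are placeholders, not from the original post):

```shell
# On one dedicated host, start the shared HSQLDB-backed metastore
# service; it listens on port 16000 by default:
sqoop metastore &

# From any Oozie worker node, create and run the job against that
# shared metastore via --meta-connect, instead of the default
# per-user local files ("metastore-host" is a placeholder):
sqoop job --meta-connect jdbc:hsqldb:hsql://metastore-host:16000/sqoop \
  --create my_incremental_import \
  -- import --connect jdbc:mysql://db-host/mydb --table orders \
  --incremental append --check-column id

sqoop job --meta-connect jdbc:hsqldb:hsql://metastore-host:16000/sqoop \
  --exec my_incremental_import
```

    Alternatively, setting `sqoop.metastore.client.autoconnect.url` in `sqoop-site.xml` on every node makes the shared metastore the default, so the `--meta-connect` flag can be dropped.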

    Please read the Sqoop1 documentation about its lame Metastore and about how to use it (the original answer linked to the relevant sections, but the links were lost).
    And for a more professional handling of that obsolete HSQLDB database, see an earlier post of mine on the topic (link also lost).
