Sqoop Incremental Import

别那么骄傲 2021-01-30 15:27

Need advice on Sqoop incremental imports. Say I have a customer with Policy 1 on Day 1, I imported those records into HDFS on Day 1, and I can see them in the part files.
On Day 2,

  •  一个人的身影
    2021-01-30 15:54

    Consider a table with 3 records that you have already imported into HDFS using Sqoop:

    +------+------------+----------+------+------------+
    | sid  | city       | state    | rank | rDate      |
    +------+------------+----------+------+------------+
    |  101 | Chicago    | Illinois |    1 | 2014-01-25 |
    |  101 | Schaumburg | Illinois |    3 | 2014-01-25 |
    |  101 | Columbus   | Ohio     |    7 | 2014-01-25 |
    +------+------------+----------+------+------------+
    
    sqoop import --connect jdbc:mysql://localhost:3306/ydb --table yloc --username root -P
    

    Now the table has additional records, but none of the existing records have been updated:

    +------+------------+----------+------+------------+
    | sid  | city       | state    | rank | rDate      |
    +------+------------+----------+------+------------+
    |  101 | Chicago    | Illinois |    1 | 2014-01-25 |
    |  101 | Schaumburg | Illinois |    3 | 2014-01-25 |
    |  101 | Columbus   | Ohio     |    7 | 2014-01-25 |
    |  103 | Charlotte  | NC       |    9 | 2013-04-22 |
    |  103 | Greenville | SC       |    9 | 2013-05-12 |
    |  103 | Atlanta    | GA       |   11 | 2013-08-21 |
    +------+------------+----------+------+------------+
    

    Here you should use --incremental append together with --check-column, which specifies the column to examine when determining which rows to import, and --last-value, which gives the maximum value of that column from the previous import.

    sqoop import --connect jdbc:mysql://localhost:3306/ydb --table yloc --username root -P --check-column rank --incremental append --last-value 7
    

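    The command above imports only the rows whose rank is greater than the --last-value of 7, i.e. the three new rows (Charlotte, Greenville and Atlanta). Append mode is intended for tables where new rows arrive with increasing values in the check column, such as an auto-increment ID; rank works here only because the new rows happen to have higher ranks.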
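    The row selection that append mode performs can be mimicked locally as a rough sketch. It assumes the table has been dumped to comma-separated lines (sid,city,state,rank,rDate); the functions yloc_day2, append_filter and next_last_value are hypothetical helpers, and Sqoop itself does this filtering in SQL against the database, not like this. The sketch keeps rows whose rank is strictly greater than the last value and also computes the largest rank seen, which is the value you would pass as --last-value on the next run.

```shell
# Sketch only: mimics the row selection of
#   --incremental append --check-column rank --last-value 7
# on a hypothetical CSV copy of yloc (sid,city,state,rank,rDate).

# Day-2 contents of yloc, one row per line.
yloc_day2() {
  printf '%s\n' \
    '101,Chicago,Illinois,1,2014-01-25' \
    '101,Schaumburg,Illinois,3,2014-01-25' \
    '101,Columbus,Ohio,7,2014-01-25' \
    '103,Charlotte,NC,9,2013-04-22' \
    '103,Greenville,SC,9,2013-05-12' \
    '103,Atlanta,GA,11,2013-08-21'
}

# Keep rows whose check column (field 4, rank) is strictly greater
# than the last imported value passed as $1.
append_filter() {
  awk -F',' -v last="$1" '($4 + 0) > (last + 0)'
}

# Largest rank seen: the value to use as --last-value next time.
next_last_value() {
  awk -F',' 'NR == 1 || ($4 + 0) > max { max = $4 + 0 } END { print max }'
}

yloc_day2 | append_filter 7    # Charlotte, Greenville and Atlanta rows
yloc_day2 | next_last_value    # 11
```

    On the next run you would therefore pass --last-value 11. A row added later with a lower rank (such as Dallas with rank 4 in the next table) would never be selected this way.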

    Now consider a second case, where existing rows have also been updated:

    +------+------------+----------+------+------------+
    | sid  | city       | state    | rank | rDate      |
    +------+------------+----------+------+------------+
    |  101 | Chicago    | Illinois |    1 | 2015-01-01 |
    |  101 | Schaumburg | Illinois |    3 | 2014-01-25 |
    |  101 | Columbus   | Ohio     |    7 | 2014-01-25 |
    |  103 | Charlotte  | NC       |    9 | 2013-04-22 |
    |  103 | Greenville | SC       |    9 | 2013-05-12 |
    |  103 | Atlanta    | GA       |   11 | 2013-08-21 |
    |  104 | Dallas     | Texas    |    4 | 2015-02-02 |
    |  105 | Phoenix    | Arizona  |   17 | 2015-02-24 |
    +------+------------+----------+------+------------+
    

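    Here we use --incremental lastmodified, which uses a date or timestamp check column to fetch rows that were added or updated since the --last-value. With rDate as the check column, this picks up the updated Chicago row as well as the new Dallas and Phoenix rows.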

    sqoop import --connect jdbc:mysql://localhost:3306/ydb --table yloc --username root -P --check-column rDate --incremental lastmodified --last-value 2014-01-25 --target-dir yloc/loc
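    A similar local sketch for lastmodified mode, under the assumption (worth checking against your Sqoop version) that Sqoop selects the half-open range last-value <= rDate < time of the current run. The CSV layout, the helper names yloc_day3, lastmodified_filter and append_filter, and the run time 2015-03-01 are all hypothetical. For contrast, it also reruns an append-style filter with the previous last value (rank 11), which misses both the updated Chicago row and the new Dallas row.

```shell
# Sketch only: mimics which rows
#   --incremental lastmodified --check-column rDate --last-value 2014-01-25
# would select from a hypothetical CSV copy of yloc (sid,city,state,rank,rDate).
# Assumption: the range is last-value <= rDate < run time.

# Day-3 contents of yloc, one row per line.
yloc_day3() {
  printf '%s\n' \
    '101,Chicago,Illinois,1,2015-01-01' \
    '101,Schaumburg,Illinois,3,2014-01-25' \
    '101,Columbus,Ohio,7,2014-01-25' \
    '103,Charlotte,NC,9,2013-04-22' \
    '103,Greenville,SC,9,2013-05-12' \
    '103,Atlanta,GA,11,2013-08-21' \
    '104,Dallas,Texas,4,2015-02-02' \
    '105,Phoenix,Arizona,17,2015-02-24'
}

# ISO dates (YYYY-MM-DD) order correctly as strings, so the fields are
# forced to strings and compared directly.
# $1: --last-value (inclusive), $2: hypothetical run time (exclusive).
lastmodified_filter() {
  awk -F',' -v from="$1" -v to="$2" '($5 "") >= (from "") && ($5 "") < (to "")'
}

# For contrast: append mode keeps only rows with a rank above $1.
append_filter() {
  awk -F',' -v last="$1" '($4 + 0) > (last + 0)'
}

yloc_day3 | lastmodified_filter 2014-01-25 2015-03-01  # Chicago, Schaumburg, Columbus, Dallas, Phoenix
yloc_day3 | append_filter 11                           # Phoenix only
```

    Under that assumption, the unchanged 2014-01-25 rows (Schaumburg, Columbus) are fetched again, and the Chicago row now exists in both the original import and yloc/loc, so the old and new versions have to be reconciled downstream.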
    
