Sqoop Incremental Import

Backend · Unresolved · 8 answers · 1580 views
别那么骄傲 2021-01-30 15:27

Need advice on Sqoop incremental imports. Say a customer takes out Policy 1 on Day 1; I imported those records into HDFS on Day 1 and can see them in the part files.
On Day 2,

8 Answers
  •  伪装坚强ぢ
    2021-01-30 16:07

    There are already great responses here. Along with those, you could also try the Sqoop free-form query approach: customize the query's condition so that only the new or updated records are retrieved.

    STEP 1: Importing New Records from the Database Table:

    Example 1:

    $ sqoop import \
        --query 'SELECT a.*, b.* FROM a JOIN b ON (a.id = b.id) WHERE $CONDITIONS' \
        --split-by a.id \
        --target-dir /tmp/MyNewloc

    Example 2:

    sqoop import \
        --connect "jdbc:jtds:sqlserver://MYPD22:1333;databaseName=myDb" \
        --username xxx --password 'xxx' \
        --query "SELECT * FROM Policy_Table WHERE Policy_ID > 1 AND \$CONDITIONS" \
        --fields-terminated-by '|' \
        --target-dir /tmp/MyNewloc \
        -m 1
    

    Don't forget to supply $CONDITIONS in the WHERE clause.

    Please refer to the Sqoop documentation on Free-Form Query Imports.

    STEP 2: Merging the part-m files of both the base table (original data) & the new table (new records)

    You could do this using either of two methods.

    Method 1 - Using Sqoop Merge

    Method 2 - Copying the newly generated part-m files into the original table's target directory (copy the part-m files from /tmp/MyNewloc to /tmp/MyOriginalLoc/).
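    The two methods above can be sketched as shell commands. The paths, jar file, and class name below are illustrative assumptions, not from the question; for `sqoop merge`, the jar and class would come from a prior `sqoop codegen` run against the table:

```shell
# Method 1: sqoop merge -- reconciles new rows onto the base data set by key.
# Policy_Table.jar / Policy_Table are hypothetical names produced by
# `sqoop codegen`; --merge-key must be the table's primary key column.
sqoop merge \
  --new-data /tmp/MyNewloc \
  --onto /tmp/MyOriginalLoc \
  --target-dir /tmp/MyMergedLoc \
  --jar-file Policy_Table.jar \
  --class-name Policy_Table \
  --merge-key Policy_ID

# Method 2: plain copy -- add the new part-m files to the original directory.
# Note: unlike `sqoop merge`, this does NOT deduplicate updated rows.
hdfs dfs -cp /tmp/MyNewloc/part-m-* /tmp/MyOriginalLoc/
```

    Method 1 is preferable when Day-2 rows can be updates of existing keys; Method 2 only works cleanly when the new extract contains strictly new records.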

    STEP 3: CREATING HIVE TABLE

    1) Now create a Hive table with its LOCATION set to the original table's target directory, which contains both the original part-m files and the new-record part-m files.

    CREATE EXTERNAL TABLE IF NOT EXISTS Policy_Table(
    Policy_ID string,
    Customer_Name string
    )
    ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
    STORED AS TEXTFILE
    LOCATION '/tmp/MyOriginalLoc/';
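    Once the external table exists, a quick sanity check (paths and queries are illustrative) confirms Hive sees both the original and the newly added part-m files:

```shell
# List the files the external table will read -- both Day-1 and Day-2
# part-m files should appear here after Step 2.
hdfs dfs -ls /tmp/MyOriginalLoc/

# The row count should now include the Day-2 records.
hive -e "SELECT COUNT(*) FROM Policy_Table;"
```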
    
