Need advice on Sqoop Incremental Imports.
Say I have a customer with Policy 1 on Day 1. I imported those records into HDFS on Day 1, and I can see them in the part files.
On Day 2,
There are already great responses here. In addition to those, you could try Sqoop's free-form query approach (--query): write the query with a condition that retrieves only the updated records.
Example 1:
$ sqoop import \
    --query 'SELECT a.*, b.* FROM a JOIN b ON (a.id = b.id) WHERE $CONDITIONS' \
    --split-by a.id --target-dir /tmp/MyNewloc
Example 2:
sqoop import \
    --connect "jdbc:jtds:sqlserver://MYPD22:1333;databaseName=myDb" \
    --target-dir /tmp/MyNewloc \
    --fields-terminated-by \| \
    --username xxx --password='xxx' \
    --query "select * from Policy_Table where Policy_ID > 1 AND \$CONDITIONS" \
    -m 1
Don't forget to include $CONDITIONS in the WHERE clause; Sqoop requires it in every free-form query.
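A note on quoting, since the two examples above differ: Example 1 wraps the query in single quotes, so $CONDITIONS reaches Sqoop literally, while Example 2 uses double quotes and therefore escapes it as \$CONDITIONS. The small shell sketch below shows that both forms yield the same string, and what happens if the escape is left out. The variable names are made up for illustration; the query text is the one from Example 2.

```shell
#!/bin/sh
# Illustration only: variable names are hypothetical, the query is from Example 2.
unset CONDITIONS  # make sure the shell has no variable of that name

# Single quotes: the shell leaves $CONDITIONS untouched.
single_quoted='select * from Policy_Table where Policy_ID > 1 AND $CONDITIONS'

# Double quotes with a backslash: the shell also leaves it untouched.
escaped="select * from Policy_Table where Policy_ID > 1 AND \$CONDITIONS"

# Double quotes without a backslash: the shell expands it (here to nothing),
# so the token never reaches Sqoop.
unescaped="select * from Policy_Table where Policy_ID > 1 AND $CONDITIONS"

echo "$single_quoted"
echo "$escaped"
echo "$unescaped"
```

The first two lines printed still end in $CONDITIONS; the third ends in "AND " with nothing after it, which is the form to avoid passing to --query.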
For details, please refer to the "Free-form Query Imports" section of the Sqoop User Guide.
You could do this using two methods.
Method 1 - Use Sqoop merge.
Method 2 - Copy the newly generated part-m files into the original table's target directory (copy the part-m files from /tmp/MyNewloc to /tmp/MyOriginalLoc/).
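As a rough sketch, the two methods could look like the commands below. The paths /tmp/MyNewloc and /tmp/MyOriginalLoc are the ones used in this thread; the merged output directory /tmp/MyMergedLoc, the jar Policy_Table.jar, the class name Policy_Table, and the _day2 file suffix are assumptions for illustration. For sqoop merge, the jar and class would need to be the ones Sqoop generated for your import, with the same field delimiter as the data. The run helper only prints each command unless DRY_RUN=0 is set, so you can review the command before executing it.

```shell
#!/bin/sh
# Sketch only. /tmp/MyMergedLoc, Policy_Table.jar, the class name Policy_Table
# and the _day2 suffix are hypothetical; adjust them to your environment.

# With DRY_RUN=1 (the default) commands are printed instead of executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Method 1: merge the new import onto the original one, keyed on Policy_ID.
# For rows with the same key, the record from --new-data wins.
merge_new_records() {
  run sqoop merge \
    --new-data /tmp/MyNewloc \
    --onto /tmp/MyOriginalLoc \
    --target-dir /tmp/MyMergedLoc \
    --jar-file Policy_Table.jar \
    --class-name Policy_Table \
    --merge-key Policy_ID
}

# Method 2: copy the new part file next to the original ones.
# A single-mapper import writes part-m-00000 in both directories,
# so the copy is given a distinct name to avoid a collision.
copy_new_records() {
  run hdfs dfs -cp /tmp/MyNewloc/part-m-00000 /tmp/MyOriginalLoc/part-m-00000_day2
}
```

Calling merge_new_records or copy_new_records prints the command; running with DRY_RUN=0 executes it. Note that Method 1 writes the merged result to a new directory, whereas Method 2 simply appends files, so it does not remove older versions of a record.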
1) Now create a Hive table whose LOCATION is the original table's target directory, which contains both the original part-m files and the part-m files with the new records.
CREATE EXTERNAL TABLE IF NOT EXISTS Policy_Table(
Policy_ID string,
Customer_Name string
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
STORED AS TEXTFILE
LOCATION '/tmp/MyOriginalLoc/';