Question
I am using the Cloudera QuickStart VM (CDH 5.7).
I ran the following command in a terminal window:
sqoop import \
--connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
--username=retail_dba \
--password=cloudera \
--query="select * from orders join order_items on orders.order_id = order_items.order_item_order_id where \$CONDITIONS" \
--target-dir /user/cloudera/order_join \
--split-by order_id \
--num-mappers 4
Q: What is the purpose of $CONDITIONS? Why is it used in this query? Can anybody explain it to me?
Answer 1:
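$CONDITIONS is a placeholder that Sqoop replaces internally, both to fetch metadata and to split the import across tasks. Inside a double-quoted string it is written as \$CONDITIONS so that the shell does not expand it as a variable.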
To fetch metadata, Sqoop replaces $CONDITIONS with 1 = 0, so the query returns no rows but still exposes the column names and types:
select * from table where 1 = 0
To fetch all the data with a single mapper, Sqoop replaces $CONDITIONS with 1 = 1:
select * from table where 1 = 1
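The substitution in these two cases can be pictured as a plain string replacement. The Python sketch below only illustrates the idea and is not Sqoop's actual code; the helper name fill_conditions is hypothetical, and wrapping the predicate in parentheses is a choice made here for safety.

```python
# Illustrative only: mimics the textual substitution described above.
# fill_conditions is a hypothetical helper, not part of Sqoop.
PLACEHOLDER = "$CONDITIONS"


def fill_conditions(query: str, condition: str) -> str:
    """Return the query with the $CONDITIONS token replaced by a concrete predicate."""
    if PLACEHOLDER not in query:
        raise ValueError("query must contain the $CONDITIONS token")
    # Parentheses keep the predicate intact if the WHERE clause has other terms.
    return query.replace(PLACEHOLDER, f"({condition})")


query = "select * from table where $CONDITIONS"

# Metadata probe: matches no rows, so only the column layout comes back.
metadata_query = fill_conditions(query, "1 = 0")

# Single-mapper import: matches every row.
full_query = fill_conditions(query, "1 = 1")

print(metadata_query)
print(full_query)
```

A query without the token raises an error in this sketch, which mirrors the requirement that a free-form --query contain $CONDITIONS in its WHERE clause.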
With multiple mappers, Sqoop replaces $CONDITIONS with a range condition, so each mapper fetches a subset of the data from the RDBMS.
For example, suppose id lies between 1 and 100 and we are using 4 mappers:
select * from table where id >= 1 AND id < 25
select * from table where id >= 25 AND id < 50
select * from table where id >= 50 AND id < 75
select * from table where id >= 75 AND id <= 100
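The range splitting above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Sqoop's real splitter: the helper name split_conditions is hypothetical, the minimum and maximum of the split column are assumed to be known already, and the boundaries Sqoop actually picks may differ.

```python
# Illustrative only: splits [lo, hi] into num_mappers ranges and builds one
# predicate per mapper. split_conditions is a hypothetical helper, and the
# boundaries chosen by Sqoop's own splitter may differ.
def split_conditions(column: str, lo: int, hi: int, num_mappers: int) -> list[str]:
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if hi < lo:
        raise ValueError("hi must not be smaller than lo")
    # Integer boundaries spread evenly from lo to hi (inclusive of both ends).
    bounds = [lo + (hi - lo) * i // num_mappers for i in range(num_mappers + 1)]
    conditions = []
    for i in range(num_mappers):
        start, end = bounds[i], bounds[i + 1]
        # Only the last range includes its upper bound, so no row is read twice.
        upper_op = "<=" if i == num_mappers - 1 else "<"
        conditions.append(f"{column} >= {start} AND {column} {upper_op} {end}")
    return conditions


query = "select * from table where $CONDITIONS"

# One query per mapper, each with $CONDITIONS replaced by its own range.
queries = [
    query.replace("$CONDITIONS", f"({cond})")
    for cond in split_conditions("id", 1, 100, 4)
]

for q in queries:
    print(q)
```

For id values from 1 to 100 and 4 mappers, the sketch produces the same four ranges as the example above.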
Source: https://stackoverflow.com/questions/42330986/what-is-the-purpose-of-conditions-under-query