Sqoop import without primary key in RDBMS

Submitted by 别说谁变了你拦得住时间么 on 2020-01-11 02:11:24

Question


Can I import data from an RDBMS table that has no primary key into Hive using Sqoop? If so, what would the sqoop import command look like?

I tried a plain sqoop import command, but it failed.


Answer 1:


If your table has no primary key defined, you have to either pass the -m 1 option to import the data sequentially or pass the --split-by argument with a column name. Otherwise Sqoop reports this error:

ERROR tool.ImportTool: Error during import: No primary key could be found for table <table_name>. Please specify one with --split-by or perform a sequential import with '-m 1'

Your sqoop command will then look like

sqoop import \
    --connect jdbc:mysql://localhost/test_db \
    --username root \
    --password **** \
    --table user \
    --target-dir /user/root/user_data \
    --columns "first_name,last_name,created_date" \
    -m 1

or

sqoop import \
    --connect jdbc:mysql://localhost/test_db \
    --username root \
    --password **** \
    --table user \
    --target-dir /user/root/user_data \
    --columns "first_name,last_name,created_date" \
    --split-by created_date
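
To see why the choice of --split-by column matters, here is a rough sketch of how the split ranges come about. Sqoop queries the minimum and maximum of the split column and divides that range among the mappers; the function below imitates this for an integer column. The name split_ranges and the sample values are made up for illustration, and Sqoop's real splitters treat dates, text, and rounding differently.

```shell
#!/bin/sh
# split_ranges MIN MAX MAPPERS
# Prints one "lo hi" pair per mapper, together covering MIN..MAX inclusive.
# Illustrative only: integer columns, simple even division.
split_ranges() {
    min=$1; max=$2; n=$3
    total=$((max - min + 1))
    size=$((total / n))
    rem=$((total % n))
    lo=$min
    i=0
    while [ "$i" -lt "$n" ]; do
        # The first $rem ranges take one extra value each.
        if [ "$i" -lt "$rem" ]; then extra=1; else extra=0; fi
        hi=$((lo + size + extra - 1))
        echo "$lo $hi"
        lo=$((hi + 1))
        i=$((i + 1))
    done
}

split_ranges 1 100 4
```

The ranges are cut by value, not by row count, so a column whose values cluster in one part of the range leaves some mappers with most of the rows and others with almost none.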



Answer 2:


You can import data from an RDBMS into Hive without a primary key.

You do not have to create the Hive table beforehand: with --create-hive-table, Sqoop creates it, and the job fails if the target table already exists. Use the following command:

sqoop import \
    --connect jdbc:mysql://localhost/test_db \
    --username root \
    --password **** \
    --table <RDBMS-Table-name> \
    --target-dir /user/root/user_data \
    --hive-import \
    --hive-table <hive-table-name> \
    --create-hive-table \
    -m 1

To import in parallel instead, replace -m 1 with --split-by <RDBMS-Column>.



Answer 3:


The first scenario uses a single mapper. If the table is very large, the import will take a long time or might fail. Check the size of the data before setting the number of mappers to 1.
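
One way to act on this is to look at the row count first, for example with sqoop eval --query "SELECT COUNT(*) FROM user", and choose the mapper count from it. The helper below is a hypothetical sketch: the function name and the 1,000,000 row cut-off are assumptions for illustration, not Sqoop recommendations.

```shell
#!/bin/sh
# choose_mappers ROW_COUNT [THRESHOLD]
# Prints 1 for tables at or below the threshold, otherwise 4 (Sqoop's default).
choose_mappers() {
    rows=$1
    threshold=${2:-1000000}   # assumed cut-off; tune it for your cluster
    if [ "$rows" -le "$threshold" ]; then
        echo 1
    else
        echo 4
    fi
}

choose_mappers 50000
choose_mappers 25000000
```

When the result is greater than 1 and the table has no primary key, the import also needs a --split-by column.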




Answer 4:


Quick view:

The Sqoop job fails with this error: "Error during import: No primary key could be found for table <table_name>. Please specify one with --split-by or perform a sequential import with '-m 1'"

Description: When you run a Sqoop import, it looks for the table's primary key so that it can split the work among mappers. If there is no primary key, the job fails with the error above. The message itself suggests two alternative approaches.

Option 2 is the better approach.

  1. Set the number of mappers to 1 (the default is 4). With a single mapper the import runs sequentially, like a single-threaded task. This works only for a small table; for a large import it is impractical because the task tends to run forever.

  2. Use --split-by, which keeps several mappers and splits the work on a column you name, ideally an indexed one. You can also split manually with a free-form query.
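
The free-form query variant can be sketched as follows. Sqoop accepts --query in place of --table; the query must contain the literal token $CONDITIONS, which Sqoop replaces with a different range predicate for each mapper, and a parallel import additionally needs --split-by and --target-dir. The connection details and column names are borrowed from the first answer and are assumptions here; -P makes Sqoop prompt for the password. The script only prints the command unless SQOOP=sqoop is set.

```shell
#!/bin/sh
# Dry run by default: prints the command instead of executing it.
# Set SQOOP=sqoop in the environment to run the real import.
SQOOP=${SQOOP:-echo sqoop}

import_by_query() {
    # Single quotes stop the shell from expanding $CONDITIONS;
    # Sqoop substitutes the token itself, once per mapper.
    $SQOOP import \
        --connect jdbc:mysql://localhost/test_db \
        --username root \
        -P \
        --query 'SELECT first_name, last_name, created_date FROM user WHERE $CONDITIONS' \
        --split-by created_date \
        --target-dir /user/root/user_data \
        -m 4
}

import_by_query
```

If the query is wrapped in double quotes instead, the token has to be written as \$CONDITIONS so that the shell leaves it alone.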




Answer 5:


Use the following in your command:

--autoreset-to-one-mapper

With this option, the import falls back to one mapper if the table has no primary key and no split-by column is provided. It cannot be combined with the --split-by <col> option.
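
In a full command the flag might be used like this. It is a sketch only: the connection details, table name, and target directory are carried over from the first answer as assumptions, and -P makes Sqoop prompt for the password. As written, the script prints the command rather than running it.

```shell
#!/bin/sh
# Dry run by default: prints the command instead of executing it.
# Set SQOOP=sqoop in the environment to run the real import.
SQOOP=${SQOOP:-echo sqoop}

import_with_autoreset() {
    # No -m 1 and no --split-by: Sqoop drops to a single mapper on its own
    # when the table turns out to have no primary key.
    $SQOOP import \
        --connect jdbc:mysql://localhost/test_db \
        --username root \
        -P \
        --table user \
        --target-dir /user/root/user_data \
        --autoreset-to-one-mapper
}

import_with_autoreset
```

This is convenient when the same script imports a mix of tables, since those with a primary key still run in parallel.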



Source: https://stackoverflow.com/questions/29138498/sqoop-import-without-primary-key-in-rdbms
