I am using Sqoop (version 1.4.4) to import data from MySQL to Hive. The data will be a subset of one of the tables, i.e. a few columns from a table. Is it necessary to create the table in Hive beforehand?
Firstly, one does not have to create an EXTERNAL table; a regular (managed) table works as well. Secondly, the solutions given above are a bit complex.
Suppose the MySQL schema looks like this:
mysql> describe emp;
+--------+-------------+------+-----+---------+-------+
| Field  | Type        | Null | Key | Default | Extra |
+--------+-------------+------+-----+---------+-------+
| id     | int(11)     | YES  |     | NULL    |       |
| name   | varchar(20) | YES  |     | NULL    |       |
| deg    | varchar(20) | YES  |     | NULL    |       |
| salary | int(11)     | YES  |     | NULL    |       |
| dept   | varchar(20) | YES  |     | NULL    |       |
+--------+-------------+------+-----+---------+-------+
Then you need to create the Hive table as I did, with the DATABASE as userdb and the TABLE as emp:
hive>
CREATE TABLE userdb.emp (
id INT,
name VARCHAR(20),
deg VARCHAR(20),
salary INT,
dept VARCHAR(20))
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;
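Optionally, you can confirm the table was created before running the import; the commands below are just standard Hive statements for that check, not part of the required steps:
hive> SHOW TABLES IN userdb;
hive> DESCRIBE userdb.emp;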
Now it is a matter of running the Sqoop script (I had to quit the Hive prompt first). Since I am not using hive2, I had to run the script below from the location where metastore_db exists (i.e. the same working directory from which I ran Hive); I guess some workaround could avoid this. The Sqoop script is:
sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root --password root \
--table emp --fields-terminated-by ',' \
--split-by id \
--hive-import --hive-table userdb.emp \
--target-dir /emp
The target directory, i.e. /emp, gets deleted once the command succeeds. Note that I explicitly specified the Hive table as userdb.emp.
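To verify that the rows actually made it into Hive, a quick query at the Hive prompt is enough (just a sanity check with plain HiveQL):
hive> SELECT COUNT(*) FROM userdb.emp;
hive> SELECT * FROM userdb.emp LIMIT 5;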
My HDFS directory structure:
drwxr-xr-x   - ubuntu supergroup          0 2016-12-18 13:20 /user/hive/warehouse/userdb.db/emp
-rwxr-xr-x   3 ubuntu supergroup         28 2016-12-18 13:19 /user/hive/warehouse/userdb.db/emp/part-m-00000
-rwxr-xr-x   3 ubuntu supergroup         35 2016-12-18 13:20 /user/hive/warehouse/userdb.db/emp/part-m-00001
-rwxr-xr-x   3 ubuntu supergroup         29 2016-12-18 13:20 /user/hive/warehouse/userdb.db/emp/part-m-00002
-rwxr-xr-x   3 ubuntu supergroup         31 2016-12-18 13:20 /user/hive/warehouse/userdb.db/emp/part-m-00003
-rwxr-xr-x   3 ubuntu supergroup         28 2016-12-18 13:20 /user/hive/warehouse/userdb.db/emp/part-m-00004
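Finally, since the question asks about importing only a few columns rather than the whole table, the same approach should work if you add Sqoop's --columns option to the import; the column list id,name,salary below is only an example, and the Hive table you pre-create should then contain just those columns:
sqoop import \
--connect jdbc:mysql://localhost/userdb \
--username root --password root \
--table emp --columns "id,name,salary" \
--fields-terminated-by ',' \
--split-by id \
--hive-import --hive-table userdb.emp \
--target-dir /emp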