Using Sqoop to import data from MySQL to Hive

攒了一身酷 2021-02-06 10:12

I am using Sqoop (version 1.4.4) to import data from MySQL to Hive. The data will be a subset of one of the tables, i.e. a few columns from a table. Is it necessary to create table in

6 Answers
  • 2021-02-06 10:45

    First, you don't have to create an EXTERNAL table; a regular table works as well. Second, the other solutions given here are a bit complex.

    Suppose the MySQL schema looks like this:

    mysql> describe emp;
    +--------+-------------+------+-----+---------+-------+
    | Field  | Type        | Null | Key | Default | Extra |
    +--------+-------------+------+-----+---------+-------+
    | id     | int(11)     | YES  |     | NULL    |       |
    | name   | varchar(20) | YES  |     | NULL    |       |
    | deg    | varchar(20) | YES  |     | NULL    |       |
    | salary | int(11)     | YES  |     | NULL    |       |
    | dept   | varchar(20) | YES  |     | NULL    |       |
    +--------+-------------+------+-----+---------+-------+
    

    Then create the Hive table, as I did, with userdb as the database and emp as the table:

    hive>
    CREATE TABLE userdb.emp (
    id  INT,
    name  VARCHAR(20),
    deg  VARCHAR(20),
    salary INT,
    dept  VARCHAR(20))
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;
    

    Now it is a matter of running the Sqoop command (I had to quit the Hive prompt first). Since I am not using hive2, I had to run it from the directory where metastore_db exists, i.e. the same working directory from which I had started Hive. There is probably a workaround for this. The Sqoop command is:

    sqoop import \
    --connect jdbc:mysql://localhost/userdb \
    --username root --password root \
    --table emp --fields-terminated-by ',' \
    --split-by id \
    --hive-import --hive-table userdb.emp \
    --target-dir /emp
    

    The target directory, i.e. /emp, is deleted once the command succeeds. I specified the Hive table explicitly as userdb.emp.

    My HDFS directory structure:

    drwxr-xr-x   - ubuntu supergroup          0 2016-12-18 13:20 /user/hive/warehouse/userdb.db/emp
    -rwxr-xr-x   3 ubuntu supergroup         28 2016-12-18 13:19 /user/hive/warehouse/userdb.db/emp/part-m-00000
    -rwxr-xr-x   3 ubuntu supergroup         35 2016-12-18 13:20 /user/hive/warehouse/userdb.db/emp/part-m-00001
    -rwxr-xr-x   3 ubuntu supergroup         29 2016-12-18 13:20 /user/hive/warehouse/userdb.db/emp/part-m-00002
    -rwxr-xr-x   3 ubuntu supergroup         31 2016-12-18 13:20 /user/hive/warehouse/userdb.db/emp/part-m-00003
    -rwxr-xr-x   3 ubuntu supergroup         28 2016-12-18 13:20 /user/hive/warehouse/userdb.db/emp/part-m-00004
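
    A listing like the one above can be produced with the HDFS shell, and a row count is a quick check that Hive sees the imported rows. This is only a sketch, assuming the warehouse path shown in the listing; the run helper and the DRY_RUN variable are hypothetical names for this example, and by default the commands are printed rather than executed.

```shell
# DRY_RUN and run() are illustrative helpers for this sketch,
# not part of Hadoop or Hive.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"      # dry run: show the command only
  else
    "$@"           # real run: execute it
  fi
}

# List the files that ended up in the Hive warehouse directory.
list_out=$(run hdfs dfs -ls -R /user/hive/warehouse/userdb.db)

# Count the imported rows through Hive.
count_out=$(run hive -e 'SELECT COUNT(*) FROM userdb.emp;')

echo "$list_out"
echo "$count_out"
```

    Setting DRY_RUN=0 runs the two commands for real on a machine where hdfs and hive are on the PATH.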
    
  • 2021-02-06 10:46

    As mentioned in the Sqoop documentation, you do not have to create any Hive tables if you use the --hive-import argument in your command.

    Example:

    sqoop import --connect jdbc:mysql://mysql_server:3306/db_name --username mysql_user --password mysql_pass --table table_name --hive-import
    

    Also, consider the --hive-overwrite argument if you want to schedule a full data import, for example on a daily basis.
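
    A minimal sketch of such a full refresh, reusing the placeholder connection values from the example above. DRY_RUN and preview are hypothetical names for this sketch, not Sqoop options; by default the command is only printed.

```shell
# Sketch of a full refresh that replaces the Hive table contents on each run.
# Connection values are the placeholders from the example above.
# DRY_RUN is a hypothetical switch for this sketch, not a Sqoop option.
DRY_RUN="${DRY_RUN:-1}"

# Build the command as positional parameters so each argument stays intact.
set -- sqoop import \
  --connect jdbc:mysql://mysql_server:3306/db_name \
  --username mysql_user --password mysql_pass \
  --table table_name \
  --hive-import --hive-overwrite

preview="$*"
if [ "$DRY_RUN" = "1" ]; then
  echo "$preview"   # print the command without contacting MySQL or Hive
else
  "$@"              # run the import for real
fi
```

    A scheduler such as cron could then call the script with DRY_RUN=0.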

  • 2021-02-06 10:48

    I finally resolved the issue. It involves two steps.

    1. Create an external hive table.
    2. Import data using Sqoop.

    Creating the external table: the EXTERNAL keyword specifies the table type. External tables in Hive persist even if Hive is stopped or the server goes down. (That is true of regular tables as well; what sets an external table apart is that dropping it leaves its data in place.)

    CREATE EXTERNAL TABLE IF NOT EXISTS HIVEDB.HIVE_TABLE1 (DATE_COL DATE, 
    BIG_INT_COL BIGINT, INT_COL INT, VARCHAR_COL VARCHAR(221), FLOAT_COL FLOAT);
    

    Importing the data using Sqoop: specify the name of the table created above with --hive-table, instead of using the --create-hive-table option.

    sqoop import --connect jdbc:mysql://mysqlhost/mysqldb \
    --username user --password passwd \
    --query "SELECT table1.date_col, table1.big_int_col, table1.int_col, table1.varchar_col, table1.float_col FROM MYSQL_TABLE1 AS table1 WHERE \$CONDITIONS" \
    --split-by table1.date_col \
    --hive-import --hive-table hivedb.hive_table1 \
    --target-dir hive_table1_data
    

    Data was stored permanently in Hive.
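
    When the subset is just a few columns of a single table, Sqoop's --columns argument can stand in for the free-form --query. The following is a sketch, assuming the same placeholder host, credentials, and table names as above and that the Hive table already exists; it has not been run against 1.4.4. DRY_RUN, cols, and preview are hypothetical names for this example.

```shell
# DRY_RUN is a hypothetical switch for this sketch, not a Sqoop option.
DRY_RUN="${DRY_RUN:-1}"

# Columns to import, in the same order as the Hive table's columns.
cols="date_col,big_int_col,int_col,varchar_col,float_col"

# With --table and --columns no \$CONDITIONS placeholder is needed.
set -- sqoop import \
  --connect jdbc:mysql://mysqlhost/mysqldb \
  --username user --password passwd \
  --table MYSQL_TABLE1 \
  --columns "$cols" \
  --split-by date_col \
  --hive-import --hive-table hivedb.hive_table1

preview="$*"
if [ "$DRY_RUN" = "1" ]; then
  echo "$preview"   # print the command only
else
  "$@"              # run the import for real
fi
```

    A free-form --query remains the better fit when the import needs a join or a WHERE filter.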

  • 2021-02-06 10:58

    Nayan, you probably would have figured it out by now.

    Whether EXTERNAL or not, hive tables are stored on HDFS.

    The keyword EXTERNAL only loosely ties the table with its data. For example, deleting the EXTERNAL table from within Hive only deletes the schema and leaves the data untouched on HDFS.
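
    One way to see this behaviour is with a throwaway table. This is a sketch under stated assumptions: the table name demo_ext, the location /tmp/demo_ext, and the DRY_RUN switch are all hypothetical. The script writes the HiveQL to a temporary file and, by default, only prints it.

```shell
# DRY_RUN, demo_ext and /tmp/demo_ext are hypothetical names for this sketch.
DRY_RUN="${DRY_RUN:-1}"
hql_file=$(mktemp)

# HiveQL: create an external table over a directory, then drop the table.
cat > "$hql_file" <<'EOF'
CREATE EXTERNAL TABLE IF NOT EXISTS demo_ext (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/tmp/demo_ext';
DROP TABLE demo_ext;
EOF

if [ "$DRY_RUN" = "1" ]; then
  cat "$hql_file"        # dry run: show the statements only
else
  hive -f "$hql_file"    # run the statements
  # After the drop, the directory backing the table should still exist.
  hdfs dfs -test -d /tmp/demo_ext && echo "data directory still present"
fi
```

    Repeating the experiment without the EXTERNAL keyword should remove the directory along with the table.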

  • 2021-02-06 11:00

    There is no need to create the table beforehand; Sqoop can create it during the import. See the command below.

    sqoop import --connect jdbc:mysql://mysql_server:3306/db_name \
    --username mysql_user \
    --password mysql_pass \
    --table table_name \
    --hive-import
    
  • 2021-02-06 11:04

    Even if the table does not exist in Hive, sqoop import will create it. The following worked for me:

    sqoop import --connect jdbc:mysql://localhost/<<dbname>> --username <<YourMySqlUsername>> --password <<YourMySqlpwd>> --table employee --hive-import --hive-table employee_1 -m -1
    