1. Download the packages and configure
Download hadoop-0.20.2-CDH3B4.tar.gz and sqoop-1.2.0-CDH3B4.tar.gz and unpack both. Copy hadoop-core-0.20.2-CDH3B4.jar from the hadoop-0.20.2-CDH3B4 directory into the lib directory of sqoop-1.2.0-CDH3B4. Then edit sqoop-1.2.0-CDH3B4/bin/configure-sqoop and comment out the check for ZOOKEEPER_HOME. These steps are sketched below.
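A minimal shell sketch of these steps, assuming both tarballs have already been downloaded to the admin home directory (the exact location of the ZOOKEEPER_HOME check varies by release, so find it with grep and comment out that block by hand):
cd ~
tar -xzf hadoop-0.20.2-CDH3B4.tar.gz
tar -xzf sqoop-1.2.0-CDH3B4.tar.gz
# Give Sqoop the matching Hadoop core jar
cp hadoop-0.20.2-CDH3B4/hadoop-core-0.20.2-CDH3B4.jar sqoop-1.2.0-CDH3B4/lib/
# Locate the ZOOKEEPER_HOME check in configure-sqoop, then comment it out
grep -n ZOOKEEPER_HOME sqoop-1.2.0-CDH3B4/bin/configure-sqoop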
2. Configure environment variables
export SQOOP_HOME=/home/admin/sqoop-1.2.0-CDH3B4
export PATH=$PATH:$SQOOP_HOME/bin
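To make these variables persist across logins, one common approach is to append them to the admin user's ~/.bash_profile and reload it (adjust for your shell):
echo 'export SQOOP_HOME=/home/admin/sqoop-1.2.0-CDH3B4' >> ~/.bash_profile
echo 'export PATH=$PATH:$SQOOP_HOME/bin' >> ~/.bash_profile
source ~/.bash_profile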
3. Test the installation
[admin@server1 ~]$ sqoop help
usage: sqoop COMMAND [ARGS]
Available commands:
  codegen            Generate code to interact with database records
  create-hive-table  Import a table definition into Hive
  eval               Evaluate a SQL statement and display the results
  export             Export an HDFS directory to a database table
  help               List available commands
  import             Import a table from a database to HDFS
  import-all-tables  Import tables from a database to HDFS
  job                Work with saved jobs
  list-databases     List available databases on a server
  list-tables        List available tables in a database
  merge              Merge results of incremental imports
  metastore          Run a standalone Sqoop metastore
  version            Display version information
See 'sqoop help COMMAND' for information on a specific command.
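As an extra sanity check, the version command listed above prints the build information (output omitted here since it varies by build):
[admin@server1 ~]$ sqoop version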
4. Import from MySQL into HDFS
Reuse the hive database instance created for the earlier Hive experiment, and display the contents of the table SEQUENCE_TABLE:
mysql> select * from SEQUENCE_TABLE;
+-----------------------------------------------------------+----------+
| SEQUENCE_NAME                                             | NEXT_VAL |
+-----------------------------------------------------------+----------+
| org.apache.hadoop.hive.metastore.model.MColumnDescriptor  |       16 |
| org.apache.hadoop.hive.metastore.model.MDatabase          |        6 |
| org.apache.hadoop.hive.metastore.model.MSerDeInfo         |       16 |
| org.apache.hadoop.hive.metastore.model.MStorageDescriptor |       16 |
| org.apache.hadoop.hive.metastore.model.MTable             |       16 |
+-----------------------------------------------------------+----------+
5 rows in set (0.00 sec)
Copy mysql-connector-java-5.1.18-bin.jar into the lib directory of sqoop-1.2.0-CDH3B4 (sketched below), then use sqoop to import the SEQUENCE_TABLE data into HDFS:
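A one-line sketch of the jar copy, assuming the MySQL connector jar was downloaded to the current directory:
cp mysql-connector-java-5.1.18-bin.jar /home/admin/sqoop-1.2.0-CDH3B4/lib/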
[admin@server1 bin]$ sqoop import --connect jdbc:mysql://server1/hive --username hive --password hive --table SEQUENCE_TABLE -m 3;
12/12/16 01:27:16 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
12/12/16 01:27:16 INFO tool.CodeGenTool: Beginning code generation
12/12/16 01:27:16 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `SEQUENCE_TABLE` AS t LIMIT 1
12/12/16 01:27:16 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `SEQUENCE_TABLE` AS t LIMIT 1
12/12/16 01:27:16 INFO orm.CompilationManager: HADOOP_HOME is /home/admin/hadoop-0.20.2/bin/..
12/12/16 01:27:16 INFO orm.CompilationManager: Found hadoop core jar at: /home/admin/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
12/12/16 01:27:18 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-admin/compile/415f2a5412b5c2aadd76474859647419/SEQUENCE_TABLE.jar
12/12/16 01:27:18 WARN manager.MySQLManager: It looks like you are importing from mysql.
12/12/16 01:27:18 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
12/12/16 01:27:18 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
12/12/16 01:27:18 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
12/12/16 01:27:18 INFO mapreduce.ImportJobBase: Beginning import of SEQUENCE_TABLE
12/12/16 01:27:19 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `SEQUENCE_TABLE` AS t LIMIT 1
12/12/16 01:27:20 WARN db.TextSplitter: Generating splits for a textual index column.
12/12/16 01:27:20 WARN db.TextSplitter: If your database sorts in a case-insensitive order, this may result in a partial import or duplicate records.
12/12/16 01:27:20 WARN db.TextSplitter: You are strongly encouraged to choose an integral split column.
12/12/16 01:27:20 INFO mapred.JobClient: Running job: job_201212152320_0004
12/12/16 01:27:21 INFO mapred.JobClient: map 0% reduce 0%
12/12/16 01:27:41 INFO mapred.JobClient: map 25% reduce 0%
12/12/16 01:27:51 INFO mapred.JobClient: map 50% reduce 0%
12/12/16 01:27:57 INFO mapred.JobClient: map 100% reduce 0%
12/12/16 01:27:59 INFO mapred.JobClient: Job complete: job_201212152320_0004
12/12/16 01:27:59 INFO mapred.JobClient: Counters: 5
12/12/16 01:27:59 INFO mapred.JobClient: Job Counters
12/12/16 01:27:59 INFO mapred.JobClient: Launched map tasks=4
12/12/16 01:27:59 INFO mapred.JobClient: FileSystemCounters
12/12/16 01:27:59 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=274
12/12/16 01:27:59 INFO mapred.JobClient: Map-Reduce Framework
12/12/16 01:27:59 INFO mapred.JobClient: Map input records=5
12/12/16 01:27:59 INFO mapred.JobClient: Spilled Records=0
12/12/16 01:27:59 INFO mapred.JobClient: Map output records=5
12/12/16 01:27:59 INFO mapreduce.ImportJobBase: Transferred 274 bytes in 40.4297 seconds (6.7772 bytes/sec)
12/12/16 01:27:59 INFO mapreduce.ImportJobBase: Retrieved 5 records.
View the data in HDFS:
[admin@server1 bin]$ hadoop dfs -cat SEQUENCE_TABLE/part*
org.apache.hadoop.hive.metastore.model.MColumnDescriptor,16
org.apache.hadoop.hive.metastore.model.MDatabase,6
org.apache.hadoop.hive.metastore.model.MSerDeInfo,16
org.apache.hadoop.hive.metastore.model.MStorageDescriptor,16
org.apache.hadoop.hive.metastore.model.MTable,16
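Note the db.TextSplitter warnings in the import log: SEQUENCE_TABLE has no integral primary key, so Sqoop split the work on the textual SEQUENCE_NAME column, which can yield partial or duplicate imports if the database sorts case-insensitively. For a five-row table, a simple workaround is to run a single mapper, which avoids splitting entirely (this variant also uses -P instead of --password, as the log itself recommends):
sqoop import --connect jdbc:mysql://server1/hive --username hive -P --table SEQUENCE_TABLE -m 1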
5. Export from HDFS to MySQL
HDFS contains a file test.txt with the following contents:
[admin@server1 ~]$ hadoop dfs -cat /test.txt
aaaa,111
bbbb,222
cccc,333
dddd,444
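If you need to create this file first, a minimal sketch that writes the four lines locally and uploads the file to HDFS:
printf 'aaaa,111\nbbbb,222\ncccc,333\ndddd,444\n' > test.txt
hadoop dfs -put test.txt /test.txt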
Create the table test in MySQL:
mysql> CREATE TABLE test(str VARCHAR(10), num INT);
Query OK, 0 rows affected (0.01 sec)
Run the export:
[admin@server1 ~]$ sqoop export --connect jdbc:mysql://server1/hive --username hive --password hive --table test --export-dir /test.txt -m 3;
12/12/16 01:51:50 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
12/12/16 01:51:50 INFO tool.CodeGenTool: Beginning code generation
12/12/16 01:51:50 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
12/12/16 01:51:50 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
12/12/16 01:51:50 INFO orm.CompilationManager: HADOOP_HOME is /home/admin/hadoop-0.20.2/bin/..
12/12/16 01:51:50 INFO orm.CompilationManager: Found hadoop core jar at: /home/admin/hadoop-0.20.2/bin/../hadoop-0.20.2-core.jar
12/12/16 01:51:52 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-admin/compile/bebfa051f0e18c14ad2c466547b23c92/test.jar
12/12/16 01:51:52 INFO mapreduce.ExportJobBase: Beginning export of test
12/12/16 01:51:52 INFO manager.MySQLManager: Executing SQL statement: SELECT t.* FROM `test` AS t LIMIT 1
12/12/16 01:51:53 INFO input.FileInputFormat: Total input paths to process : 1
12/12/16 01:51:53 INFO input.FileInputFormat: Total input paths to process : 1
12/12/16 01:51:53 INFO mapred.JobClient: Running job: job_201212152320_0005
12/12/16 01:51:54 INFO mapred.JobClient: map 0% reduce 0%
12/12/16 01:52:01 INFO mapred.JobClient: map 100% reduce 0%
12/12/16 01:52:03 INFO mapred.JobClient: Job complete: job_201212152320_0005
12/12/16 01:52:03 INFO mapred.JobClient: Counters: 6
12/12/16 01:52:03 INFO mapred.JobClient: Job Counters
12/12/16 01:52:03 INFO mapred.JobClient: Rack-local map tasks=1
12/12/16 01:52:03 INFO mapred.JobClient: Launched map tasks=1
12/12/16 01:52:03 INFO mapred.JobClient: FileSystemCounters
12/12/16 01:52:03 INFO mapred.JobClient: HDFS_BYTES_READ=42
12/12/16 01:52:03 INFO mapred.JobClient: Map-Reduce Framework
12/12/16 01:52:03 INFO mapred.JobClient: Map input records=4
12/12/16 01:52:03 INFO mapred.JobClient: Spilled Records=0
12/12/16 01:52:03 INFO mapred.JobClient: Map output records=4
12/12/16 01:52:03 INFO mapreduce.ExportJobBase: Transferred 42 bytes in 10.7834 seconds (3.8949 bytes/sec)
12/12/16 01:52:03 INFO mapreduce.ExportJobBase: Exported 4 records.
View the records in the MySQL table test:
mysql> select * from test;
+------+------+
| str  | num  |
+------+------+
| aaaa |  111 |
| bbbb |  222 |
| cccc |  333 |
| dddd |  444 |
+------+------+
4 rows in set (0.00 sec)
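The export worked without any delimiter options because Sqoop's default input field separator for exports is the comma, which matches test.txt. If your HDFS data uses a different separator, declare it with --input-fields-terminated-by; for example, for a hypothetical tab-separated file /test_tsv.txt:
sqoop export --connect jdbc:mysql://server1/hive --username hive -P --table test --export-dir /test_tsv.txt --input-fields-terminated-by '\t' -m 3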
Source: https://www.cnblogs.com/leeeee/p/7276653.html