sqoop

Sqoop “import-all-tables” unable to import all tables

Submitted by 血红的双手。 on 2020-01-05 05:26:06
Question: This is the Sqoop command I am using to import data from SQL Server into Hive:

    sqoop-import-all-tables --connect "jdbc:sqlserver://ip.ip.ip.ip\MIGERATIONSERVER;port=1433;username=sa;password=blablaq;database=sqlserverdb" --create-hive-table --hive-import --hive-database hivemtdb

The problem is that sqlserverdb has about 100 tables, but when I issue this command it imports only 6 or 7 seemingly random tables into Hive. This behavior is really strange to me, and I am unable to find what I am doing wrong.
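No answer is quoted above, so the following is a hedged suggestion rather than the original reply: a common cause of a partial import-all-tables run is a table without a primary key, because Sqoop aborts when it cannot pick a split column, which can look like "only a few random tables imported". A minimal sketch that sidesteps the problem by forcing a single mapper per table:

    # Hedged sketch, not from the original thread: -m 1 removes the need for a
    # split column, so tables without a primary key no longer abort the run
    # (slower, but every table gets across).
    sqoop-import-all-tables \
      --connect "jdbc:sqlserver://ip.ip.ip.ip\MIGERATIONSERVER;port=1433;username=sa;password=blablaq;database=sqlserverdb" \
      --hive-import \
      --hive-database hivemtdb \
      --create-hive-table \
      -m 1

On Sqoop 1.4.5 and later, --autoreset-to-one-mapper keeps parallel imports for keyed tables and falls back to one mapper only where no key exists.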

How to give custom names to Sqoop output files

Submitted by 妖精的绣舞 on 2020-01-04 17:33:06
Question: When I import data into Hive using Sqoop, by default it creates files named part-m-00000, part-m-00001, etc. on HDFS. Is it possible to rename these files? If I wish to give them some meaningful name, such as suffixing the file name with a date to indicate the load, how can I do it? Please suggest.

Answer 1: You can't do it with Sqoop directly, but you can rename them in HDFS after Sqoop is done importing:

    today=`date +%Y-%m-%d`
    files=$(hadoop fs -ls /path-to-files | awk '{print $8}')
    for f in $files; do hadoop fs -mv $f $f
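The quoted loop is cut off above; a completed sketch of the same idea (the /path-to-files directory and the date-suffix naming scheme are assumptions) might look like this:

    #!/bin/bash
    # Append today's date to every part-m-* file Sqoop wrote into the target dir.
    # /path-to-files is a placeholder for the Sqoop --target-dir.
    today=$(date +%Y-%m-%d)
    # The 8th column of `hadoop fs -ls` output is the file path.
    files=$(hadoop fs -ls /path-to-files | awk '{print $8}')
    for f in $files; do
      # e.g. part-m-00000 -> part-m-00000_2020-01-04
      hadoop fs -mv "$f" "${f}_${today}"
    done

Another option is to leave the part files alone and encode the load date in the directory instead, e.g. a --target-dir of /data/orders/load_date=$today.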

What is the purpose of $CONDITIONS under --query?

Submitted by 有些话、适合烂在心里 on 2020-01-04 05:51:23
Question: I am using the Cloudera Quickstart edition, CDH 5.7. I ran the query below in a terminal window:

    sqoop import \
      --connect "jdbc:mysql://quickstart.cloudera:3306/retail_db" \
      --username=retail_dba \
      --password=cloudera \
      --query="select * from orders join order_items on orders.order_id = order_items.order_item_order_id where \$CONDITIONS" \
      --target-dir /user/cloudera/order_join \
      --split-by order_id \
      --num-mappers 4

Q: What is the purpose of $CONDITIONS? Why is it used in this query? Can anybody explain?
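No answer is quoted above, so this is a hedged summary rather than the original reply: $CONDITIONS is a placeholder that Sqoop rewrites at run time. During the metadata pass it becomes a predicate that returns no rows, and during the import each mapper gets its own range predicate derived from --split-by, which is how a free-form query can run in parallel. A sketch of the substitution (the boundary values below are made up; Sqoop computes the real ones from MIN/MAX of order_id):

    # Metadata pass: Sqoop only needs column names and types, so it runs
    #   SELECT ... WHERE (1 = 0)
    #
    # Import pass with --split-by order_id and --num-mappers 4: each mapper runs
    # the query over its own slice of the split column, e.g. (hypothetical values)
    #   mapper 1: ... WHERE (order_id >= 1     AND order_id < 17224)
    #   mapper 2: ... WHERE (order_id >= 17224 AND order_id < 34447)
    #   mapper 3: ... WHERE (order_id >= 34447 AND order_id < 51670)
    #   mapper 4: ... WHERE (order_id >= 51670 AND order_id <= 68883)
    #
    # The backslash in \$CONDITIONS in the command above only stops the shell from
    # expanding it inside double quotes, so the literal string reaches Sqoop.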

How do I import RDBMS data into a specific Hive database using sqoop import

Submitted by 隐身守侯 on 2020-01-03 04:43:07
Question: I need to import an external MySQL database into Hive using Sqoop. My requirement is to import the complete database, with all of its tables, into a specified Hive database in a single sqoop import. For example, I want to import the MySQL database 'hadoop_practice', along with all its tables, into the Hive database 'hadoop_practice'. However, when I run the following command

    sqoop import-all-tables --connect jdbc:mysql://localhost/hadoop_practice --username root -P --hive-import

the tables are imported into the default Hive database instead.
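No answer is quoted above; the usual fix, offered here as a hedged sketch, is to name the target database explicitly with --hive-database (it is assumed the Hive database hadoop_practice has already been created):

    # Hedged sketch: direct the Hive import at a specific database rather than "default".
    sqoop import-all-tables \
      --connect jdbc:mysql://localhost/hadoop_practice \
      --username root -P \
      --hive-import \
      --hive-database hadoop_practice \
      --create-hive-table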

Significance of --connection-manager in Sqoop

Submitted by 我怕爱的太早我们不能终老 on 2020-01-03 03:21:05
Question: I have written a Sqoop script to import data from Teradata into Hive:

    sqoop import \
      --connect $JDBC_URL \
      --driver com.teradata.jdbc.TeraDriver \
      --username $Username \
      --password $Password \
      --table $TD_Table \
      --hive-import \
      --hive-overwrite \
      --hive-drop-import-delims \
      --hive-table $Hive_Database.$Hive_Staging_Table \
      --split-by $Split_Col \
      -m $Mapper_Number

The script above produces a warning that no appropriate connection manager has been set (via --connection-manager) and that Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager.
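No answer is quoted above, so this is a hedged note: when --driver is given without --connection-manager, Sqoop falls back to the generic JDBC manager, which works but cannot apply any database-specific optimisations. Naming the manager explicitly silences the warning; a vendor Teradata connector, if installed, would supply its own manager class to use here instead. A sketch:

    # Hedged sketch: state the connection manager explicitly. GenericJdbcManager
    # is the class Sqoop would fall back to anyway; a dedicated Teradata connector
    # ships its own manager class that should be named here instead.
    sqoop import \
      --connection-manager org.apache.sqoop.manager.GenericJdbcManager \
      --driver com.teradata.jdbc.TeraDriver \
      --connect "$JDBC_URL" \
      --username "$Username" \
      --password "$Password" \
      --table "$TD_Table" \
      --hive-import \
      --hive-overwrite \
      --hive-drop-import-delims \
      --hive-table "$Hive_Database.$Hive_Staging_Table" \
      --split-by "$Split_Col" \
      -m "$Mapper_Number"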

“Imported Failed: Cannot convert SQL type 2005” when importing CLOB data from an Oracle database

Submitted by 筅森魡賤 on 2020-01-02 23:13:15
Question: I am trying to import an Oracle table that contains a CLOB column using Sqoop, and the import fails with the error "Imported Failed: Cannot convert SQL type 2005". I am running Sqoop version 1.4.5-cdh5.4.7. Please help me import the CLOB data type. I am using the Oozie workflow below to run the import:

    <workflow-app xmlns="uri:oozie:workflow:0.4" name="EBIH_Dly_tldb_dly_load_wf">
      <credentials>
        <credential name="hive2_cred" type="hive2">
          <property>
            <name>hive2.jdbc.url</name>
            <value>$
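No answer is quoted above; as a hedged note, SQL type 2005 is java.sql.Types.CLOB, and a common workaround is to map the CLOB column to a plain Java String with --map-column-java. A sketch, where the connect string, table and the column name COMMENTS_CLOB are hypothetical and must be replaced with the real ones:

    # Hedged sketch: map the CLOB column to String so Sqoop can import it.
    # COMMENTS_CLOB is a placeholder for the actual CLOB column name.
    sqoop import \
      --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
      --username scott -P \
      --table MY_SCHEMA.MY_TABLE \
      --map-column-java COMMENTS_CLOB=String \
      --hive-import \
      --hive-table staging.my_table \
      -m 1

The same option can be added to the argument list of the Sqoop action in the Oozie workflow.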

Imported Failed: Duplicate Column identifier specified (sqoop)

Submitted by 一曲冷凌霜 on 2020-01-02 15:02:34
Question: Query:

    sqoop import \
      --connect jdbc:mysql://localhost/userdb \
      --username abc --password abc \
      --query 'SELECT e.*,d.* FROM employee e JOIN department d on e.DEPTNO = d.DEPTNO WHERE $CONDITIONS ' \
      --split-by e.DEPTNO \
      --target-dir /output/result

Error:

    Imported Failed: Duplicate Column identifier specified (sqoop)

Answer 1: This is expected behaviour: you are selecting all columns from both tables, and both tables have the same column DEPTNO. Select the columns individually, give the duplicates aliases, and modify the query accordingly.
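The answer's rewritten query is not quoted above; a hedged reconstruction of the idea follows, where every column name other than DEPTNO is hypothetical and must match the real schema:

    # Hedged sketch: list the columns explicitly and alias the duplicated DEPTNO.
    sqoop import \
      --connect jdbc:mysql://localhost/userdb \
      --username abc --password abc \
      --query 'SELECT e.EMPNO, e.ENAME, e.DEPTNO AS EMP_DEPTNO,
                      d.DEPTNO AS DEPT_DEPTNO, d.DNAME
               FROM employee e
               JOIN department d ON e.DEPTNO = d.DEPTNO
               WHERE $CONDITIONS' \
      --split-by e.DEPTNO \
      --target-dir /output/result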

Handling bad records during sqoop import or export

Submitted by 流过昼夜 on 2020-01-02 08:51:08
Question: I looked at the options provided by the sqoop export operation but could not find any option for handling bad records. For example, once in a while a character may be present where a number is expected in a huge set of records. Is there a way to handle these scenarios in Sqoop without failing the job, while writing the bad records to a file?

Answer 1: Sqoop currently expects the data to export to be clean and does not provide facilities for handling corrupted data. You can use an MR/Pig/Hive job to clean the data before exporting it.
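A hedged sketch of that pre-cleaning idea in Hive, where the table and column names (sales_staging, sales_clean, amount) are hypothetical: rows whose amount does not parse as a number are split off before sqoop export runs.

    # Hedged sketch with placeholder table/column names.
    # Keep only rows whose amount casts to a number; CAST returns NULL otherwise.
    hive -e "
      CREATE TABLE sales_clean AS
      SELECT * FROM sales_staging
      WHERE CAST(amount AS DOUBLE) IS NOT NULL;
    "

    # Park the offending rows in a directory for later inspection.
    hive -e "
      INSERT OVERWRITE DIRECTORY '/tmp/sales_bad_records'
      SELECT * FROM sales_staging
      WHERE amount IS NOT NULL AND CAST(amount AS DOUBLE) IS NULL;
    "

sqoop export can then be pointed at sales_clean, for example via --hcatalog-table or the table's warehouse directory with --export-dir.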

Why is Spark slower than Sqoop when it comes to JDBC?

Submitted by 依然范特西╮ on 2020-01-01 07:04:14
Question: It is commonly said that when migrating/loading data from an Oracle database to HDFS/Parquet, it is preferable to use Sqoop rather than Spark with a JDBC driver. But isn't Spark supposed to be 100x faster at processing? So what is wrong with Spark? Why do people prefer Sqoop when loading data from Oracle database tables? Please suggest what I should do to make Spark faster when loading data from Oracle.

Answer 1: Spark is fast when it knows how to parallelize queries. If you're just executing a single query, then Spark does not know how to parallelize it.
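For comparison, this is how Sqoop expresses that parallelism out of the box: --split-by plus --num-mappers turns one logical read into N range queries executed by N mappers. The sketch below is hedged; the connect string, table and split column are hypothetical.

    # Hedged sketch: Sqoop parallelises the JDBC read by range-splitting on EMP_ID.
    sqoop import \
      --connect jdbc:oracle:thin:@//db-host:1521/ORCL \
      --username scott -P \
      --table HR.EMPLOYEES \
      --split-by EMP_ID \
      --num-mappers 8 \
      --as-parquetfile \
      --target-dir /data/employees

A Spark JDBC read only gets comparable parallelism when the reader is given partitionColumn, lowerBound, upperBound and numPartitions (or a set of predicates); without them it issues a single query through a single connection.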

Approaching Big Data: Advanced Hive (Part 1: Importing Data into Hive)

Submitted by 耗尽温柔 on 2019-12-31 21:35:35
1. Importing data with the LOAD statement

Syntax:

    LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename
    [PARTITION (partcol1 = val1, partcol2 = val2, ...)]

* Load the data in student01.txt into table t2 (t2 was created without specifying a delimiter):

    load data local inpath '/root/data/student01.txt' into table t2;
    select * from t2;  -- verify the load (the delimiter declared when the table was created
                       -- must match the delimiter of the source data; if it does not,
                       -- every column comes back as NULL)

* Load all data files under /root/data into table t3, overwriting the existing data (t3's delimiter is a comma):

    load data local inpath '/root/data/' overwrite into table t3;

* Load /input/student01.txt from HDFS into t3 (when the file is already in HDFS, the LOCAL keyword is omitted):

    load data inpath '/input/student01.txt' overwrite into table t3;

* Load data into a partitioned table (see the completed sketch below):

    load data local inpath '/root/data/data1.txt'
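The last statement above is cut off before its PARTITION clause; a hedged completion, where the table name partition_table and the partition column sampledate are hypothetical and must match the target table's PARTITIONED BY definition:

    # Hedged sketch: load a local file into one partition of a partitioned table,
    # run from the shell via the hive CLI. Table and partition names are placeholders.
    hive -e "
      LOAD DATA LOCAL INPATH '/root/data/data1.txt'
      INTO TABLE partition_table
      PARTITION (sampledate = '2019-12-31');
    "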