Difference between `load data inpath` and `location` in Hive?


At my firm, I see these two commands used frequently, and I'd like to understand the differences, because their functionality seems the same to me.

2 Answers

自闭症患者

2021-01-04 14:38
Yes, they serve entirely different purposes.

The load data inpath command is used to load data files into a Hive table. 'LOCAL' signifies that the input file is on the local file system. If 'LOCAL' is omitted, Hive looks for the file in HDFS.
```
-- file already in HDFS
load data inpath '/directory-path/file.csv' into table <mytable>;
-- file on the local file system
load data local inpath '/local-directory-path/file.csv' into table <mytable>;
```
The LOCATION keyword lets a table point to any HDFS location for its storage, instead of a folder under the directory specified by the configuration property hive.metastore.warehouse.dir.

In other words, with a specified LOCATION '/your-path/', Hive does not use the default location for this table. This comes in handy if you already have data generated.

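Remember, LOCATION is normally used with EXTERNAL tables. Hive's CREATE TABLE syntax also accepts it for regular (managed) tables, but dropping a managed table deletes the data at that path. Without LOCATION, the default location is used.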
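A minimal sketch of the LOCATION clause follows. The table name `sales_ext`, the columns, and the HDFS directory `/data/sales/` are hypothetical, and the files there are assumed to be comma-delimited text:

```sql
-- Hypothetical table and path, for illustration only.
CREATE EXTERNAL TABLE sales_ext (
  name   STRING,
  amount DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
-- LOCATION names a directory, not a single file.
LOCATION '/data/sales/';

-- Prints the table metadata, including its Location and Table Type fields.
DESCRIBE FORMATTED sales_ext;
```

Checking the Location field in the DESCRIBE FORMATTED output is a quick way to confirm which directory a table actually reads from.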

To summarize, load data inpath tells Hive where to look for input files, and the LOCATION keyword tells Hive where the table's data files are stored on HDFS.

References: https://cwiki.apache.org/confluence/display/Hive/GettingStarted https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL
萌比男神i

2021-01-04 14:45
Option 1: Internal table
```
create table <mytable>
(name string,
number double);

-- moves the file from its HDFS source path into the table's directory
load data inpath '/directory-path/file.csv' into table <mytable>;
```
This creates an internal (managed) table; the load command then moves the file out of the source directory and into the table's directory.
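The move can be observed from the Hive CLI or Beeline with the built-in `dfs` command. This is only a sketch: the table name `people` and its columns are hypothetical, and the path is the placeholder from the example above.

```sql
-- Hypothetical table name, for illustration only.
CREATE TABLE people (name STRING, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- Before the load: file.csv is listed in the source directory.
dfs -ls /directory-path/;

LOAD DATA INPATH '/directory-path/file.csv' INTO TABLE people;

-- After the load: file.csv should no longer be listed here.
dfs -ls /directory-path/;

-- The Location field shows the directory the file was moved into.
DESCRIBE FORMATTED people;
```

With `LOAD DATA LOCAL INPATH`, the local file is copied instead, so the original stays in place.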

Option 2: External table
```
 create external table <mytable>
 (name string,
 number double)
 -- location must be a directory that holds the data files, not a single file
 location '/directory-path/';
```
This creates an external table that reads the data in place. The data is neither moved nor copied from the source directory, so you can drop the external table and the source data is still available.

When you drop an external table, Hive drops only the table's metadata. The data still exists at the HDFS file location.
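The drop behaviour can be sketched as follows, assuming a hypothetical external table `events_ext` over a hypothetical directory `/data/events/` and default table properties:

```sql
-- Hypothetical table and path, for illustration only.
CREATE EXTERNAL TABLE events_ext (name STRING, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/events/';

-- Removes only the metastore entry for the table.
DROP TABLE events_ext;

-- The data files are expected to still be listed.
dfs -ls /data/events/;

-- Re-creating the table over the same directory makes the data queryable again.
CREATE EXTERNAL TABLE events_ext (name STRING, amount DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/data/events/';
```

Running the same DROP TABLE against a managed table would delete the files as well, which is the main practical difference between the two options.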

Have a look at this related SE question regarding use cases for both internal and external tables:

Difference between Hive internal tables and external tables?