Hadoop/Hive : Loading data from .csv on a local machine

As this is coming from a newbie...

I had Hadoop and Hive set up for me, so I can run Hive queries on my computer accessing data on AWS cluster. Can I run Hive querie

6 Answers

醉话见心

2020-12-24 05:53
In a CSV file, the data will typically be in the format below:
```
"column1", "column2","column3","column4"
```
If we use fields terminated by ',' then the surrounding quotes are kept as part of each column value, like this:
```
"column1"    "column2"     "column3"     "column4"
```
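Also, if any column value itself contains a comma, that value will be split across columns and the row will not load correctly.

So the correct way to create the table is to use OpenCSVSerde. Note that this SerDe treats every column as a string, whatever datatype is declared: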
```
create table tableName (column1 datatype, column2 datatype , column3 datatype , column4 datatype)
ROW FORMAT SERDE 
'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
STORED AS TEXTFILE ;
```
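To illustrate the difference described above, here is a minimal Python sketch (not Hive itself) comparing a plain split on ',' with a CSV-aware parser. The sample row is hypothetical, and Python's csv module only stands in for the quote handling that a CSV SerDe is meant to provide.

```python
import csv
import io

# Hypothetical sample row: quoted fields, one of which contains a comma.
line = '"1","Smith, John","NY","42"'

# Naive splitting on ',' (roughly what a plain delimited table does):
# the quotes stay in the values and the embedded comma creates an extra field.
naive_fields = line.split(",")

# A CSV-aware parser honors the quotes and keeps "Smith, John" together.
csv_fields = next(csv.reader(io.StringIO(line)))

print(naive_fields)
print(csv_fields)
```

The naive split produces five pieces with the quote characters still attached, while the CSV-aware parse produces the four intended values.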
囚心锁ツ

2020-12-24 06:07
You can load a local CSV file into Hive only if one of the following holds:
1. You are running the load from one of the Hive cluster nodes.
2. You have installed the Hive client on a non-cluster node and are using hive or beeline for the upload.
傲寒

2020-12-24 06:09
If you have a Hive setup, you can load the local dataset directly into HDFS/S3 using the Hive LOAD command.

You will need to use the LOCAL keyword when writing your load command.

Syntax for the Hive load command:
```
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
```
Refer to the link below for more detailed information: https://cwiki.apache.org/confluence/display/Hive/LanguageManual%20DML#LanguageManualDML-Loadingfilesintotables
暖寄归人

2020-12-24 06:09
There is another way of doing this:
1. Use hadoop fs -copyFromLocal to copy the .csv data file from your local computer into a directory in HDFS, say '/path/dirname'.
2. Enter the Hive console and run the following script to expose the file as a Hive table. Note that '\054' is the ASCII code of the comma in octal, used here as the field delimiter. LOCATION must point to the HDFS directory that contains the file, not to the file itself.
```
CREATE EXTERNAL TABLE table_name (foo INT, bar STRING)
 COMMENT 'from csv file'
 ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
 STORED AS TEXTFILE
 LOCATION '/path/dirname';
```
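As a quick sanity check on the octal escape mentioned above, this small Python sketch confirms that octal 054 is the comma character. The sample row is a made-up value for the (foo, bar) columns.

```python
# Octal 054 equals decimal 44, which is the ASCII code of ','.
delimiter = chr(0o54)
print(repr(delimiter))

# Splitting a hypothetical row on that delimiter yields the two fields (foo, bar).
row = "7,hello"
foo, bar = row.split(delimiter)
print(foo, bar)
```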
自闭症患者

2020-12-24 06:11
Let me walk you through the following simple steps:

Steps:

First, create a table in Hive using the field names in your CSV file. Let's say, for example, that your CSV file contains three fields (id, name, salary) and you want to create a Hive table called "staff". Use the code below to create the table.
```
hive> CREATE TABLE Staff (id int, name string, salary double) row format delimited fields terminated by ',';
```
Second, now that your table is created in Hive, load the data from your CSV file into the "staff" table.
```
hive>  LOAD DATA LOCAL INPATH '/home/yourcsvfile.csv' OVERWRITE INTO TABLE Staff;
```
Lastly, display the contents of your "Staff" table to check that the data was loaded successfully.
```
hive> SELECT * FROM Staff;
```
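If you want a small file to try these steps with, the following Python sketch writes one that matches the Staff schema. The rows, file name, and temporary directory are all hypothetical. No header row is written, because a plain delimited table like the one above would load a header line as an ordinary data row.

```python
import csv
import os
import tempfile

# Hypothetical rows matching the Staff schema (id int, name string, salary double).
rows = [(1, "Alice", 52000.0), (2, "Bob", 48500.5)]

# Write into a temporary directory; the location is illustrative only.
path = os.path.join(tempfile.mkdtemp(), "staff.csv")
with open(path, "w", newline="") as f:
    # Use '\n' line endings so no stray carriage return ends up in the last column.
    writer = csv.writer(f, lineterminator="\n")
    writer.writerows(rows)  # data rows only, no header

with open(path) as f:
    content = f.read()
print(content)
```

The resulting path is what you would pass to LOAD DATA LOCAL INPATH in place of '/home/yourcsvfile.csv'.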
Thanks.

有刺的猬

2020-12-24 06:11

You may try this tool; the following are a few examples of the files it generates. Tool: https://sourceforge.net/projects/csvtohive/?source=directory

Select a CSV file using Browse and set the Hadoop root directory, e.g. /user/bigdataproject/

The tool generates a Hadoop script covering all the CSV files. The following is a sample of a generated script that puts the CSV files into Hadoop:

```
#!/bin/bash -v

hadoop fs -put ./AllstarFull.csv /user/bigdataproject/AllstarFull.csv
hive -f ./AllstarFull.hive

hadoop fs -put ./Appearances.csv /user/bigdataproject/Appearances.csv
hive -f ./Appearances.hive

hadoop fs -put ./AwardsManagers.csv /user/bigdataproject/AwardsManagers.csv
hive -f ./AwardsManagers.hive
```

Sample of generated Hive scripts:
```
CREATE DATABASE IF NOT EXISTS lahman;

USE lahman;

CREATE TABLE AllstarFull (playerID string,yearID string,gameNum string,gameID string,teamID string,lgID string,GP string,startingPos string) row format delimited fields terminated by ',' stored as textfile;

LOAD DATA INPATH '/user/bigdataproject/AllstarFull.csv' OVERWRITE INTO TABLE AllstarFull;

SELECT * FROM AllstarFull;
```

Thanks, Vijay
