Hadoop/Hive : Loading data from .csv on a local machine

不知归路 2020-12-24 05:20

As this is coming from a newbie...

I had Hadoop and Hive set up for me, so I can run Hive queries on my computer accessing data on AWS cluster. Can I run Hive querie

6 answers
  • 2020-12-24 05:53

    In a CSV file, the data typically looks like this:

    "column1", "column2","column3","column4"
    

    If the table is declared with FIELDS TERMINATED BY ',', the double quotes are kept as part of each value:

    "column1"    "column2"     "column3"     "column4"
    

    In addition, any value that itself contains a comma will be split into the wrong columns.

    The correct way to create the table is therefore to use OpenCSVSerde, which understands quoted fields. Note that this SerDe reads every column as a string, whatever datatype is declared.

    create table tableName (column1 datatype, column2 datatype , column3 datatype , column4 datatype)
    ROW FORMAT SERDE 
    'org.apache.hadoop.hive.serde2.OpenCSVSerde' 
    STORED AS TEXTFILE ;
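
    To see why a plain comma delimiter is not enough, here is a small shell sketch. It does not call Hive; awk stands in for "split on every comma", and the sample row is made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample row: three quoted fields, the second contains a comma.
row='"1","Smith, John","100"'

# Splitting on every comma ignores the quotes, so the row
# falls apart into four pieces instead of three.
naive_count=$(printf '%s\n' "$row" | awk -F',' '{ print NF }')
echo "fields seen by a plain comma split: $naive_count"

# The first piece still carries its surrounding double quotes.
first_field=$(printf '%s\n' "$row" | awk -F',' '{ print $1 }')
echo "first field as stored: $first_field"
```

    A quote-aware parser such as OpenCSVSerde is meant to avoid both effects.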
    
  • 2020-12-24 06:07

    You can load a local CSV file into Hive only if one of the following holds:

    1. You run the load from one of the Hive cluster nodes.
    2. You have installed a Hive client on a non-cluster node and use hive or beeline for the upload.
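
    A quick way to check the second condition is to see which client commands are on your PATH. This is only a sketch; the function name available_clients is made up, and finding a client on the PATH does not prove it is configured to reach your cluster.

```shell
#!/bin/sh
# Print each of the given command names that is found on the PATH.
available_clients() {
  for client in "$@"; do
    if command -v "$client" >/dev/null 2>&1; then
      printf '%s\n' "$client"
    fi
  done
}

clients=$(available_clients hive beeline)
if [ -n "$clients" ]; then
  echo "Hive clients found on this machine:"
  echo "$clients"
else
  echo "no hive or beeline client found on this machine"
fi
```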
  • 2020-12-24 06:09

    If you have Hive set up, you can load a local dataset directly into HDFS/S3 with the Hive LOAD command.

    You will need to use the LOCAL keyword when writing the load command.

    Syntax of the Hive LOAD command:

    LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]
    

    See the link below for more details. https://cwiki.apache.org/confluence/display/Hive/LanguageManual%20DML#LanguageManualDML-Loadingfilesintotables

  • 2020-12-24 06:09

    There is another way to do this:

    1. Use hadoop fs -copyFromLocal to copy the .csv data file from your local computer into a directory in HDFS, say '/path/'.

    2. Enter the Hive console and run the following script to expose that directory as a Hive table. Note that '\054' is the ASCII code of the comma in octal, used here as the field delimiter. LOCATION must name the HDFS directory that contains the file, not the file itself.


    CREATE EXTERNAL TABLE table_name (foo INT, bar STRING)
     COMMENT 'from csv file'
     ROW FORMAT DELIMITED FIELDS TERMINATED BY '\054'
     STORED AS TEXTFILE
     LOCATION '/path/';
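
    The two steps can also be scripted from the shell. The sketch below uses made-up names (data.csv, /user/example/csv_table, csv_table) and only touches HDFS and Hive when both tools and the file are actually present; otherwise it just writes the DDL to a file so you can inspect it.

```shell
#!/bin/sh
# Hypothetical names; adjust to your own file, HDFS directory and table.
local_csv="./data.csv"
hdfs_dir="/user/example/csv_table"
hql_file="${TMPDIR:-/tmp}/create_external_table.hql"

# Write the DDL to a script file. LOCATION names the HDFS directory
# that holds the CSV file(s).
cat > "$hql_file" <<EOF
CREATE EXTERNAL TABLE IF NOT EXISTS csv_table (foo INT, bar STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '${hdfs_dir}';
EOF

# Copy the file and run the DDL only when the tools and the file exist.
if command -v hdfs >/dev/null 2>&1 && command -v hive >/dev/null 2>&1 && [ -f "$local_csv" ]; then
  hdfs dfs -mkdir -p "$hdfs_dir"
  hdfs dfs -put "$local_csv" "$hdfs_dir/"
  hive -f "$hql_file"
else
  echo "hdfs/hive or $local_csv not available; wrote $hql_file only"
fi
```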
    
  • 2020-12-24 06:11

    Let me walk you through the following simple steps:

    Steps:

    First, create a table in Hive using the field names in your CSV file. Say, for example, your CSV file contains three fields (id, name, salary) and you want to create a Hive table called "staff". Use the code below to create the table.

    hive> CREATE TABLE Staff (id int, name string, salary double) row format delimited fields terminated by ',';
    

    Second, now that the table exists in Hive, load the data from your CSV file into the "staff" table.

    hive>  LOAD DATA LOCAL INPATH '/home/yourcsvfile.csv' OVERWRITE INTO TABLE Staff;
    

    Lastly, display the contents of the "staff" table to check that the data was loaded successfully.

    hive> SELECT * FROM Staff;
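
    One thing to watch for: a header row, if your file has one, would be loaded as an ordinary row, with its non-numeric id and salary showing up as NULL. Assuming your CSV does start with a header line, a simple option is to strip it before loading. The file names and sample rows below are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample file matching the (id, name, salary) layout above.
work_dir=$(mktemp -d)
printf 'id,name,salary\n1,Alice,5000.0\n2,Bob,4200.5\n' > "$work_dir/staff_with_header.csv"

# Drop the first line so the header is not loaded as a data row.
tail -n +2 "$work_dir/staff_with_header.csv" > "$work_dir/staff.csv"

row_count=$(wc -l < "$work_dir/staff.csv" | tr -d ' ')
echo "data rows ready to load: $row_count"
echo "file: $work_dir/staff.csv"
```

    The resulting staff.csv is the file you would then point LOAD DATA LOCAL INPATH at.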
    

    Thanks.

  • 2020-12-24 06:11

    You may try this tool: https://sourceforge.net/projects/csvtohive/?source=directory. Below are a few examples of the files it generates.

    1. Select a CSV file using Browse and set the Hadoop root directory, e.g. /user/bigdataproject/

    2. The tool generates a Hadoop script covering all the CSV files. The following is a sample of a generated script that inserts the CSV files into Hadoop:

      #!/bin/bash -v
      hadoop fs -put ./AllstarFull.csv /user/bigdataproject/AllstarFull.csv
      hive -f ./AllstarFull.hive

      hadoop fs -put ./Appearances.csv /user/bigdataproject/Appearances.csv
      hive -f ./Appearances.hive

      hadoop fs -put ./AwardsManagers.csv /user/bigdataproject/AwardsManagers.csv
      hive -f ./AwardsManagers.hive

    3. Sample of generated Hive scripts

      CREATE DATABASE IF NOT EXISTS lahman;
      USE lahman;
      CREATE TABLE AllstarFull (playerID string,yearID string,gameNum string,gameID string,teamID string,lgID string,GP string,startingPos string) row format delimited fields terminated by ',' stored as textfile;
      LOAD DATA INPATH '/user/bigdataproject/AllstarFull.csv' OVERWRITE INTO TABLE AllstarFull;
      SELECT * FROM AllstarFull;
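
    If you would rather not use the tool, the pattern of the Hadoop script in step 2 is easy to reproduce by hand. The sketch below is not the tool's own code; emit_commands is a made-up helper that only prints the commands for each .csv file in a directory, so you can review them before running anything. It assumes a matching .hive script exists for each file.

```shell
#!/bin/sh
# Print the upload and load commands for every .csv file in a directory.
# Arguments: local directory, HDFS root directory (with trailing slash).
emit_commands() {
  for csv in "$1"/*.csv; do
    [ -e "$csv" ] || continue   # no .csv files: the glob stays literal
    base=$(basename "$csv" .csv)
    echo "hadoop fs -put ./$base.csv $2$base.csv"
    echo "hive -f ./$base.hive"
  done
}

# Hypothetical usage: current directory, HDFS root from the example above.
emit_commands . /user/bigdataproject/
```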

    Thanks Vijay
