Loading Data from a .txt file to Table Stored as ORC in Hive

Asked 2020-12-01 01:30

I have a data file in .txt format and I am using it to load data into Hive tables. When I load the file into a table like

CREATE TABL         


        
5 Answers
  • 2020-12-01 01:55

    Steps:

    1. First create a table stored as TEXTFILE (the default, or whichever format you want to create the table in).
    2. Load the data into this text table.
    3. Create the ORC table with STORED AS ORC AS SELECT * FROM text_table.
    4. SELECT * FROM the ORC table.

    Example:

    CREATE TABLE text_table(line STRING);
    
    LOAD DATA INPATH 'path_of_file' OVERWRITE INTO TABLE text_table;
    
    CREATE TABLE orc_table STORED AS ORC AS SELECT * FROM text_table;
    
    SELECT * FROM orc_table;   -- the data can now be read
    
  • 2020-12-01 01:56

    LOAD DATA just copies the files into Hive's data directory; Hive does not do any transformation while loading data into tables.

    So, in this case the input file /home/user/test_details.txt needs to be in ORC format if you are loading it into an ORC table.

    A possible workaround is to create a temporary table STORED AS TEXTFILE, LOAD DATA into it, and then copy the data from that table into the ORC table.

    Here is an example:

    CREATE TABLE test_details_txt( visit_id INT, store_id SMALLINT) STORED AS TEXTFILE;
    CREATE TABLE test_details_orc( visit_id INT, store_id SMALLINT) STORED AS ORC;
    
    -- Load into Text table
    LOAD DATA LOCAL INPATH '/home/user/test_details.txt' INTO TABLE test_details_txt;
    
    -- Copy to ORC table
    INSERT INTO TABLE test_details_orc SELECT * FROM test_details_txt;
    
  • 2020-12-01 02:03

    Since Hive does not do any transformation to our input data, the formats need to match: either the file should already be in ORC format, or we load the data from the text file into a text table in Hive and convert it from there, as sketched below.
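    A minimal sketch of that two-step approach (the table names staging_txt and target_orc, the single-column schema, and the path are placeholders to adapt to your data):

    -- Text table whose format matches the .txt input
    CREATE TABLE staging_txt (line STRING) STORED AS TEXTFILE;

    -- LOAD DATA only copies the file; no conversion happens here
    LOAD DATA LOCAL INPATH '/path/to/data.txt' INTO TABLE staging_txt;

    -- The query rewrites the rows, producing ORC files
    CREATE TABLE target_orc STORED AS ORC AS SELECT * FROM staging_txt;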

  • 2020-12-01 02:06

    Steps to load data into the ORC file format in Hive:

    1. Create a normal table using the TEXTFILE format.

    2. Load the data into this table as usual.

    3. Create a table with the schema of the expected results of your normal Hive table, stored as ORC.

    4. Use an INSERT OVERWRITE query to copy the data from the TEXTFILE table to the ORC table (see the sketch below).
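    A sketch of those four steps in HiveQL; the two-column schema, the path, and the names my_table_txt / my_table_orc are placeholders:

    -- 1. Normal table in TEXTFILE format
    CREATE TABLE my_table_txt (id INT, name STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      STORED AS TEXTFILE;

    -- 2. Load the raw file into it
    LOAD DATA LOCAL INPATH '/path/to/file.txt' INTO TABLE my_table_txt;

    -- 3. ORC table with the same schema
    CREATE TABLE my_table_orc (id INT, name STRING) STORED AS ORC;

    -- 4. Copy (and convert) the data
    INSERT OVERWRITE TABLE my_table_orc SELECT * FROM my_table_txt;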

    Refer to the blog post "Load data into all file formats in Hive" for a hands-on walkthrough of loading data into all of Hive's file formats.

  • 2020-12-01 02:16

    ORC is a binary file format, so you cannot load text files directly into ORC tables. ORC stands for Optimized Row Columnar, meaning it stores data in a more optimized way than the other file formats. ORC can reduce the size of the original data by up to 75%, and as a result the speed of data processing also increases. ORC shows better performance than the Text, Sequence, and RC file formats. An ORC file stores row data in groups called stripes, along with a file footer. The ORC format improves performance when Hive is processing the data.

    First create a normal table stored as TEXTFILE, load your data into that table, and then use an INSERT OVERWRITE query to write the data into the ORC table.

    CREATE TABLE table_name1 (schema of the table) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE;

    CREATE TABLE table_name2 (schema of the table) STORED AS ORC;

    -- loading data from the local file system
    LOAD DATA LOCAL INPATH 'path of your file' INTO TABLE table_name1;

    INSERT OVERWRITE TABLE table_name2 SELECT * FROM table_name1;
    

    Now all your data is stored in ORC files. The same procedure applies to the other binary file formats in Hive, i.e., Sequence, RC, and Parquet files.
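    For example, the same pattern with Parquet instead of ORC only changes the STORED AS clause (this assumes a Hive version with native Parquet support; table_name3 is an illustrative name):

    -- Parquet target table; rows are converted by the INSERT query, not by LOAD DATA
    CREATE TABLE table_name3 (schema of the table) STORED AS PARQUET;

    INSERT OVERWRITE TABLE table_name3 SELECT * FROM table_name1;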

    You can refer to the below link for more details.

    https://acadgild.com/blog/file-formats-in-apache-hive/
