Hive table from CSV: line breaks inside quoted fields

梦毁少年i 2021-01-13 14:54

I am trying to create a Hive table from a CSV file stored in HDFS. The problem is that the CSV contains line breaks inside quoted fields. Example of a record in the CSV:



        
2 answers
  •  暖寄归人
    2021-01-13 15:35

    I found a solution: you can define your own custom InputFormat. First add your custom JAR file; the DDL for the Hive table then looks like this:

    ADD JAR /path/to/your/jar/CSVCustomInputFormat.jar;
    DROP TABLE IF EXISTS hive_database.hive_table;
    CREATE EXTERNAL TABLE  hive_database.hive_table
    (   
        ID STRING,
        PR_ID STRING,
        SUMMARY STRING 
    )
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES (
       "separatorChar" = ",",
       "quoteChar"     = "\"",
       "escapeChar"    = "\\"
    ) 
    STORED AS 
    INPUTFORMAT 'com.hql.custom.formatter.CSVCustomInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat' 
    LOCATION '/path/to/hdfs/dir/csv'
    tblproperties('skip.header.line.count'='1');
    

    For an example of how to write a custom input format, see: https://analyticsanvil.wordpress.com/2016/03/06/creating-a-custom-hive-input-format-and-record-reader-to-read-fixed-format-flat-files/
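
To give a rough idea of what a custom record reader has to do for this problem, here is a minimal sketch of the core logic only: it keeps appending physical lines to the current record until the quotes are balanced. This is not the code from the answer above or from the linked article. The class name `QuotedRecordReader` and the method `nextRecord` are hypothetical, the sketch uses plain Java I/O rather than the Hadoop `RecordReader` interface, and it assumes `"` as the quote character and `\` as the escape character, matching the SERDEPROPERTIES in the DDL.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;

/** Hypothetical helper: groups physical lines into logical CSV records. */
public class QuotedRecordReader {

    private static final char QUOTE = '"';
    private static final char ESCAPE = '\\';

    /** Counts unescaped quote characters in one physical line. */
    private static int countQuotes(String line) {
        int count = 0;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (c == ESCAPE) {
                i++; // skip the escaped character, e.g. \"
            } else if (c == QUOTE) {
                count++;
            }
        }
        return count;
    }

    /**
     * Returns the next logical record, or null at end of input.
     * While the number of quotes seen so far is odd, a quoted field is
     * still open, so the next physical line belongs to the same record.
     */
    public static String nextRecord(BufferedReader in) throws IOException {
        String line = in.readLine();
        if (line == null) {
            return null;
        }
        StringBuilder record = new StringBuilder(line);
        int quotes = countQuotes(line);
        while (quotes % 2 != 0) {
            line = in.readLine();
            if (line == null) {
                break; // unterminated quote: return what was read
            }
            // readLine() strips the terminator, so re-insert a newline
            record.append('\n').append(line);
            quotes += countQuotes(line);
        }
        return record.toString();
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data: the first record spans two physical lines
        String csv = "1,PR-1,\"first line\nsecond line\"\n2,PR-2,plain\n";
        BufferedReader in = new BufferedReader(new StringReader(csv));
        String rec;
        while ((rec = nextRecord(in)) != null) {
            System.out.println("[" + rec + "]");
        }
    }
}
```

In a real Hive input format this logic would sit inside the record reader's `next` method, and the input format would probably also need to prevent files from being split in the middle of a record (for example by overriding `isSplitable` to return false), which this sketch does not cover.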
