Hive load CSV with commas in quoted fields

长情又很酷 2020-12-23 02:44

I am trying to load a CSV file into a Hive table like so:

CREATE TABLE mytable
(
num1 INT,
text1 STRING,
num2 INT,
text2 STRING
)
ROW FORMAT DELIMITED FIELDS         


        
6 Answers
  • 2020-12-23 02:50

    If the delimiter is a semicolon, escape it with a backslash: FIELDS TERMINATED BY '\;'

    For example:

    CREATE TABLE demo_table_1_csv
    COMMENT 'my_csv_table 1'
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\;'
    LINES TERMINATED BY '\n'
    STORED AS TEXTFILE
    LOCATION 'your_hdfs_path'
    AS
    SELECT a.tran_uuid, a.cust_id, a.risk_flag,
           a.lookback_start_date, a.lookback_end_date,
           b.scn_name, b.alerted_risk_category,
           CASE WHEN b.activity_id IS NOT NULL THEN 1 ELSE 0 END AS Alert_Flag
    FROM scn1_rcc1_agg AS a
    LEFT OUTER JOIN scenario_activity_alert AS b
      ON a.tran_uuid = b.activity_id;
    

    I have tested it, and it worked.

  • 2020-12-23 02:51

    Keep the delimiter in single quotes and it will work:

    ROW FORMAT DELIMITED 
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n';

  • 2020-12-23 02:52

    The problem is that Hive's delimited row format does not handle quoted text. You either need to pre-process the data to change the delimiter between the fields (e.g. with a Hadoop Streaming job), or you can try a custom CSV SerDe that uses OpenCSV to parse the files.
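
    As a rough illustration of the pre-processing route, here is a minimal Python sketch (a plain script, not a Hadoop Streaming job). It parses the quoted CSV with the standard csv module and rewrites every row with a tab between the fields, so the table could then be declared with FIELDS TERMINATED BY '\t'. The function name csv_to_tsv, the sample row, and the choice of tab as the new delimiter are assumptions made for this example.

```python
import csv
import io

# Tabs and line breaks inside a field would break a tab-delimited,
# one-record-per-line file, so they are replaced with a space.
_CLEAN = str.maketrans({"\t": " ", "\n": " ", "\r": " "})


def csv_to_tsv(text: str) -> str:
    """Re-emit quoted, comma-separated text as tab-separated text."""
    out_lines = []
    for row in csv.reader(io.StringIO(text)):
        cleaned = [field.translate(_CLEAN) for field in row]
        out_lines.append("\t".join(cleaned))
    return "\n".join(out_lines)


# Hypothetical input row: the second field is quoted and contains a comma.
sample = '1,"some text, with comma in it",123,more text\n'
print(csv_to_tsv(sample))
```

    The quoted comma survives as ordinary text inside the second field, and the row still splits into four fields on the tab.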

  • 2020-12-23 03:04

    If you can re-create or pre-process your input data, you can specify an escape character in the CREATE TABLE statement:

    ROW FORMAT DELIMITED FIELDS TERMINATED BY "," ESCAPED BY '\\';
    

    This will accept the following line as 4 fields:

    1,some text\, with comma in it,123,more text
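
    If the source file uses double quotes rather than backslashes, it has to be converted first. The following Python sketch shows one way that might be done, assuming no field contains a line break. The function name escape_commas and the sample row are made up for this example. Backslashes are doubled before commas are escaped, so that existing backslashes are not mistaken for escape characters.

```python
import csv
import io


def escape_commas(text: str) -> str:
    """Turn quoted CSV into unquoted CSV with backslash-escaped commas."""
    out_lines = []
    for row in csv.reader(io.StringIO(text)):
        # Escape existing backslashes first, then the commas inside fields.
        escaped = [f.replace("\\", "\\\\").replace(",", "\\,") for f in row]
        out_lines.append(",".join(escaped))
    return "\n".join(out_lines)


# Hypothetical input row: the second field is quoted and contains a comma.
sample = '1,"some text, with comma in it",123,more text\n'
print(escape_commas(sample))
```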
    
  • 2020-12-23 03:07

    As of Hive 0.14, the CSV SerDe is a standard part of the Hive install:

    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'

    (See: https://cwiki.apache.org/confluence/display/Hive/CSV+Serde)

  • 2020-12-23 03:10

    The org.apache.hadoop.hive.serde2.OpenCSVSerde SerDe worked for me. My delimiter was '|' and one of the columns is enclosed in double quotes.

    Query:

    -- The class name and the property names are case-sensitive.
    -- DATE is a reserved word in newer Hive versions, so it is quoted with backticks.
    CREATE EXTERNAL TABLE EMAIL(MESSAGE_ID STRING, TEXT STRING, TO_ADDRS STRING, FROM_ADDRS STRING, SUBJECT STRING, `DATE` STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES (
         "separatorChar" = "|",
         "quoteChar"     = "\"",
         "escapeChar"    = "\""
    )
    STORED AS TEXTFILE LOCATION '/user/abc/csv_folder';
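
    To get a feel for what the three SerDe properties in the query above ask a CSV parser to do, here is a small Python sketch using the standard csv module configured in a similar way. This is only an analogy: Hive's SerDe uses OpenCSV, not Python, and the sample record is invented. The separator is '|', fields may be wrapped in double quotes, and a doubled double quote inside a quoted field stands for a literal quote.

```python
import csv

# Hypothetical pipe-delimited record; the third field is quoted and
# contains both the delimiter and a doubled (escaped) double quote.
line = 'id-1|hello|"a|b ""quoted"" text"|x@example.com'

reader = csv.reader([line], delimiter="|", quotechar='"', doublequote=True)
fields = next(reader)
print(fields)
```

    The pipe inside the quoted third field is kept as data, so the record splits into four fields rather than five.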
    