CSV file to Hive table using LOAD DATA: how to format the dates in the CSV so the Hive table accepts them

删除回忆录丶 submitted on 2019-12-06 12:20:48

Question


I am using the LOAD DATA syntax to load a CSV file into a table. The file is in the same format that Hive accepts, but after LOAD DATA is issued, the last 2 columns return NULL on SELECT.

1750,651,'2013-03-11','2013-03-17'
1751,652,'2013-03-18','2013-03-24'
1752,653,'2013-03-25','2013-03-31'
1753,654,'2013-04-01','2013-04-07'

create table dattable(
DATANUM    INT,  
ENTRYNUM BIGINT, 
START_DATE  DATE,
END_DATE    DATE ) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ;

 LOAD DATA LOCAL INPATH '/path/dtatable.csv' OVERWRITE INTO TABLE dattable ;

SELECT returns NULL values for the last 2 columns.
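To see why the last two columns come back NULL, here is a rough Python simulation. It is not Hive's code; it assumes that the default SerDe splits each line on the delimiter without treating ' as a quote character, and that a DATE column only accepts the yyyy-MM-dd form. The helper names are hypothetical.

```python
from datetime import date
from typing import List, Optional

def split_like_lazy_simple_serde(line: str, delimiter: str = ",") -> List[str]:
    """Split on the delimiter only; quote characters stay part of each value."""
    return line.split(delimiter)

def parse_hive_date(field: str) -> Optional[date]:
    """Rough stand-in for a DATE column: accept yyyy-MM-dd, else None (NULL)."""
    try:
        return date.fromisoformat(field)
    except ValueError:
        return None

line = "1750,651,'2013-03-11','2013-03-17'"
fields = split_like_lazy_simple_serde(line)
print(fields)                                     # ['1750', '651', "'2013-03-11'", "'2013-03-17'"]
print([parse_hive_date(f) for f in fields[2:]])   # [None, None]
```

The quoted values keep their surrounding ' characters, which is enough to make the date parse fail.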

My other question: what if the date format is different from YYYY-MM-DD? Is it possible to make Hive recognize the format? (Right now I am modifying the CSV file so that Hive accepts it.)


Answer 1:


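LazySimpleSerDe (the default) does not handle quoted CSV fields: the single quotes stay part of the value, so '2013-03-11' cannot be parsed as a DATE and comes back as NULL. Use OpenCSVSerde instead: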

create table dattable(
DATANUM    INT,  
ENTRYNUM BIGINT, 
START_DATE  DATE,
END_DATE    DATE ) 
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
   "separatorChar" = ",",
   "quoteChar"     = "'"
)  
STORED AS TEXTFILE;

Also note that OpenCSVSerde treats all columns as type STRING, regardless of the types declared in the DDL.

Define your date columns as STRING and apply the conversion in the SELECT.
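As an analogy only (Python's csv module is not OpenCSVSerde, which is a Java SerDe), the sketch below shows what the separatorChar/quoteChar settings achieve: with ' as the quote character the quotes are stripped, every field comes back as a string, and typing happens in a separate step, much like converting in the SELECT.

```python
import csv
import io
from datetime import date

# Sample rows copied from the question.
raw = """1750,651,'2013-03-11','2013-03-17'
1751,652,'2013-03-18','2013-03-24'
"""

# delimiter/quotechar play the role of separatorChar/quoteChar in the DDL.
reader = csv.reader(io.StringIO(raw), delimiter=",", quotechar="'")
rows = list(reader)
print(rows[0])  # ['1750', '651', '2013-03-11', '2013-03-17']

# Every field is a str; convert types afterwards, as you would in the SELECT.
typed = [
    (int(num), int(entry), date.fromisoformat(start), date.fromisoformat(end))
    for num, entry, start, end in rows
]
```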




Answer 2:


Answer to your 2nd question:

You will need an additional temporary table to read your input file; you can then do the date conversions in your INSERT ... SELECT statement. In the temporary table, store the date fields as STRING. For example:

create table dattable_ext(
DATANUM    INT,  
ENTRYNUM BIGINT, 
START_DATE  String,
END_DATE    String) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

Load the data into the temporary table:

LOAD DATA LOCAL INPATH '/path/dtatable.csv' OVERWRITE INTO TABLE dattable_ext;

Insert from the temporary table into the managed table:

insert into table dattable select DATANUM, ENTRYNUM,
from_unixtime(unix_timestamp(START_DATE,'yyyy/MM/dd'),'yyyy-MM-dd'),
from_unixtime(unix_timestamp(END_DATE,'yyyy/MM/dd'),'yyyy-MM-dd') from dattable_ext;

Replace the format pattern in the unix_timestamp call with the date format of your input.
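A rough Python analogue of from_unixtime(unix_timestamp(value, pattern), 'yyyy-MM-dd') may help when choosing the pattern. It assumes strptime codes stand in for Hive's Java-style patterns (for example '%Y/%m/%d' for 'yyyy/MM/dd'), returns None where Hive would return NULL, and is stricter than Hive's parser may be; the function name is hypothetical and time zones are ignored.

```python
from datetime import datetime
from typing import Optional

def to_hive_date(value: Optional[str], input_format: str) -> Optional[str]:
    """Reformat value from input_format to yyyy-MM-dd; None mirrors Hive's NULL."""
    if value is None:
        return None
    try:
        return datetime.strptime(value, input_format).strftime("%Y-%m-%d")
    except ValueError:
        # Input does not match the pattern: Hive would yield NULL here.
        return None

print(to_hive_date("2013/03/11", "%Y/%m/%d"))  # 2013-03-11
print(to_hive_date("11.03.2013", "%d.%m.%Y"))  # 2013-03-11
print(to_hive_date("2013-03-11", "%Y/%m/%d"))  # None
```

The last call shows the failure mode to watch for: a pattern that does not match the input silently produces NULL rather than an error.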



Source: https://stackoverflow.com/questions/54460151/csv-file-to-hive-table-using-load-data-how-to-format-the-date-in-csv-to-accept
