hive-serde

Loading JSON file in HIVE table

Submitted by 故事扮演 on 2019-12-12 04:23:12
Question: I have a JSON file like the one below, which I want to load into a Hive table in parsed form. What options can I go for? If it were Avro I could have used AvroSerDe directly, but the source file in this case is JSON.

{
  "subscriberId":"vfd1234-07e1-4054-9b64-83a5a20744db",
  "cartId":"1234edswe-6a9c-493c-bcd0-7fb71995beef",
  "cartStatus":"default",
  "salesChannel":"XYZ",
  "accountId":"12345",
  "channelNumber":"12",
  "timestamp":"Dec 12, 2013 8:30:00 AM",
  "promotions":[
    {
      "promotionId":
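
A minimal sketch of one possible approach, assuming a JSON SerDe such as the OpenX JsonSerDe (used elsewhere on this page) is on the Hive classpath; the table name, location, and the shape of the promotions struct beyond promotionId are assumptions, since the sample above is truncated.

-- Illustrative only: column names follow the sample; `timestamp` is backtick-quoted because it is a reserved word.
CREATE EXTERNAL TABLE cart_events (
  subscriberId  STRING,
  cartId        STRING,
  cartStatus    STRING,
  salesChannel  STRING,
  accountId     STRING,
  channelNumber STRING,
  `timestamp`   STRING,
  promotions    ARRAY<STRUCT<promotionId:STRING>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/path/to/json/files';

Note that most Hive JSON SerDes expect one complete JSON document per line, so a pretty-printed multi-line file may need to be compacted first.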

Complex XML schema to Hive schema

Submitted by 巧了我就是萌 on 2019-12-11 17:23:53
Question: I am trying to load an XML file into a Hive table using the XML SerDe [here][1]. I am able to load simple flat XML files, but when there are nested elements in the XML, I use Hive complex data types to store them (for example, array<struct>). Below is the sample XML I am trying to load. My goal is to load all elements, attributes and content into the Hive table.

<classif action="del">
  <code>123</code>
  <class action="aou">
    <party>p1</party>
    <description action="up">
      <name action="aorup" ln=
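
A rough sketch of how nested elements are often mapped with the widely used Hive XML SerDe (com.ibm.spss.hive.serde2.xml.XmlSerDe); this assumes that is the SerDe linked in the question, and the table name, XPath expressions, and struct fields are illustrative guesses based on the truncated sample.

-- Sketch only: verify class names and properties against the SerDe's own documentation.
CREATE EXTERNAL TABLE classif_xml (
  action  STRING,
  code    STRING,
  classes ARRAY<STRUCT<party:STRING>>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  "column.xpath.action"  = "/classif/@action",
  "column.xpath.code"    = "/classif/code/text()",
  "column.xpath.classes" = "/classif/class"
)
STORED AS
  INPUTFORMAT  'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/path/to/xml'
TBLPROPERTIES (
  "xmlinput.start" = "<classif",
  "xmlinput.end"   = "</classif>"
);

Attributes are addressed with @-style XPath, and each nested <class> element becomes one struct entry in the array column.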

Create hive table from JSON data

Submitted by 半世苍凉 on 2019-12-11 07:39:01
Question: I have a file with JSON data which takes the form below.

Example:

{
  "Name": "xxxx",
  "Address": [{
    "Street": "aa",
    "City": "bbb"
  }, {
    "Street": "ccc",
    "City": "ddd",
    "Country": "eee"
  }]
}

The above JSON is valid. I want to create a Hive table on top of data of the above form using a JSON SerDe.

Answer 1: Create the table with all possible fields defined. If a field is not present in the JSON, a select will return NULL:

CREATE EXTERNAL TABLE your_table (
  Name string,
  Address array<struct<Street:string,City:string
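
For completeness, a hedged guess at how the truncated DDL above might continue, based only on the fields present in the sample JSON (the SerDe choice and location are assumptions), plus a query that flattens the address array:

CREATE EXTERNAL TABLE your_table (
  Name    string,
  Address array<struct<Street:string, City:string, Country:string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/path/to/json/data';

-- One row per address element; missing struct fields (e.g. Country in the first element) come back as NULL.
SELECT t.Name, a.Street, a.City, a.Country
FROM your_table t
LATERAL VIEW explode(t.Address) addr AS a;

Note: the OpenX SerDe reads one JSON document per line, so the pretty-printed sample would need to be stored as a single line per record.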

SerDe properties list for AWS Athena (JSON)

Submitted by 点点圈 on 2019-12-06 03:09:45
I'm testing the Athena product of AWS, and so far it is working very well. But I want to know the list of SerDe properties; I've searched far and wide and couldn't find it. I'm using "ignore.malformed.json" = "true" for example, but I'm pretty sure there are a ton of other options to tune the queries. I couldn't find info, for example, on what the "path" property does, so having the full list would be amazing. I have looked at the Apache Hive docs but couldn't find this, and neither on the AWS docs/forums. Thanks!

It seems you are using the Openx-JsonSerDe http://docs.aws.amazon.com/athena/latest/ug
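
As context, here is how the property from the question is typically attached in Athena DDL; this is only an illustrative sketch (table, columns, and S3 location are placeholders), not a full list of supported properties:

-- Placeholder names throughout; 'ignore.malformed.json' is the property mentioned in the question.
CREATE EXTERNAL TABLE example_json (
  id   string,
  name string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'ignore.malformed.json' = 'true'
)
LOCATION 's3://your-bucket/your-prefix/';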

How do I import an array of data into separate rows in a hive table?

Submitted by 纵饮孤独 on 2019-12-04 17:31:53
I am trying to import data in the following format into a Hive table:

[
  {
    "identifier" : "id#1",
    "dataA" : "dataA#1"
  },
  {
    "identifier" : "id#2",
    "dataA" : "dataA#2"
  }
]

I have multiple files like this and I want each {} to form one row in the table. This is what I have tried:

CREATE EXTERNAL TABLE final_table(
  identifier STRING,
  dataA STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION "s3://bucket/path_in_bucket/"

This is not creating a single row for each {}, though. I have also tried

CREATE EXTERNAL TABLE final_table(
  rows ARRAY< STRUCT< identifier: STRING, dataA: STRING >
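
A brief sketch of the flattening step, under the assumption that the second attempt does load the whole top-level array into one ARRAY<STRUCT<...>> column (many JSON SerDes, including the OpenX one, expect a single JSON document per line, so a multi-line array may need to be compacted or preprocessed first); column names mirror the question's DDL:

-- Explode the array so each {...} element becomes its own row.
SELECT r.identifier, r.dataA
FROM final_table
LATERAL VIEW explode(`rows`) t AS r;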

csv file to hive table using load data - How to format the date in csv to accept by hive table

Submitted by 孤者浪人 on 2019-12-04 17:17:15
I am using the LOAD DATA syntax to load a CSV file into a table. The file is in the same format as Hive accepts, but still, after LOAD DATA is issued, the last 2 columns return NULL on select.

1750,651,'2013-03-11','2013-03-17'
1751,652,'2013-03-18','2013-03-24'
1752,653,'2013-03-25','2013-03-31'
1753,654,'2013-04-01','2013-04-07'

create table dattable(
  DATANUM INT,
  ENTRYNUM BIGINT,
  START_DATE DATE,
  END_DATE DATE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ;

LOAD DATA LOCAL INPATH '/path/dtatable.csv' OVERWRITE INTO TABLE dattable ;

A select returns NULL values for the last 2 columns. The other question was what if
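
A likely cause, inferred from the sample rather than stated in the truncated question, is that the date values are wrapped in single quotes, which the default delimited-text SerDe cannot parse into a DATE, hence the NULLs. One common workaround is sketched below with an assumed staging-table name: load the dates as strings, then strip the quotes and cast.

CREATE TABLE dattable_stg (
  DATANUM INT,
  ENTRYNUM BIGINT,
  START_DATE STRING,
  END_DATE STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

LOAD DATA LOCAL INPATH '/path/dtatable.csv' OVERWRITE INTO TABLE dattable_stg;

-- Remove the surrounding quotes and cast yyyy-MM-dd strings to DATE.
INSERT OVERWRITE TABLE dattable
SELECT DATANUM,
       ENTRYNUM,
       CAST(regexp_replace(START_DATE, "'", '') AS DATE),
       CAST(regexp_replace(END_DATE, "'", '') AS DATE)
FROM dattable_stg;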

Difference between 'Stored as InputFormat, OutputFormat' and 'Stored as' in Hive

Submitted by 我只是一个虾纸丫 on 2019-12-04 05:04:53
There is an issue when executing SHOW CREATE TABLE and then executing the resulting CREATE TABLE statement if the table is ORC. Using SHOW CREATE TABLE, you get this:

STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

But if you create the table with those clauses, you will then get a casting error when selecting. The error looks like:

Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.OrcStruct cannot be cast to org.apache.hadoop.io.BinaryComparable

To fix this, just
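
The answer is cut off above, so the following is only a sketch of the commonly cited resolution: STORED AS ORC sets the SerDe together with the input and output formats, whereas spelling out only INPUTFORMAT/OUTPUTFORMAT leaves the default text SerDe in place, which then fails to deserialize ORC rows.

-- Short form: SerDe, input format, and output format are all set for ORC.
CREATE TABLE orc_table (id INT, name STRING)
STORED AS ORC;

-- Expanded form: include the ORC SerDe explicitly so the default LazySimpleSerDe is not used.
CREATE TABLE orc_table_expanded (id INT, name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS
  INPUTFORMAT  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';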

Why do all columns get created as string when I use OpenCSVSerde in Hive?

Submitted by 馋奶兔 on 2019-12-03 16:07:49
I am trying to create a table using the OpenCSVSerde with some integer and date columns, but the columns get converted to string. Is this an expected outcome? As a workaround, I do an explicit type cast after this step (which makes the complete run slower).

hive> create external table if not exists response(
  response_id int,
  lead_id int,
  creat_date date
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'quoteChar' = '"',
  'separatorChar' = '\,',
  'serialization.encoding'='UTF-8',
  'escapeChar' = '~'
)
location '/prod/hive/db/response'
TBLPROPERTIES ("serialization
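
This is documented behavior: Hive's OpenCSVSerde treats every column as STRING regardless of the declared types. A common workaround, sketched here with an assumed view name and the question's columns, is to keep the SerDe-backed table as strings and expose typed columns through a view (or a downstream table) so the casts are written only once:

-- Cast the string columns once, in a view over the OpenCSVSerde table.
CREATE VIEW response_typed AS
SELECT CAST(response_id AS INT)  AS response_id,
       CAST(lead_id     AS INT)  AS lead_id,
       CAST(creat_date  AS DATE) AS creat_date
FROM response;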