hive-serde

Loading JSON file in HIVE table

Submitted by 故事扮演 on 2019-12-12 04:23:12
Question: I have a JSON file like the one below, which I want to load into a Hive table in parsed form. What options can I go for? If it were Avro I could have used AvroSerDe directly, but the source file in this case is JSON.

{
  "subscriberId":"vfd1234-07e1-4054-9b64-83a5a20744db",
  "cartId":"1234edswe-6a9c-493c-bcd0-7fb71995beef",
  "cartStatus":"default",
  "salesChannel":"XYZ",
  "accountId":"12345",
  "channelNumber":"12",
  "timestamp":"Dec 12, 2013 8:30:00 AM",
  "promotions":[
    {
      "promotionId":
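
A minimal sketch of one possible approach, assuming a JSON SerDe such as the OpenX JsonSerDe (used elsewhere on this page) is on the Hive classpath; the table name, location, and the shape of the promotions struct beyond promotionId are assumptions, since the sample above is truncated.

-- Illustrative only: column names follow the sample; `timestamp` is backtick-quoted because it is a reserved word.
CREATE EXTERNAL TABLE cart_events (
  subscriberId  STRING,
  cartId        STRING,
  cartStatus    STRING,
  salesChannel  STRING,
  accountId     STRING,
  channelNumber STRING,
  `timestamp`   STRING,
  promotions    ARRAY<STRUCT<promotionId:STRING>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/path/to/json/files';

Note that most Hive JSON SerDes expect one complete JSON document per line, so a pretty-printed multi-line file may need to be compacted first.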

Complex XML schema to Hive schema

Submitted by 巧了我就是萌 on 2019-12-11 17:23:53
Question: I am trying to load an XML file into a Hive table using the XML SerDe [here][1]. I am able to load simple flat XML files, but when there are nested elements in the XML, I use Hive complex data types to store them (for example, array<struct>). Below is the sample XML I am trying to load. My goal is to load all elements, attributes and content into the Hive table.

<classif action="del">
  <code>123</code>
  <class action="aou">
    <party>p1</party>
    <description action="up">
      <name action="aorup" ln=
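
A rough sketch of how nested elements are often mapped with the widely used Hive XML SerDe (com.ibm.spss.hive.serde2.xml.XmlSerDe); this assumes that is the SerDe linked in the question, and the table name, XPath expressions, and struct fields are illustrative guesses based on the truncated sample.

-- Sketch only: verify class names and properties against the SerDe's own documentation.
CREATE EXTERNAL TABLE classif_xml (
  action  STRING,
  code    STRING,
  classes ARRAY<STRUCT<party:STRING>>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
  "column.xpath.action"  = "/classif/@action",
  "column.xpath.code"    = "/classif/code/text()",
  "column.xpath.classes" = "/classif/class"
)
STORED AS
  INPUTFORMAT  'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/path/to/xml'
TBLPROPERTIES (
  "xmlinput.start" = "<classif",
  "xmlinput.end"   = "</classif>"
);

Attributes are addressed with @-style XPath, and each nested <class> element becomes one struct entry in the array column.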

Create hive table from JSON data

Submitted by 半世苍凉 on 2019-12-11 07:39:01
Question: I have a file with JSON data which takes the form below.

Example:

{
  "Name": "xxxx",
  "Address": [{
    "Street": "aa",
    "City": "bbb"
  }, {
    "Street": "ccc",
    "City": "ddd",
    "Country": "eee"
  }]
}

The above JSON is valid. I want to create a Hive table on top of data of the above form using a JSON SerDe.

Answer 1: Create the table with all possible fields defined. If a field is not present in the JSON, a select will return NULL:

CREATE EXTERNAL TABLE your_table (
  Name string,
  Address array<struct<Street:string,City:string
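
For completeness, a hedged guess at how the truncated DDL above might continue, based only on the fields present in the sample JSON (the SerDe choice and location are assumptions), plus a query that flattens the address array:

CREATE EXTERNAL TABLE your_table (
  Name    string,
  Address array<struct<Street:string, City:string, Country:string>>
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/path/to/json/data';

-- One row per address element; missing struct fields (e.g. Country in the first element) come back as NULL.
SELECT t.Name, a.Street, a.City, a.Country
FROM your_table t
LATERAL VIEW explode(t.Address) addr AS a;

Note: the OpenX SerDe reads one JSON document per line, so the pretty-printed sample would need to be stored as a single line per record.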

SerDe properties list for AWS Athena (JSON)

Submitted by 点点圈 on 2019-12-06 03:09:45
I'm testing the Athena product of AWS, and so far it is working very well. But I want to know the list of SerDe properties; I've searched far and wide and couldn't find it. I'm using "ignore.malformed.json" = "true" for example, but I'm pretty sure there are a ton of other options to tune the queries. I couldn't find info, for example, on what the "path" property does, so having the full list would be amazing. I have looked at the Apache Hive docs but couldn't find this, and neither on the AWS docs/forums. Thanks!

It seems you are using the Openx-JsonSerDe http://docs.aws.amazon.com/athena/latest/ug
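
As context, here is how the property from the question is typically attached in Athena DDL; this is only an illustrative sketch (table, columns, and S3 location are placeholders), not a full list of supported properties:

-- Placeholder names throughout; 'ignore.malformed.json' is the property mentioned in the question.
CREATE EXTERNAL TABLE example_json (
  id   string,
  name string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'ignore.malformed.json' = 'true'
)
LOCATION 's3://your-bucket/your-prefix/';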

How do I import an array of data into separate rows in a hive table?

Submitted by 纵饮孤独 on 2019-12-04 17:31:53
I am trying to import data in the following format into a Hive table:

[
  {
    "identifier" : "id#1",
    "dataA" : "dataA#1"
  },
  {
    "identifier" : "id#2",
    "dataA" : "dataA#2"
  }
]

I have multiple files like this and I want each {} to form one row in the table. This is what I have tried:

CREATE EXTERNAL TABLE final_table(
  identifier STRING,
  dataA STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION "s3://bucket/path_in_bucket/"

This is not creating a single row for each {}, though. I have also tried

CREATE EXTERNAL TABLE final_table(
  rows ARRAY< STRUCT< identifier: STRING, dataA: STRING >
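
A brief sketch of the flattening step, under the assumption that the second attempt does load the whole top-level array into one ARRAY<STRUCT<...>> column (many JSON SerDes, including the OpenX one, expect a single JSON document per line, so a multi-line array may need to be compacted or preprocessed first); column names mirror the question's DDL:

-- Explode the array so each {...} element becomes its own row.
SELECT r.identifier, r.dataA
FROM final_table
LATERAL VIEW explode(`rows`) t AS r;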

csv file to hive table using load data - How to format the date in csv to accept by hive table

Submitted by 孤者浪人 on 2019-12-04 17:17:15
I am using the LOAD DATA syntax to load a CSV file into a table. The file is in the same format as Hive accepts, but still, after LOAD DATA is issued, the last 2 columns return NULL on select.

1750,651,'2013-03-11','2013-03-17'
1751,652,'2013-03-18','2013-03-24'
1752,653,'2013-03-25','2013-03-31'
1753,654,'2013-04-01','2013-04-07'

create table dattable(
  DATANUM INT,
  ENTRYNUM BIGINT,
  START_DATE DATE,
  END_DATE DATE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ;

LOAD DATA LOCAL INPATH '/path/dtatable.csv' OVERWRITE INTO TABLE dattable ;

A select returns NULL values for the last 2 columns. The other question was what if
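
A likely cause, inferred from the sample rather than stated in the truncated question, is that the date values are wrapped in single quotes, which the default delimited-text SerDe cannot parse into a DATE, hence the NULLs. One common workaround is sketched below with an assumed staging-table name: load the dates as strings, then strip the quotes and cast.

CREATE TABLE dattable_stg (
  DATANUM INT,
  ENTRYNUM BIGINT,
  START_DATE STRING,
  END_DATE STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

LOAD DATA LOCAL INPATH '/path/dtatable.csv' OVERWRITE INTO TABLE dattable_stg;

-- Remove the surrounding quotes and cast yyyy-MM-dd strings to DATE.
INSERT OVERWRITE TABLE dattable
SELECT DATANUM,
       ENTRYNUM,
       CAST(regexp_replace(START_DATE, "'", '') AS DATE),
       CAST(regexp_replace(END_DATE, "'", '') AS DATE)
FROM dattable_stg;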

Difference between 'Stored as InputFormat, OutputFormat' and 'Stored as' in Hive

Submitted by 我只是一个虾纸丫 on 2019-12-04 05:04:53
There is an issue when executing SHOW CREATE TABLE and then executing the resulting CREATE TABLE statement if the table is ORC. Using SHOW CREATE TABLE, you get this:

STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

But if you create the table with those clauses, you will then get a casting error when selecting. The error looks like:

Failed with exception java.io.IOException:java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.orc.OrcStruct cannot be cast to org.apache.hadoop.io.BinaryComparable

To fix this, just
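
The answer is cut off above, so the following is only a sketch of the commonly cited resolution: STORED AS ORC sets the SerDe together with the input and output formats, whereas spelling out only INPUTFORMAT/OUTPUTFORMAT leaves the default text SerDe in place, which then fails to deserialize ORC rows.

-- Short form: SerDe, input format, and output format are all set for ORC.
CREATE TABLE orc_table (id INT, name STRING)
STORED AS ORC;

-- Expanded form: include the ORC SerDe explicitly so the default LazySimpleSerDe is not used.
CREATE TABLE orc_table_expanded (id INT, name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
STORED AS
  INPUTFORMAT  'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat';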

Why do all columns get created as string when I use OpenCSVSerde in Hive?

Submitted by 馋奶兔 on 2019-12-03 16:07:49
I am trying to create a table using the OpenCSVSerde with some integer and date columns, but the columns get converted to string. Is this an expected outcome? As a workaround, I do an explicit type cast after this step (which makes the complete run slower).

hive> create external table if not exists response(
  response_id int,
  lead_id int,
  creat_date date
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  'quoteChar' = '"',
  'separatorChar' = '\,',
  'serialization.encoding'='UTF-8',
  'escapeChar' = '~'
)
location '/prod/hive/db/response'
TBLPROPERTIES ("serialization
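
This is documented behavior: Hive's OpenCSVSerde treats every column as STRING regardless of the declared types. A common workaround, sketched here with an assumed view name and the question's columns, is to keep the SerDe-backed table as strings and expose typed columns through a view (or a downstream table) so the casts are written only once:

-- Cast the string columns once, in a view over the OpenCSVSerde table.
CREATE VIEW response_typed AS
SELECT CAST(response_id AS INT)  AS response_id,
       CAST(lead_id     AS INT)  AS lead_id,
       CAST(creat_date  AS DATE) AS creat_date
FROM response;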