I'm trying the JSON SerDe from the link below: http://code.google.com/p/hive-json-serde/wiki/GettingStarted.
CREATE TABLE my_table (field1 string, field2 int, field3 string, field4 double);
I solved a similar problem as follows.
I took the jar from http://www.congiu.net/hive-json-serde/1.3.8/hdp23/json-serde-1.3.8-jar-with-dependencies.jar
Then I registered it in the Hive CLI: ADD JAR /path/to/jar;
create table messages (
id int,
creation_date string,
text string,
loggedInUser STRUCT<id:INT, name: STRING>
)
row format serde "org.openx.data.jsonserde.JsonSerDe";
{"id": 1,"creation_date": "2020-03-01","text": "I am on controller","loggedInUser":{"id":1,"name":"API"}}
{"id": 2,"creation_date": "2020-04-01","text": "I am on service","loggedInUser":{"id":1,"name":"API"}}
LOAD DATA LOCAL INPATH '${env:HOME}/path-to-json'
OVERWRITE INTO TABLE messages;
select * from messages;
1 2020-03-01 I am on controller {"id":1,"name":"API"}
2 2020-04-01 I am on service {"id":1,"name":"API"}
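To see what the SerDe is doing with each row, here is a minimal Python sketch (an illustration only, not how JsonSerDe is implemented): each line of the file is parsed as one standalone JSON document, top-level keys map to columns, and nested objects map to the STRUCT column.

```python
import json

# Two sample records, one complete JSON document per line, as the SerDe expects.
lines = [
    '{"id": 1, "creation_date": "2020-03-01", "text": "I am on controller", "loggedInUser": {"id": 1, "name": "API"}}',
    '{"id": 2, "creation_date": "2020-04-01", "text": "I am on service", "loggedInUser": {"id": 1, "name": "API"}}',
]

rows = []
for line in lines:
    doc = json.loads(line)  # each line must parse as a standalone JSON object
    rows.append((doc["id"], doc["creation_date"], doc["text"], doc["loggedInUser"]))

# Nested JSON objects come back as dicts, mirroring the STRUCT<id:INT, name:STRING> column.
print(rows[0][3]["name"])  # -> API
```

This is also why the file must not be a single JSON array: the record boundaries are the line breaks.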
For JSON parsing, based on the Hive cwiki/Confluence documentation, we need to follow a few steps:
Download hive-hcatalog-core.jar, then register it:
hive> ADD JAR /path/hive-hcatalog-core.jar;
create table tablename (colname1 datatype, ...) row format serde 'org.apache.hive.hcatalog.data.JsonSerDe' stored as textfile;
The column names in the table definition must match the key names in test.json; if they don't, those columns will show NULL values. Hope this helps.
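The NULL behaviour can be illustrated with a small Python sketch (hypothetical, just mirroring what a lookup by column name does): a column name that matches a JSON key gets the value, and a name with no matching key resolves to nothing, which Hive surfaces as NULL.

```python
import json

record = json.loads('{"field1": "data1", "field2": 100}')

# A column whose name matches a JSON key receives the value...
matched = record.get("field1")    # "data1"
# ...while a column name with no matching key yields None (NULL in Hive).
missing = record.get("colname1")  # None

print(matched, missing)
```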
First of all, validate your JSON file on http://jsonlint.com/. Then reformat the file to one record per line; removing the surrounding [ ] and the trailing comma on each line is mandatory.
[{"field1":"data1","field2":100,"field3":"more data1","field4":123.001}, {"field1":"data2","field2":200,"field3":"more data2","field4":123.002}, {"field1":"data3","field2":300,"field3":"more data3","field4":123.003}, {"field1":"data4","field2":400,"field3":"more data4","field4":123.004}]
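That clean-up step can be scripted. The sketch below (an illustrative helper, not part of the SerDe) re-serializes each element of the validated array, which drops the surrounding brackets and the trailing commas and leaves one record per line.

```python
import json

# The validated file content: a single JSON array of records.
raw = ('[{"field1":"data1","field2":100,"field3":"more data1","field4":123.001},'
       ' {"field1":"data2","field2":200,"field3":"more data2","field4":123.002}]')

# Re-serializing each element drops the surrounding [ ] and the trailing
# commas, leaving the one-record-per-line layout the SerDe needs.
ndjson = "\n".join(json.dumps(record) for record in json.loads(raw))
print(ndjson)
```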
In my test I added hive-json-serde-0.2.jar from the Hadoop cluster; I think hive-json-serde-0.1.jar should also work.
ADD JAR hive-json-serde-0.2.jar;
Create your table
CREATE TABLE my_table (field1 string, field2 int, field3 string, field4 double) ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde';
Load your JSON data file; here I load it from the Hadoop cluster (HDFS), not from the local filesystem:
LOAD DATA INPATH 'Test2.json' INTO TABLE my_table;
My test
It's a bit hard to tell what's going on without the logs (see Getting Started in case of doubt). Just a quick thought - can you try whether it works with WITH SERDEPROPERTIES, like so:
CREATE EXTERNAL TABLE my_table (field1 string, field2 int,
field3 string, field4 double)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.JsonSerde'
WITH SERDEPROPERTIES (
"field1"="$.field1",
"field2"="$.field2",
"field3"="$.field3",
"field4"="$.field4"
);
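Each SERDEPROPERTIES entry maps a column to a JSONPath-style expression such as "$.field1". The toy Python extractor below (hypothetical, handling only the simple "$.a.b" form) illustrates how such a mapping resolves columns against a parsed record.

```python
import json

def extract(doc, path):
    """Resolve a minimal '$.a.b' style path against a parsed JSON document."""
    value = doc
    for key in path.lstrip("$.").split("."):
        value = value[key]
    return value

doc = json.loads('{"field1": "data1", "field2": 100, "field4": 123.001}')
# Column -> path mapping, in the spirit of the SERDEPROPERTIES above.
mapping = {"field1": "$.field1", "field2": "$.field2"}
row = {col: extract(doc, p) for col, p in mapping.items()}
print(row)
```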
There is also a fork from ThinkBigAnalytics you might want to give a try.
UPDATE: It turns out the input in Test.json is invalid JSON, hence the records get collapsed.
See the answer at https://stackoverflow.com/a/11707993/396567 for further details.