Process large JSON stream with jq

生来不讨喜 2021-01-12 15:21

I get a very large JSON stream (several GB) from curl and try to process it with jq.

The relevant output I want to parse with jq

2 Answers
  •  再見小時候
    2021-01-12 15:43

    To get:

    {"key1": "row1", "key2": "row1"}
    {"key1": "row2", "key2": "row2"}
    

    From:

    {
      "results":[
        {
          "columns": ["n"],
          "data": [    
            {"row": [{"key1": "row1", "key2": "row1"}], "meta": [{"key": "value"}]},
            {"row": [{"key1": "row2", "key2": "row2"}], "meta": [{"key": "value"}]}
          ]
        }
      ],
      "errors": []
    }
    

    Do the following, which is equivalent to jq -c '.results[].data[].row[]', but using streaming:

    jq -cn --stream 'fromstream(1|truncate_stream(inputs | select(.[0][0] == "results" and .[0][2] == "data" and .[0][4] == "row") | del(.[0][0:5])))'
    

    What this does is:

    • Turn the JSON into a stream of events (with --stream)
    • Select only the events whose paths match .results[].data[].row[] (with select(.[0][0] == "results" and .[0][2] == "data" and .[0][4] == "row"))
    • Discard the initial five path components, i.e. "results",0,"data",0,"row" (with del(.[0][0:5]))
    • Finally, turn the resulting jq stream back into the expected JSON with the fromstream(1|truncate_stream(…)) pattern from the jq FAQ
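    To see what the select above is actually matching against, you can print the raw --stream events for a stripped-down document. This is a sketch using a minimal made-up input, not the asker's data:

```shell
# Print the raw --stream events for a minimal sample document.
# Leaf values arrive as [path, value] pairs; container closings
# arrive as one-element [path] events.
echo '{"results":[{"data":[{"row":[{"k":"v"}]}]}]}' | jq -c --stream '.'
```

    The first event is [["results",0,"data",0,"row",0,"k"],"v"]: its path element 0 is "results", element 2 is "data", and element 4 is "row", which is exactly what the select filter tests for.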

    For example:

    echo '
      {
        "results":[
          {
            "columns": ["n"],
            "data": [    
              {"row": [{"key1": "row1", "key2": "row1"}], "meta": [{"key": "value"}]},
              {"row": [{"key1": "row2", "key2": "row2"}], "meta": [{"key": "value"}]}
            ]
          }
        ],
        "errors": []
      }
    ' | jq -cn --stream '
      fromstream(1|truncate_stream(
        inputs | select(
          .[0][0] == "results" and 
          .[0][2] == "data" and 
          .[0][4] == "row"
        ) | del(.[0][0:5])
      ))'
    

    Produces the desired output.
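    To convince yourself that the streaming filter really is equivalent to the simple one, you can run both over the same sample (the JSON from above, compacted) and compare; only the streaming version keeps memory bounded on multi-gigabyte input:

```shell
# Compare the in-memory filter with the streaming one on the sample document.
json='{"results":[{"columns":["n"],"data":[{"row":[{"key1":"row1","key2":"row1"}],"meta":[{"key":"value"}]},{"row":[{"key1":"row2","key2":"row2"}],"meta":[{"key":"value"}]}]}],"errors":[]}'

# Simple version: parses the whole document into memory first.
plain=$(printf '%s' "$json" | jq -c '.results[].data[].row[]')

# Streaming version: processes events as they arrive.
streamed=$(printf '%s' "$json" | jq -cn --stream '
  fromstream(1|truncate_stream(
    inputs
    | select(.[0][0] == "results" and .[0][2] == "data" and .[0][4] == "row")
    | del(.[0][0:5])
  ))')

printf '%s\n' "$streamed"
```

    Both variables hold the same two lines; in practice you would replace the printf with the curl download feeding jq's stdin.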
