ElasticSearch - get all available filters (aggregate) from index

Let's say I have:

"hits": [
      {
        "_index": "products",
        "_type": "product",
        "_id": "599c2b3fc991ee0a597034fa",
        ...


                      
1 answer

一向, 2021-02-04 20:11
                                                                       
You cannot do it in one query but it is fairly easy in two:

Retrieving the list of attributes

You can use the mapping API to list all the fields in your documents:

curl -XGET "http://localhost:9200/your_index/your_type/_mapping"
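
The mapping response describes each field under a "properties" object, and object fields carry their own "properties". As a rough sketch, assuming you have already pulled that "properties" object out of the response (its exact position depends on the Elasticsearch version), the hypothetical helper `list_fields` below flattens it into dotted field names:

```python
def list_fields(properties, prefix=""):
    """Flatten a mapping "properties" object into dotted field names.

    Hypothetical helper, not part of any Elasticsearch client.
    """
    fields = []
    for name, definition in properties.items():
        path = prefix + name
        sub_properties = definition.get("properties")
        if sub_properties:
            # Object or nested field: descend into its sub-fields.
            fields.extend(list_fields(sub_properties, path + "."))
        else:
            # Leaf field such as keyword, text, or long.
            fields.append(path)
    return fields


# Sample "properties" object, made up for illustration.
sample_properties = {
    "color": {"type": "keyword"},
    "size": {"type": "keyword"},
    "maker": {"properties": {"name": {"type": "keyword"}}},
}
print(list_fields(sample_properties))
```

The resulting list can feed the aggregation step that follows.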


Retrieving their values

You can then use multiple terms aggregations to get the values of each field:

curl -XGET "http://localhost:9200/your_index/your_type/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "field1Values": {
      "terms": {
        "field": "field1",
        "size": 20
      }
    },
    "field2Values": {
      "terms": {
        "field": "field2",
        "size": 20
      }
    },
    "field3Values": {
      "terms": {
        "field": "field3",
        "size": 20
      }
    },
    ...
  }
}'
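
Writing one aggregation per field by hand gets tedious once the field list comes from the mapping. A minimal sketch that generates the same request body, assuming the "<field>Values" naming used above (`build_terms_aggs` is a hypothetical helper):

```python
import json


def build_terms_aggs(fields, size=20):
    """Build a search body with one terms aggregation per field.

    Aggregation names follow the "<field>Values" pattern used above.
    """
    aggs = {}
    for field in fields:
        aggs[field + "Values"] = {"terms": {"field": field, "size": size}}
    # "size": 0 suppresses the hits; only the aggregations are returned.
    return {"size": 0, "aggs": aggs}


body = build_terms_aggs(["field1", "field2", "field3"])
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of the _search request.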


This retrieves the 20 most frequent values for each field.

The limit of 20 values keeps the response from becoming huge (for instance, if you have a few billion documents with a unique field). You can raise it through the "size" parameter of the terms aggregation. Given your requirements, a size about 10x larger than a rough estimate of the number of distinct values per field should do the trick.

How to handle huge cardinality on values

You can also run an intermediate query with the cardinality aggregation to get the actual number of distinct values, then use it as the size of your terms aggregation. Note that cardinality is an estimate for large numbers, so you may want to use cardinality * 2.

curl -XGET "http://localhost:9200/your_index/your_type/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "field1Cardinality": {
      "cardinality": {
        "field": "field1"
      }
    },
    "field2Cardinality": {
      "cardinality": {
        "field": "field2"
      }
    },
    "field3Cardinality": {
      "cardinality": {
        "field": "field3"
      }
    },
    ...
  }
}'
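
The cardinality results can then drive the sizes of the terms aggregations. A sketch under stated assumptions: the response carries each count under "aggregations" -> name -> "value", aggregation names end in "Cardinality" as in the query above, and `terms_aggs_from_cardinality` is a hypothetical helper:

```python
def terms_aggs_from_cardinality(cardinality_response, factor=2):
    """Size one terms aggregation per field from a cardinality response.

    Assumes aggregation names end in "Cardinality", as in the query above.
    The factor pads the count, since cardinality is approximate.
    """
    suffix = "Cardinality"
    aggs = {}
    for name, result in cardinality_response["aggregations"].items():
        if not name.endswith(suffix):
            continue
        field = name[: -len(suffix)]
        # Keep size at least 1 so an empty field never yields a size of 0.
        size = max(1, result["value"] * factor)
        aggs[field + "Values"] = {"terms": {"field": field, "size": size}}
    return {"size": 0, "aggs": aggs}


# Sample response fragment with made-up counts.
sample_response = {
    "aggregations": {
        "field1Cardinality": {"value": 12},
        "field2Cardinality": {"value": 0},
    }
}
body = terms_aggs_from_cardinality(sample_response)
print(body["aggs"]["field1Values"]["terms"]["size"])
```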


How to handle huge cardinality on attributes

The previous approach works if there are not too many different attributes.
If there are, you should change how the documents are stored to prevent a mapping explosion.

Storing them like this:

{
    "attributes":[
        {
            "name":"1",
            "value":[
                "a"
            ]
        },
        {
            "name":"2",
            "value":[
                "b",
                "c"
            ]
        },
        {
            "name":"3",
            "value":[
                "d",
                "e"
            ]
        },
        {
            "name":"4",
            "value":[
                "f",
                "g"
            ]
        },
        {
            "name":"5",
            "value":[
                "h",
                "i"
            ]
        }
    ]
}


This fixes the problem, and you can then use a terms aggregation on "name" with a sub terms aggregation on "value" to get what you want:

curl -XGET "http://localhost:9200/your_index/your_type/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "attributes": {
      "nested": {
        "path": "attributes"
      },
      "aggs": {
        "names": {
          "terms": {
            "field": "attributes.name",
            "size": 1000
          },
          "aggs": {
            "values": {
              "terms": {
                "field": "attributes.value",
                "size": 100
              }
            }
          }
        }
      }
    }
  }
}'
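
To turn the response into a list of filters, walk the buckets. A minimal sketch, assuming `names_agg` is the part of the response holding the buckets of the terms aggregation on "attributes.name" (where it sits in the response depends on how the query is wrapped) and that the sub-aggregation is called "values"; `collect_filters` is a hypothetical helper:

```python
def collect_filters(names_agg):
    """Map each attribute name to the list of values found for it.

    names_agg is the terms aggregation result on "attributes.name"
    (the object holding "buckets"); the sub-aggregation is named "values".
    """
    filters = {}
    for name_bucket in names_agg["buckets"]:
        value_buckets = name_bucket["values"]["buckets"]
        filters[name_bucket["key"]] = [b["key"] for b in value_buckets]
    return filters


# Hand-written sample in the shape of a terms aggregation result.
sample_names_agg = {
    "buckets": [
        {"key": "1", "doc_count": 1,
         "values": {"buckets": [{"key": "a", "doc_count": 1}]}},
        {"key": "2", "doc_count": 1,
         "values": {"buckets": [{"key": "b", "doc_count": 1},
                                {"key": "c", "doc_count": 1}]}},
    ]
}
print(collect_filters(sample_names_agg))
```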


This requires a nested mapping for attributes.
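
As an illustration of what that involves, the sketch below shows a possible nested mapping for "attributes" and a hypothetical helper, `to_attribute_list`, that reshapes a document with one field per attribute into the name/value layout. The "keyword" sub-field types are an assumption, and where the "properties" object goes in the index creation request depends on the Elasticsearch version:

```python
# Possible mapping fragment: "attributes" as a nested field whose
# sub-fields are keywords (assumed types) so they can be aggregated on.
ATTRIBUTES_MAPPING = {
    "properties": {
        "attributes": {
            "type": "nested",
            "properties": {
                "name": {"type": "keyword"},
                "value": {"type": "keyword"},
            },
        }
    }
}


def to_attribute_list(flat_attributes):
    """Convert {"name": value_or_values} into the name/value list layout."""
    attributes = []
    for name, value in flat_attributes.items():
        # Wrap single values so "value" is always a list.
        values = list(value) if isinstance(value, (list, tuple)) else [value]
        attributes.append({"name": name, "value": values})
    return {"attributes": attributes}


# A document with one field per attribute (made-up sample data).
flat = {"1": "a", "2": ["b", "c"]}
print(to_attribute_list(flat))
```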
                                                                        
                                                        
            
            
              
                