ElasticSearch - get all available filters (aggregate) from index

有刺的猬 2021-02-04 19:40

Let's say I have:

"hits": [
      {
        "_index": "products",
        "_type": "product",
        "_id": "599c2b3fc991ee0a597034fa",


        
1 answer
  • 2021-02-04 20:11

    You cannot do it in one query but it is fairly easy in two:

    Retrieving the list of attributes

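    You can use the mapping API to get all the fields in your documents (on Elasticsearch 7 and later, where mapping types were deprecated and then removed, omit your_type from the URLs):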

    curl -XGET "http://localhost:9200/your_index/your_type/_mapping"
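To turn the mapping response into a flat list of field names, something like the following could work. It is only a sketch: list_fields and sample_properties are hypothetical names, and the sample stands in for the "properties" object found inside a real mapping response, whose outer nesting depends on your Elasticsearch version.

```python
def list_fields(properties, prefix=""):
    """Return the dotted paths of all leaf fields under a mapping "properties" dict."""
    fields = []
    for name, spec in sorted(properties.items()):
        path = prefix + name
        if "properties" in spec:
            # Object or nested field: recurse into its sub-fields.
            fields.extend(list_fields(spec["properties"], path + "."))
        else:
            fields.append(path)
    return fields


# Hypothetical sample shaped like the "properties" part of a mapping response.
sample_properties = {
    "brand": {"type": "keyword"},
    "specs": {
        "properties": {
            "color": {"type": "keyword"},
            "weight": {"type": "float"},
        }
    },
}

print(list_fields(sample_properties))
```

The resulting list can then feed the aggregation query in the next step.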
    

    Retrieving their values

    You can then use one terms aggregation per field to get the values of each field:

    curl -XGET "http://localhost:9200/your_index/your_type/_search" -H 'Content-Type: application/json' -d'
    {
      "size": 0,
      "aggs": {
        "field1Values": {
          "terms": {
            "field": "field1",
            "size": 20
          }
        },
        "field2Values": {
          "terms": {
            "field": "field2",
            "size": 20
          }
        },
        "field3Values": {
          "terms": {
            "field": "field3",
            "size": 20
          }
        },
        ...
      }
    }'
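Writing one aggregation per field by hand gets tedious, so the request body could be generated instead. A minimal sketch, assuming you already have the list of field names; build_terms_aggs is a hypothetical helper, and the "<field>Values" naming simply mirrors the query above.

```python
import json


def build_terms_aggs(fields, size=20):
    """Build a search body with one terms aggregation per field."""
    aggs = {}
    for field in fields:
        # Name each aggregation "<field>Values", as in the hand-written query.
        aggs[field + "Values"] = {"terms": {"field": field, "size": size}}
    # "size": 0 suppresses the hits so only aggregations come back.
    return {"size": 0, "aggs": aggs}


body = build_terms_aggs(["field1", "field2", "field3"])
print(json.dumps(body, indent=2))
```

The printed JSON can be passed to curl with -d, or sent with whichever HTTP client you already use.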
    

    This retrieves the 20 most frequent values for each field.

    The limit of 20 values prevents a huge response (for instance, if you have a few billion documents and a field whose value is unique in each one). You can raise it through the "size" parameter of the terms aggregation. From your requirements, I guess that choosing something 10x larger than a rough estimate of the number of distinct values of each field should do the trick.

    How to handle huge cardinality on values

    You can also run an intermediate query with the cardinality aggregation to get the actual number of distinct values, and then use it as the size of your terms aggregation. Please note that cardinality is an estimate for large numbers, so you may want to use cardinality * 2.

    curl -XGET "http://localhost:9200/your_index/your_type/_search" -H 'Content-Type: application/json' -d'
    {
      "size": 0,
      "aggs": {
        "field1Cardinality": {
          "cardinality": {
            "field": "field1"
          }
        },
        "field2Cardinality": {
          "cardinality": {
            "field": "field2"
          }
        },
        "field3Cardinality": {
          "cardinality": {
            "field": "field3"
          }
        },
        ...
      }
    }'
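The cardinality results can then be turned into per-field sizes for the terms aggregations. A rough sketch, assuming the "aggregations" part of the response maps each aggregation name to an object with a "value" key; sizes_from_cardinality and the sample numbers are made up for illustration.

```python
def sizes_from_cardinality(aggregations, factor=2, suffix="Cardinality"):
    """Map each field to a terms size derived from its estimated cardinality."""
    sizes = {}
    for agg_name, result in aggregations.items():
        if agg_name.endswith(suffix):
            # Strip the suffix to recover the field name used in the query.
            field = agg_name[: -len(suffix)]
            # Apply the safety factor and keep the size at least 1.
            sizes[field] = max(1, result["value"] * factor)
    return sizes


# Hypothetical "aggregations" section of a search response.
sample_aggregations = {
    "field1Cardinality": {"value": 12},
    "field2Cardinality": {"value": 0},
}

print(sizes_from_cardinality(sample_aggregations))
```

The factor of 2 follows the cardinality * 2 suggestion; adjust it to how much slack you want.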
    

    How to handle a huge number of attributes

    The previous approach works if there are not too many different attributes. If there are, you should change how the documents are stored to prevent a mapping explosion.

    Storing them like this:

    {
        "attributes":[
            {
                "name":"1",
                "value":[
                    "a"
                ]
            },
            {
                "name":"2",
                "value":[
                    "b",
                    "c"
                ]
            },
            {
                "name":"3",
                "value":[
                    "d",
                    "e"
                ]
            },
            {
                "name":"4",
                "value":[
                    "f",
                    "g"
                ]
            },
            {
                "name":"5",
                "value":[
                    "h",
                    "i"
                ]
            }
        ]
    }
    

    Storing attributes this way fixes the problem, and you can then use a terms aggregation on "name" with a sub terms aggregation on "value" to get what you want:

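    curl -XGET "http://localhost:9200/your_index/your_type/_search" -H 'Content-Type: application/json' -d'
    {
      "size": 0,
      "aggs": {
        "attributes": {
          "nested": {
            "path": "attributes"
          },
          "aggs": {
            "names": {
              "terms": {
                "field": "attributes.name",
                "size": 1000
              },
              "aggs": {
                "values": {
                  "terms": {
                    "field": "attributes.value",
                    "size": 100
                  }
                }
              }
            }
          }
        }
      }
    }'
    

    This requires a nested mapping for attributes, which is why the terms aggregations are wrapped in a nested aggregation on the "attributes" path. Without it, the values of every attribute in a document would be counted under each name.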
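For completeness, here is a sketch of the two bits of glue code around this approach: reshaping a flat attribute object into the name/value list before indexing, and collapsing the name buckets of the response into a simple filter map. Both function names are hypothetical, and the bucket shape assumes the sub-aggregation is called "values" as in the query above.

```python
def to_attribute_list(flat_attributes):
    """Reshape {"color": "red"} into the {"attributes": [{"name", "value"}]} form."""
    attributes = []
    for name, value in flat_attributes.items():
        # Always store values as a list, as in the example document.
        values = value if isinstance(value, list) else [value]
        attributes.append({"name": name, "value": values})
    return {"attributes": attributes}


def facets_from_buckets(name_buckets):
    """Turn the name buckets of the response into {name: [values]}."""
    return {
        bucket["key"]: [sub["key"] for sub in bucket["values"]["buckets"]]
        for bucket in name_buckets
    }


# Hypothetical input document and response buckets, for illustration only.
doc = to_attribute_list({"1": "a", "2": ["b", "c"]})
sample_buckets = [
    {"key": "1", "doc_count": 1, "values": {"buckets": [{"key": "a", "doc_count": 1}]}},
    {"key": "2", "doc_count": 1, "values": {"buckets": [
        {"key": "b", "doc_count": 1},
        {"key": "c", "doc_count": 1},
    ]}},
]

print(doc)
print(facets_from_buckets(sample_buckets))
```

facets_from_buckets expects the "buckets" list of the terms aggregation on "attributes.name", wherever that sits in your response.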
