Let's say I have:
"hits": [
  {
    "_index": "products",
    "_type": "product",
    "_id": "599c2b3fc991ee0a597034fa",
You cannot do it in one query but it is fairly easy in two:
First, you can use the get mapping API to list all the fields in your documents:
curl -XGET "http://localhost:9200/your_index/your_type/_mapping"
You can then use multiple terms aggregations, one per field, to get the values of each field:
curl -XGET "http://localhost:9200/your_index/your_type/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "field1Values": {
      "terms": {
        "field": "field1",
        "size": 20
      }
    },
    "field2Values": {
      "terms": {
        "field": "field2",
        "size": 20
      }
    },
    "field3Values": {
      "terms": {
        "field": "field3",
        "size": 20
      }
    },
    ...
  }
}'
This retrieves the 20 most frequent values for each field.
The limit of 20 values is there to prevent a huge response (for instance, if you have a few billion documents, each with a unique value in the field). You can raise it through the "size" parameter of the terms aggregation. From your requirements, I guess that a size about 10x larger than a rough estimate of the number of distinct values in each field should do the trick.
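The two steps above can be glued together in a few lines. The following is only a sketch in Python: it assumes you have already fetched the mapping and pulled out its "properties" section, and the helper names (`leaf_fields`, `terms_body`) and the aggregation naming scheme are made up for illustration.

```python
def leaf_fields(properties, prefix=""):
    """Yield dotted paths of the leaf fields in a mapping 'properties' dict."""
    for name, spec in properties.items():
        path = prefix + name
        if "properties" in spec:
            # Object field: recurse into its sub-fields.
            yield from leaf_fields(spec["properties"], path + ".")
        else:
            yield path


def terms_body(fields, size=20):
    """Build a search body with one terms aggregation per field."""
    aggs = {}
    for field in fields:
        # Hypothetical naming scheme; dots are replaced to keep names simple.
        agg_name = field.replace(".", "_") + "Values"
        aggs[agg_name] = {"terms": {"field": field, "size": size}}
    return {"size": 0, "aggs": aggs}


# Made-up "properties" section, as found inside a mapping response.
properties = {
    "field1": {"type": "keyword"},
    "dims": {"properties": {"width": {"type": "integer"}}},
}
body = terms_body(leaf_fields(properties))
```

The resulting `body` can be serialized with `json.dumps` and sent as the payload of the _search request shown above.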
You can also run an intermediate query with the cardinality aggregation to get the actual number of distinct values, and then use it as the size of your terms aggregation. Note that cardinality is only an estimate when the number of distinct values is large, so you may want to use cardinality * 2.
curl -XGET "http://localhost:9200/your_index/your_type/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "field1Cardinality": {
      "cardinality": {
        "field": "field1"
      }
    },
    "field2Cardinality": {
      "cardinality": {
        "field": "field2"
      }
    },
    "field3Cardinality": {
      "cardinality": {
        "field": "field3"
      }
    },
    ...
  }
}'
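Turning the cardinality response into sizes for the terms aggregations could look roughly like this. It is a sketch, not a definitive implementation: it assumes the aggregations are named `<field>Cardinality` as in the query above, and the function names and the sample response are invented for illustration.

```python
def sizes_from_cardinality(response, factor=2, suffix="Cardinality"):
    """Map each field to factor * its estimated cardinality (at least 1)."""
    sizes = {}
    for agg_name, result in response["aggregations"].items():
        if agg_name.endswith(suffix):
            # Assumes the "<field>Cardinality" naming used in the query.
            field = agg_name[: -len(suffix)]
            # A terms aggregation needs a size greater than zero.
            sizes[field] = max(1, result["value"] * factor)
    return sizes


def terms_body_with_sizes(sizes):
    """Build a search body with one terms aggregation per field and size."""
    aggs = {
        field + "Values": {"terms": {"field": field, "size": size}}
        for field, size in sizes.items()
    }
    return {"size": 0, "aggs": aggs}


# Made-up response, shaped like the "aggregations" part of a search result.
response = {
    "aggregations": {
        "field1Cardinality": {"value": 12},
        "field2Cardinality": {"value": 0},
    }
}
sizes = sizes_from_cardinality(response)
body = terms_body_with_sizes(sizes)
```

The factor of 2 is the safety margin suggested above for the approximate counts.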
The approach above works if there are not too many different attributes. If there are, you should change how the documents are stored to prevent a mapping explosion.
Storing them like this:
{
  "attributes": [
    {
      "name": "1",
      "value": ["a"]
    },
    {
      "name": "2",
      "value": ["b", "c"]
    },
    {
      "name": "3",
      "value": ["d", "e"]
    },
    {
      "name": "4",
      "value": ["f", "g"]
    },
    {
      "name": "5",
      "value": ["h", "i"]
    }
  ]
}
would fix the problem, and you would then be able to use a terms aggregation on "name" with a sub terms aggregation on "value" to get what you want:
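curl -XGET "http://localhost:9200/your_index/your_type/_search" -H 'Content-Type: application/json' -d'
{
  "size": 0,
  "aggs": {
    "attributes": {
      "nested": {
        "path": "attributes"
      },
      "aggs": {
        "names": {
          "terms": {
            "field": "attributes.name",
            "size": 1000
          },
          "aggs": {
            "values": {
              "terms": {
                "field": "attributes.value",
                "size": 100
              }
            }
          }
        }
      }
    }
  }
}'
This requires a nested mapping for "attributes", so that each value stays tied to its own name. The "nested" aggregation with path "attributes" is what gives the terms aggregations access to the nested objects.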
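For completeness, here is a rough Python sketch of both ends of this approach: reshaping a flat document into the "attributes" list before indexing, and reading the name/value buckets back out of the response. The flat input layout and both function names are assumptions made for illustration; `name_agg` stands for the result object of the terms aggregation on "attributes.name", wherever it sits in your response.

```python
def to_attribute_list(flat_doc):
    """Turn {"name": value-or-list} into the "attributes" list layout."""
    attributes = []
    for name, value in flat_doc.items():
        # Single values are wrapped so "value" is always a list.
        values = value if isinstance(value, list) else [value]
        attributes.append({"name": name, "value": values})
    return {"attributes": attributes}


def values_by_name(name_agg):
    """Collect the sub-bucket keys ("values") under each name bucket."""
    return {
        bucket["key"]: [sub["key"] for sub in bucket["values"]["buckets"]]
        for bucket in name_agg["buckets"]
    }


# Hypothetical flat document, reshaped for indexing.
doc = to_attribute_list({"1": "a", "2": ["b", "c"]})

# Made-up result of the terms aggregation on "attributes.name".
name_agg = {
    "buckets": [
        {
            "key": "2",
            "doc_count": 1,
            "values": {
                "buckets": [
                    {"key": "b", "doc_count": 1},
                    {"key": "c", "doc_count": 1},
                ]
            },
        }
    ]
}
result = values_by_name(name_agg)
```

Keeping `values_by_name` independent of the outer response layout means it does not care how the terms aggregation is nested or named in the request.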