Question
I'm trying to count documents with a unique nested field value (and, as a next step, to fetch those documents as well). Fetching the unique documents seems to work.
But when I execute a count request, I get the following error:
Suppressed: org.elasticsearch.client.ResponseException: method [POST], host [http://localhost:9200], URI [/package/_count?ignore_throttled=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true], status line [HTTP/1.1 400 Bad Request] {"error":{"root_cause":[{"type":"parsing_exception","reason":"request does not support [collapse]","line":1,"col":216}],"type":"parsing_exception","reason":"request does not support [collapse]","line":1,"col":216},"status":400}
The code:
BoolQueryBuilder innerTemplNestedBuilder = QueryBuilders.boolQuery();
NestedQueryBuilder templatesNestedQuery = QueryBuilders.nestedQuery("attachment", innerTemplNestedBuilder, ScoreMode.None);
BoolQueryBuilder mainQueryBuilder = QueryBuilders.boolQuery().must(templatesNestedQuery);
if (!isEmpty(templateName)) {
    innerTemplNestedBuilder.filter(QueryBuilders.termQuery("attachment.name", templateName));
}
SearchSourceBuilder searchSourceBuilder = SearchSourceBuilder.searchSource()
    .collapse(new CollapseBuilder("attachment.uuid"))
    .query(mainQueryBuilder);
// THE NEXT LINE CAUSES THE ERROR
long count = client.count(new CountRequest("package").source(searchSourceBuilder), RequestOptions.DEFAULT).getCount(); // <<< ERROR HERE
// THIS WORKS
SearchResponse searchResponse = client.search(
    new SearchRequest(
        new String[] {"package"},
        searchSourceBuilder.timeout(new TimeValue(20, TimeUnit.SECONDS)).from(offset).size(limit)
    ).indices("package").searchType(SearchType.DFS_QUERY_THEN_FETCH),
    RequestOptions.DEFAULT
);
return ....;
The overall intention is to get one page of documents together with the total number of such documents. Maybe another approach for this need already exists. If I try to get the count using aggregations and cardinality, I get a result of zero, and it looks like it doesn't work on nested fields.
Count request:
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "query": {
              "bool": {
                "adjust_pure_negative": true,
                "boost": 1.0
              }
            },
            "path": "attachment",
            "ignore_unmapped": false,
            "score_mode": "none",
            "boost": 1.0
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "collapse": {
    "field": "attachment.uuid"
  }
}
How the mapping was created:
curl -X DELETE "localhost:9200/package?pretty"
curl -X PUT "localhost:9200/package?include_type_name=true&pretty" -H 'Content-Type: application/json' -d '{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1
  }
}'
curl -X PUT "localhost:9200/package/_mappings?pretty" -H 'Content-Type: application/json' -d '
{
  "dynamic": false,
  "properties": {
    "attachment": {
      "type": "nested",
      "properties": {
        "uuid": { "type": "keyword" },
        "name": { "type": "text" }
      }
    },
    "uuid": {
      "type": "keyword"
    }
  }
}
'
The resulting query generated by the code should be something like this:
curl -X POST "localhost:9200/package/_count?&pretty" -H 'Content-Type: application/json' -d '
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "query": {
              "bool": {
                "adjust_pure_negative": true,
                "boost": 1.0
              }
            },
            "path": "attachment",
            "ignore_unmapped": false,
            "score_mode": "none",
            "boost": 1.0
          }
        }
      ],
      "adjust_pure_negative": true,
      "boost": 1.0
    }
  },
  "collapse": {
    "field": "attachment.uuid"
  }
}'
Answer 1:
Collapsing can only be used in the _search context, not in _count.
Secondly, what does your query even do? It carries a lot of redundant parameters such as boost: 1.0, so you might as well write:
POST /package/_count?&pretty
{
  "query": {
    "bool": {
      "must": [
        {
          "nested": {
            "path": "attachment",
            "query": {
              "match_all": {}
            }
          }
        }
      ]
    }
  }
}
which does not really do anything :)
To answer your original question of "counting documents with unique nested field value", let's imagine 3 documents, 2 of which have the same attachment.uuid value:
[
  {
    "attachment": {
      "uuid": "04144e14-62c3-11ea-bc55-0242ac130003"
    }
  },
  {
    "attachment": {
      "uuid": "04144e14-62c3-11ea-bc55-0242ac130003"
    }
  },
  {
    "attachment": {
      "uuid": "100b9632-62c3-11ea-bc55-0242ac130003"
    }
  }
]
To get the terms breakdown of the uuids, run
GET package/_search
{
  "size": 0,
  "aggs": {
    "nested_uniques": {
      "nested": {
        "path": "attachment"
      },
      "aggs": {
        "subagg": {
          "terms": {
            "field": "attachment.uuid"
          }
        }
      }
    }
  }
}
which yields
...
{
  "aggregations": {
    "nested_uniques": {
      "doc_count": 3,
      "subagg": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "04144e14-62c3-11ea-bc55-0242ac130003",
            "doc_count": 2
          },
          {
            "key": "100b9632-62c3-11ea-bc55-0242ac130003",
            "doc_count": 1
          }
        ]
      }
    }
  }
}
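To make the bucket counts above concrete, here is a minimal plain-Java sketch of what the terms breakdown computes for the three sample documents. It does not talk to Elasticsearch; the class name TermsBreakdownSketch and the method bucketCounts are made up for illustration, and a TreeMap is used only to get a stable print order (the real aggregation orders buckets by doc_count by default).

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of any Elasticsearch API.
class TermsBreakdownSketch {
    // Counts how often each uuid occurs: one entry per distinct value,
    // analogous to one bucket per term with its doc_count.
    static Map<String, Integer> bucketCounts(List<String> uuids) {
        Map<String, Integer> buckets = new TreeMap<>();
        for (String uuid : uuids) {
            buckets.merge(uuid, 1, Integer::sum);
        }
        return buckets;
    }

    public static void main(String[] args) {
        // The attachment.uuid values of the three sample documents.
        List<String> uuids = List.of(
            "04144e14-62c3-11ea-bc55-0242ac130003",
            "04144e14-62c3-11ea-bc55-0242ac130003",
            "100b9632-62c3-11ea-bc55-0242ac130003");

        Map<String, Integer> buckets = bucketCounts(uuids);
        buckets.forEach((key, count) -> System.out.println(key + " -> " + count));
        // The number of buckets is the number of distinct uuids.
        System.out.println("distinct uuids: " + buckets.size());
    }
}
```

Counting the buckets gives the number of distinct uuids here, but that only holds while every value fits into the response: terms returns just the top buckets (10 by default), so it is not a general way to count unique values.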
To get the parent doc count of unique nested fields, we're gonna have to get slightly more clever:
GET package/_search
{
  "size": 0,
  "aggs": {
    "nested_uniques": {
      "nested": {
        "path": "attachment"
      },
      "aggs": {
        "scripted_uniques": {
          "scripted_metric": {
            "init_script": "state.my_map = [:];",
            "map_script": """
              if (doc.containsKey('attachment.uuid')) {
                state.my_map[doc['attachment.uuid'].value.toString()] = 1;
              }
            """,
            "combine_script": """
              def sum = 0;
              for (c in state.my_map.entrySet()) {
                sum += 1
              }
              return sum
            """,
            "reduce_script": """
              def sum = 0;
              for (agg in states) {
                sum += agg;
              }
              return sum;
            """
          }
        }
      }
    }
  }
}
which returns
...
{
  "aggregations": {
    "nested_uniques": {
      "doc_count": 3,
      "scripted_uniques": {
        "value": 2
      }
    }
  }
}
and this scripted_uniques value of 2 is exactly what you're after.
Note: I solved this use case using nested scripted metric aggs but if any of you know of a cleaner approach, I'm more than happy to learn it!
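The phases of the scripted metric aggregation above can be mimicked in plain Java to see what each one contributes. This is only a sketch with made-up names (ScriptedMetricSketch, mapPhase, sumOfShardCounts, sizeOfUnion), not Elasticsearch client code. It assumes the usual scripted_metric flow: map_script runs for each nested document on a shard, combine_script runs once per shard, and reduce_script runs once over all shard results. The second method is a hypothetical variant, not part of the answer, in which each shard would return its keys and the reduce step would merge them.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical simulation, not part of any Elasticsearch API.
class ScriptedMetricSketch {
    // init_script + map_script: a shard remembers every uuid it has seen.
    static Set<String> mapPhase(List<String> shardUuids) {
        return new HashSet<>(shardUuids);
    }

    // The answer's scripts: combine_script returns the number of keys on a
    // shard, reduce_script sums those per-shard numbers.
    static int sumOfShardCounts(List<List<String>> shards) {
        int sum = 0;
        for (List<String> shard : shards) {
            sum += mapPhase(shard).size();
        }
        return sum;
    }

    // Assumed variant: combine would return the keys themselves and reduce
    // would take the size of their union.
    static int sizeOfUnion(List<List<String>> shards) {
        Set<String> all = new HashSet<>();
        for (List<String> shard : shards) {
            all.addAll(mapPhase(shard));
        }
        return all.size();
    }

    public static void main(String[] args) {
        String a = "04144e14-62c3-11ea-bc55-0242ac130003";
        String b = "100b9632-62c3-11ea-bc55-0242ac130003";

        // One shard, as in the question's index settings.
        List<List<String>> oneShard = List.of(List.of(a, a, b));
        // Same three values spread over two shards, with `a` on both.
        List<List<String>> twoShards = List.of(List.of(a, b), List.of(a));

        System.out.println(sumOfShardCounts(oneShard));  // 2
        System.out.println(sizeOfUnion(oneShard));       // 2
        System.out.println(sumOfShardCounts(twoShards)); // 3
        System.out.println(sizeOfUnion(twoShards));      // 2
    }
}
```

With a single shard (number_of_shards: 1 in the question's settings) both strategies agree on 2. On an index with several shards, summing per-shard counts would count a uuid once for every shard it appears on, so returning the keys from combine_script and merging them in reduce_script would be the safer choice there. A cardinality sub-aggregation placed under the same nested aggregation may also be worth trying as a simpler alternative, keeping in mind that its counts are approximate for large numbers of distinct values.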
Source: https://stackoverflow.com/questions/60566664/how-to-count-a-number-of-unique-documents-by-a-nested-field-in-elasticsearch