Combine full text with other index

前端 未结 1 1258
余生分开走
余生分开走 2020-12-20 07:20

I have a full text index and an index on a created date.

My query on the date alone returns a nice, small 44 records very quickly (within a second):

         


        
相关标签:
1条回答
  • 2020-12-20 08:00

    The main case here is that a "text" search result is generally taking precedence over other filter conditions in the query, and as such it becomes necessary to "first" obtain results from the "text" component, and then basically "scan" for other conditions in the document.

    This type of search can be difficult to optimise along with a "range" or any type of "inequality" match condition in conjuntion with the text search results, and is mostly due to how MongoDB handles this "special" index type.

    For a short demonstration, consider the following basic setup:

    db.texty.drop();
    
    db.texty.insert([
        { "a": "a", "text": "something" },
        { "a": "b", "text": "something" },
        { "a": "b", "text": "nothing much" },
        { "a": "c", "text": "something" }
    ])
    
    db.texty.createIndex({ "text": "text" })
    db.texty.createIndex({ "a": 1 })
    

    So if you wanted to look at this with a text search condition as well as a range consideration on the other field ( { "$lt": "c" } ), then you could handle as follows:

    db.texty.find({ "a": { "$lt": "c" }, "$text": { "$search": "something" } }).explain()
    

    With the explain output such as ( important part ):

               "winningPlan" : {
                        "stage" : "FETCH",
                        "filter" : {
                                "a" : {
                                        "$lt" : "c"
                                }
                        },
                        "inputStage" : {
                                "stage" : "TEXT",
                                "indexPrefix" : {
    
                                },
                                "indexName" : "text_text",
                                "parsedTextQuery" : {
                                        "terms" : [
                                                "someth"
                                        ],
                                        "negatedTerms" : [ ],
                                        "phrases" : [ ],
                                        "negatedPhrases" : [ ]
                                },
                                "inputStage" : {
                                        "stage" : "TEXT_MATCH",
                                        "inputStage" : {
                                                "stage" : "TEXT_OR",
                                                "inputStage" : {
                                                        "stage" : "IXSCAN",
                                                        "keyPattern" : {
                                                                "_fts" : "text",
                                                                "_ftsx" : 1
                                                        },
                                                        "indexName" : "text_text",
                                                        "isMultiKey" : true,
                                                        "isUnique" : false,
                                                        "isSparse" : false,
                                                        "isPartial" : false,
                                                        "indexVersion" : 1,
                                                        "direction" : "backward",
                                                        "indexBounds" : {
    
                                                        }
                                                }
                                        }
                                }
                        }
                },
    

    Which is basically saying "first get me the text results and then filter those results fetched by the other condition". So clearly only the "text" index is being used here and then all the results it returns are subsequently being filtered by examining the content.

    This is not optimal for two reasons, being that it may likely be that the data is best constrained by the "range" condition rather than the matches from the text search. Secondly, even though there is an index on the other data, it is not being used here for comparison. So rather the whole document is loaded for each result and the filter is tested.

    You might then consider a "compound" index format here, and it would seem initially logical that if the "range" is more specific to selection, then include that as the prefixed order of the indexed keys:

    db.texty.dropIndexes();
    db.texty.createIndex({ "a": 1, "text": "text" })
    

    But there is a catch here, since when you attempt to run the query again:

    db.texty.find({ "a": { "$lt": "c" }, "$text": { "$search": "something" } })
    

    It would result in an error:

    Error: error: { "waitedMS" : NumberLong(0), "ok" : 0, "errmsg" : "error processing query: ns=test.textyTree: $and\n a $lt \"c\"\n TEXT : query=something, language=english, caseSensitive=0, diacriticSensitive=0, tag=NULL\nSort: {}\nProj: {}\n planner returned error: failed to use text index to satisfy $text query (if text index is compound, are equality predicates given for all prefix fields?)", "code" : 2 }

    So even though that may seem "optimal", the way MongoDB processes the query ( and really index selection ) for the special "text" index, it is just not possible for this "exclusion" outside of the range to be possible.

    You can however perform an "equality" match on this in a very efficient way:

    db.texty.find({ "a": "b", "$text": { "$search": "something" } }).explain()
    

    With the explain output:

               "winningPlan" : {
                        "stage" : "TEXT",
                        "indexPrefix" : {
                                "a" : "b"
                        },
                        "indexName" : "a_1_text_text",
                        "parsedTextQuery" : {
                                "terms" : [
                                        "someth"
                                ],
                                "negatedTerms" : [ ],
                                "phrases" : [ ],
                                "negatedPhrases" : [ ]
                        },
                        "inputStage" : {
                                "stage" : "TEXT_MATCH",
                                "inputStage" : {
                                        "stage" : "TEXT_OR",
                                        "inputStage" : {
                                                "stage" : "IXSCAN",
                                                "keyPattern" : {
                                                        "a" : 1,
                                                        "_fts" : "text",
                                                        "_ftsx" : 1
                                                },
                                                "indexName" : "a_1_text_text",
                                                "isMultiKey" : true,
                                                "isUnique" : false,
                                                "isSparse" : false,
                                                "isPartial" : false,
                                                "indexVersion" : 1,
                                                "direction" : "backward",
                                                "indexBounds" : {
    
                                                }
                                        }
                                }
                        }
                },
    

    So the index is used and it can be shown to "pre-filter" the content provided to the text matching by the output of the other condition.

    If indeed you keep the "prefix" to the index as the "text" field(s) to search however:

    db.texty.dropIndexes();
    
    db.texty.createIndex({ "text": "text", "a": 1 })
    

    Then perform the search:

    db.texty.find({ "a": { "$lt": "c" }, "$text": { "$search": "something" } }).explain()
    

    Then you see a similar result to the above "equality" match:

                "winningPlan" : {
                        "stage" : "TEXT",
                        "indexPrefix" : {
    
                        },
                        "indexName" : "text_text_a_1",
                        "parsedTextQuery" : {
                                "terms" : [
                                        "someth"
                                ],
                                "negatedTerms" : [ ],
                                "phrases" : [ ],
                                "negatedPhrases" : [ ]
                        },
                        "inputStage" : {
                                "stage" : "TEXT_MATCH",
                                "inputStage" : {
                                        "stage" : "TEXT_OR",
                                        "filter" : {
                                                "a" : {
                                                        "$lt" : "c"
                                                }
                                        },
                                        "inputStage" : {
                                                "stage" : "IXSCAN",
                                                "keyPattern" : {
                                                        "_fts" : "text",
                                                        "_ftsx" : 1,
                                                        "a" : 1
                                                },
                                                "indexName" : "text_text_a_1",
                                                "isMultiKey" : true,
                                                "isUnique" : false,
                                                "isSparse" : false,
                                                "isPartial" : false,
                                                "indexVersion" : 1,
                                                "direction" : "backward",
                                                "indexBounds" : {
    
                                                }
                                        }
                                }
                        }
                },
    

    The big differnce here from the first attempt being where filter is placed in the processing chain, indicating that whilst not a "prefix" match ( which is most optimal ), the content is indeed being scanned off of the index "before" being sent to the "text" stage.

    So it is "pre-filtered" but not of course in the most optimal way, and this is due to the very nature of how the "text" index is used. So if you just considered the plain range on an index by itself:

    db.texty.createIndex({ "a": 1 })
    db.texty.find({ "a": { "$lt": "c" } }).explain()
    

    Then the explain output:

                "winningPlan" : {
                        "stage" : "FETCH",
                        "inputStage" : {
                                "stage" : "IXSCAN",
                                "keyPattern" : {
                                        "a" : 1
                                },
                                "indexName" : "a_1",
                                "isMultiKey" : false,
                                "isUnique" : false,
                                "isSparse" : false,
                                "isPartial" : false,
                                "indexVersion" : 1,
                                "direction" : "forward",
                                "indexBounds" : {
                                        "a" : [
                                                "[\"\", \"c\")"
                                        ]
                                }
                        }
                },
    

    Then that at least got the indexBounds to consider and only looked at that portion of the index that fell within those bounds.

    So that's the differences here. Using a "compound" structure should save you some iteration cycles here by being able to narrow down the selection, but it still must scan all index entries to filter, and must of course not be the "prefix" element in the index unless you can use an equality match on it.

    Without a compound structure in the index, you are always returning the text results "first", and then applying any other conditions to those results. Also it is not possible to "combine/intersect" the results from looking at a "text" index and a "normal" index due to the query engine handling. That is generally not going to be the optimal approach, so planning for considerations is important.

    In short, ideally compound with an "equality" match "prefix", and if not then include in the index "after" the text definition.

    0 讨论(0)
提交回复
热议问题