Stemming does not work properly for MongoDB text index

渐次进展 2021-01-20 10:23

I am trying to use the full-text search feature of MongoDB and am observing some unexpected behavior. The problem is related to the "stemming" aspect of the text indexing feature. T

3 Answers
  • Michael,

    The "language" field (if present) lets each document override the
    language used to stem its words. I think that because you gave
    MongoDB a language it didn't recognize ("ENG"), it was unable to
    stem the words at all. As others pointed out, you can use the
    language_override option to tell MongoDB to read the language from some
    other field (say "lang") instead of the default one ("language").

    Below is a quote about full-text indexing and searching that
    relates directly to your issue. It is taken from this book:

    "MongoDB: The Definitive Guide, 2nd Edition"

    Searching in Other Languages

    When a document is inserted (or the index is first created), MongoDB looks at the indexed fields and stems each word, reducing it to an essential unit. However, different languages stem words in different ways, so you must specify what language the index or document is. Thus, text-type indexes allow a "default_language" option to be specified, which defaults to "english" but can be set to a number of other languages (see the online documentation for an up-to-date list). For example, to create a French-language index, we could say:

    > db.users.ensureIndex({"profil" : "text", "interets" : "text"}, {"default_language" : "french"})

    Then French would be used for stemming, unless otherwise specified. You can, on a per-document basis, specify another stemming language by having a "language" field that describes the document’s language:

    > db.users.insert({"username" : "swedishChef", "profile" : "Bork de bork", language : "swedish"})

    What the book does not mention (at least this page of it doesn't) is that
    one can use the language_override option to specify that MongoDB
    should be using some other field for this purpose (say "lang") and
    not the default one ("language").

  • 2021-01-20 11:02

    After a fair amount of experimenting and head-scratching, I discovered the reason for this behavior. It turned out that the documents in the collection in question had a 'language' attribute. Apparently the presence and value of that attribute made these documents non-searchable. (The value happened to be 'ENG'. It is possible that changing it to 'eng' would have made the document searchable again, but the field served a completely different purpose.) After I renamed the field to 'lang', I was able to find the document containing the word "dogs" by searching for either "dog" or "dogs".

    I wonder whether this is expected behavior of MongoDB - that the presence of language attribute in the document would affect the text search.
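
    The rename described above could be done in one pass with the $rename update operator. This is only a sketch: the function takes whatever collection object you pass it (for example db.docs in mongosh, where "docs" is a made-up collection name), and "lang" is simply the new field name chosen in this answer.

    ```javascript
    // Filter: only touch documents that actually have a "language" field.
    const hasLanguageField = { language: { $exists: true } };

    // Update: move the value from "language" to "lang" so the text index
    // no longer interprets it as a stemming language.
    const renameLanguageField = { $rename: { language: "lang" } };

    // Accepts a mongosh collection or a Node.js driver collection;
    // both expose updateMany(filter, update).
    function renameLanguage(collection) {
      return collection.updateMany(hasLanguageField, renameLanguageField);
    }

    // In mongosh (hypothetical collection name): renameLanguage(db.docs)
    ```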

  • 2021-01-20 11:12

    In http://docs.mongodb.org/manual/tutorial/specify-language-for-text-index/ take a look at the language_override option when setting up the index. It allows you to change the name of the field that should be used to define the language of the text search. That way you can leave the "language" property for your application's use, and call it something else (e.g. searchlang or something like that).
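
    A minimal sketch of such an index, assuming a text field called "content" (a made-up name) and using "searchlang" as the override field suggested above. createIndex is used here as the current name for index creation; the collection you pass in (for example db.docs in mongosh) is likewise an assumption.

    ```javascript
    // Index keys: the fields covered by the text index.
    // "content" is an assumed field name for illustration.
    const textIndexKeys = { content: "text" };

    // Index options: stem with English by default, and read any
    // per-document language from "searchlang" instead of "language".
    const textIndexOptions = {
      default_language: "english",
      language_override: "searchlang",
    };

    // Accepts a mongosh collection or a Node.js driver collection;
    // both expose createIndex(keys, options).
    function createTextIndex(collection) {
      return collection.createIndex(textIndexKeys, textIndexOptions);
    }

    // In mongosh (hypothetical collection name): createTextIndex(db.docs)
    ```

    With an index like this, a "language" field holding a value such as "ENG" is treated as ordinary data, and only "searchlang" (when present) affects stemming.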
