Finding exact match using Lucene search API

失恋的感觉 2021-02-09 07:06

I'm working on a company search API using Lucene. My Lucene company index contains two companies: 1. Abigail Adams National Bancorp, Inc. 2. National Bancorp

If the user typ

5 Answers
  • 2021-02-09 07:29

    You can use KeywordAnalyzer to index and search on this field. Keyword Analyzer will generate only one token for the entire string.
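
    To make the effect concrete, here is a minimal sketch in plain Java. It does not use Lucene; it only mimics the idea that keyword-style analysis keeps the whole field value as one token, while whitespace analysis splits it into words. The class and method names (KeywordTokenSketch, keywordTokens, whitespaceTokens, matches) are hypothetical and chosen for illustration only.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class KeywordTokenSketch {

    // Mimics keyword-style analysis: the entire field value becomes one token.
    public static List<String> keywordTokens(String value) {
        return Collections.singletonList(value);
    }

    // Mimics whitespace analysis: one token per whitespace-separated word.
    public static List<String> whitespaceTokens(String value) {
        return Arrays.asList(value.trim().split("\\s+"));
    }

    // A single query term matches only if it equals one of the field's tokens.
    public static boolean matches(List<String> fieldTokens, String queryTerm) {
        return fieldTokens.contains(queryTerm);
    }

    public static void main(String[] args) {
        String first = "Abigail Adams National Bancorp, Inc.";
        String second = "National Bancorp";
        String query = "National Bancorp";

        // With one token per field value, only the identical name matches.
        System.out.println(matches(keywordTokens(first), query));
        System.out.println(matches(keywordTokens(second), query));
    }
}
```

    Because the whole value is a single token, the comparison is exact and case-sensitive in this sketch, so "national bancorp" would not match unless the value is normalized the same way at index and query time.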

  • 2021-02-09 07:38

    I googled a lot with no help for the same problem. After scratching my head for a while I found the solution. Search the string within double quotes, that will solve your problem.

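    Unquoted, National Bancorp will return both #1 and #2, but the quoted phrase "National Bancorp" will return only #2. Note that this depends on how the field is analyzed: on a tokenized field, the phrase National Bancorp also occurs inside #1.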

  • 2021-02-09 07:39

    You may want to reconsider your requirements, depending on whether or not I correctly understood your question. Please bear with me if I did misunderstand you.

    Just a little food for thought:

    • If you only want exact matches returned, then why are you searching in the first place?

    • Are you sure that the user expects exact matches? I typically search assuming that the search engine will accommodate missing words.

    • Suppose the user searched for National Bank but National Bank was no longer in your index. Would you still want Abigail Adams National Bancorp, Inc to be excluded from the results simply because it was not an exact match?

    In light of this, I would suggest you continue to present all possible matches (exact or not) to the user and let them decide for themselves which is most appropriate for them. I say this simply because you may not be thinking the same way as all of your users. Lucene will take care of making sure the closest matches rank highest in the results, helping them make quicker choices.

  • 2021-02-09 07:46

    This is something that may warrant the use of the shingle filter. This filter groups adjacent words together into single tokens. For example, Abigail Adams National Bancorp with a ShingleFilter whose maximum shingle size is 3 would produce (assuming a simple WhitespaceAnalyzer) [Abigail], [Abigail Adams], [Abigail Adams National], [Adams], [Adams National], [Adams National Bancorp], [National], [National Bancorp] and [Bancorp].

    If a user then queries for National Bancorp, you will get an exact match on National Bancorp itself, and a lower-scored match on Abigail Adams National Bancorp (lower-scored because this field has many more tokens, which lowers its length normalization factor rather than the idf). I think it makes sense to return both documents for such a query.

    You may want to apply the shingle filter at query time as well, depending on the use case.
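
    The token list above can be reproduced with a small plain-Java sketch. This is not Lucene's ShingleFilter, only an illustration of the same grouping under the assumption of whitespace tokenization and unigrams being kept; the names ShingleSketch and shingles are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ShingleSketch {

    // Builds every run of 1..maxSize adjacent words, starting at each word.
    public static List<String> shingles(String text, int maxSize) {
        String[] words = text.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int start = 0; start < words.length; start++) {
            StringBuilder shingle = new StringBuilder();
            for (int len = 1; len <= maxSize && start + len <= words.length; len++) {
                if (len > 1) {
                    shingle.append(' ');
                }
                // Extend the current shingle by one word and emit it.
                shingle.append(words[start + len - 1]);
                out.add(shingle.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Index-time tokens for the longer company name.
        System.out.println(shingles("Abigail Adams National Bancorp", 3));
        // Query-time tokens, if the same filter is applied to the query.
        System.out.println(shingles("National Bancorp", 3));
    }
}
```

    Both names produce the token National Bancorp, which is why a query for it matches both documents, with scoring left to decide the order.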

  • 2021-02-09 07:47

    I have the same exact-matching requirement. I used the queryBuilder from org.hibernate.search.query.dsl, and the query is:

    query = queryBuilder.phrase().withSlop(0).onField(field)
                            .sentence(searchTerm).createQuery();
    

    It's working for me.
