Finding exact match using Lucene search API

I'm working on a company search API using Lucene. My Lucene company index contains two companies:
1. Abigail Adams National Bancorp, Inc.
2. National Bancorp

If the user typ

5 Answers

迷失自我

2021-02-09 07:29

You can use KeywordAnalyzer to index and search this field. KeywordAnalyzer emits the entire field value as a single token, so only a query for the complete, identical string matches.
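
To see why a single token per value gives exact matching, here is a small plain-Java sketch. It does not use Lucene at all: the class `KeywordVsWhitespace` and its methods `index` and `lookup` are hypothetical names for a toy inverted index, and splitting on whitespace only stands in for a tokenizing analyzer.

```java
import java.util.*;

public class KeywordVsWhitespace {
    // Toy inverted index: term -> ids (1-based) of the documents containing it.
    // wholeValueAsToken = true mimics one token per field value;
    // false splits the value on whitespace instead.
    static Map<String, Set<Integer>> index(List<String> docs, boolean wholeValueAsToken) {
        Map<String, Set<Integer>> postings = new HashMap<>();
        for (int i = 0; i < docs.size(); i++) {
            String value = docs.get(i);
            String[] tokens = wholeValueAsToken ? new String[] {value} : value.split("\\s+");
            for (String token : tokens) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(i + 1);
            }
        }
        return postings;
    }

    // Exact term lookup, with no analysis applied to the query term.
    static Set<Integer> lookup(Map<String, Set<Integer>> postings, String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    public static void main(String[] args) {
        List<String> companies =
                List.of("Abigail Adams National Bancorp, Inc.", "National Bancorp");
        Map<String, Set<Integer>> keyword = index(companies, true);
        Map<String, Set<Integer>> whitespace = index(companies, false);

        System.out.println(lookup(keyword, "National Bancorp")); // [2]
        System.out.println(lookup(keyword, "National"));         // []
        System.out.println(lookup(whitespace, "National"));      // [1, 2]
    }
}
```

In this model the match is also sensitive to case and punctuation, so the query string has to be identical to the stored value.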

后悔当初

2021-02-09 07:38

I googled a lot for the same problem without finding any help. After scratching my head for a while, I found the solution: put the search string in double quotes.

National Bancorp will return both #1 and #2, but "National Bancorp" will return only #2.

难免孤独

2021-02-09 07:39
You may want to reconsider your requirements, depending on whether or not I correctly understood your question. Please bear with me if I did misunderstand you.

Just a little food for thought:
- If you only want exact matches returned, then why are you searching in the first place?
- Are you sure that the user expects exact matches? I typically search assuming that the search engine will accommodate missing words.
- Suppose the user searched for National Bank but National Bank was no longer in your index. Would you still want Abigail Adams National Bancorp, Inc to be excluded from the results simply because it was not an exact match?
In light of this, I would suggest you continue to present all possible matches (exact or not) to the user and let them decide for themselves which is most appropriate for them. I say this simply because you may not be thinking the same way as all of your users. Lucene will take care of making sure the closest matches rank highest in the results, helping them make quicker choices.

你的背包

2021-02-09 07:46

This may warrant the use of the shingle filter, which groups consecutive words together into single tokens. For example, Abigail Adams National Bancorp with a ShingleFilter with a maximum shingle size of 3 would produce (assuming a simple WhitespaceAnalyzer) [Abigail], [Abigail Adams], [Abigail Adams National], [Adams], [Adams National], [Adams National Bancorp], [National], [National Bancorp] and [Bancorp].

If a user then queries for National Bancorp, you will get an exact match on National Bancorp itself, and a lower-scored exact match on Abigail Adams National Bancorp (lower scored because this one has many more tokens in the field, which lowers its field-length norm). I think it makes sense to return both documents on such a query.

You may want to apply the shingle filter at query time as well, depending on the use case.
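
As a rough illustration of what shingling produces, here is a plain-Java sketch. It is not Lucene's ShingleFilter: the class `Shingles` and the method `shingles` are hypothetical, and whitespace splitting stands in for the WhitespaceAnalyzer assumed above.

```java
import java.util.*;

public class Shingles {
    // Returns every run of 1..maxSize consecutive tokens, joined by a single space.
    static List<String> shingles(String text, int maxSize) {
        String[] tokens = text.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.length; start++) {
            for (int size = 1; size <= maxSize && start + size <= tokens.length; size++) {
                out.add(String.join(" ", Arrays.copyOfRange(tokens, start, start + size)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> indexed = shingles("Abigail Adams National Bancorp", 3);
        System.out.println(indexed.size());                       // 9
        System.out.println(indexed.contains("National Bancorp")); // true
        System.out.println(shingles("National Bancorp", 3));
        // [National, National Bancorp, Bancorp]
    }
}
```

The two-word shingle National Bancorp appears in the output for both company names, which is why both documents can match that query as a single token.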

轻奢々

2021-02-09 07:47
I have the same requirement for exact matching. I used the queryBuilder from org.hibernate.search.query.dsl, and the query is:
```
// Phrase query with slop 0: the terms must be adjacent and in order.
query = queryBuilder.phrase().withSlop(0).onField(field)
        .sentence(searchTerm).createQuery();
```
It's working for me.
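
For readers unsure what slop 0 means, here is a plain-Java sketch of the matching rule. It uses neither Lucene nor Hibernate Search: the class `ExactPhrase` and its methods are hypothetical, and the tokenizer (lowercase, split on anything that is not a letter or digit) is only an assumption about how the field is analyzed.

```java
import java.util.*;

public class ExactPhrase {
    // Lowercase the text and split on anything that is not a letter or digit.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Slop 0: the phrase tokens must occur in the field adjacent and in the same order.
    static boolean matchesPhrase(String fieldValue, String phrase) {
        List<String> field = tokenize(fieldValue);
        List<String> wanted = tokenize(phrase);
        return !wanted.isEmpty() && Collections.indexOfSubList(field, wanted) >= 0;
    }

    public static void main(String[] args) {
        System.out.println(matchesPhrase("National Bancorp", "National Bancorp")); // true
        System.out.println(matchesPhrase("National Bancorp", "Bancorp National")); // false
        System.out.println(
                matchesPhrase("Abigail Adams National Bancorp, Inc.", "Adams Bancorp")); // false
        System.out.println(
                matchesPhrase("Abigail Adams National Bancorp, Inc.", "National Bancorp")); // true
    }
}
```

Under this model a longer field value that contains the phrase still matches, so whether the result is exact for the whole field depends on how the field is analyzed.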