Lucene Analyzer for Indexing and Searching

主宰稳场 提交于 2019-12-23 17:05:17

问题


I have a field that I am indexing with Lucene like so:

@Field(name="hungerState", index=Index.TOKENIZED, store=Store.YES)
public HungerState getHungerState() {

The possible values of this field are HUNGRY, SLIGHTLY_HUNGRY, and NOT_HUNGRY

When these values are indexed using the StandardAnalyzer, the terms end up as hungry, slightly since it tokenizes on punctuation and ignores the "not".

If I change the index to index=Index.UN_TOKENIZED, the indexed terms are HUNGRY, SLIGHTLY_HUNGRY, and NOT_HUNGRY, as expected.

My search API has 1 "search" method that constructs the Query like so:

MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30, getSearchFields(), new StandardAnalyzer(Version.LUCENE_30));
parser.setDefaultOperater(QueryParser.AND_OPERATOR);
Query query = parser.parse(searchTerms);

This handles searches where searchTerms = "foo", which searches all fields returned by getSearchFields() on "foo", and also where searchTerms specifies fields and values to search (ie "hungerState:HUNGRY")

My problem is with the latter scenario. Since the query parser is using a StandardAnalyzer, searches for hungerState:SLIGHTLY_HUNGRY get parsed into hungerState:"slightly hungry" and searches for hungerState=NOT_HUNGRY get parsed into hungerState=hungry.

When the field is indexed using the StandardAnalyzer, I get unexpected results (searches for HUNGRY and NOT_HUNGRY return results for all 3 values). When the field is indexed as UN_TOKENIZED, I don't get any results since the query parser tokenizes the search string and makes it lowercase.

I've even tried specifying an Analyzer for indexing like KeywordAnalyzer, but it pretty much has no effect since the entire search string is analyzed with StandardAnalyzer every time.

Any advice would be appreciated. Thanks!


回答1:


You're using a standard analyzer for your query parser, so yes your query will be analyzed with a standard analyzer. Just switch to using a keyword analyzer:

MultiFieldQueryParser parser = new MultiFieldQueryParser(Version.LUCENE_30, getSearchFields(), 
          new KeywordAnalyzer(Version.LUCENE_30));

You may want to use a PerFieldAnalyzerWrapper if your other fields aren't keywords.



来源:https://stackoverflow.com/questions/7744129/lucene-analyzer-for-indexing-and-searching

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!