Using MultiFieldQueryParser

自古美人都是妖i 提交于 2019-12-23 02:40:38

问题


Am using MultiFieldQueryParser for parsing strings like a.a., b.b., etc But after parsing, its removing the dots in the string. What am i missing here?

Thanks.


回答1:


I'm not sure the MultiFieldQueryParser does what you think it does. Also...I'm not sure I know what you're trying to do.

I do know that with any query parser, strings like 'a.a.' and 'b.b.' will have the periods stripped out because, at least with the default Analyzer, all punctuation is treated as white space.

As far as the MultiFieldQueryParser goes, that's just a QueryParser that allows you to specify multiple default fields to search. So with the query

title:"Of Mice and Men" "John Steinbeck"

The string "John Steinbeck" will be looked for in all of your default fields whereas "Of Mice and Men" will only be searched for in the title field.




回答2:


What analyzer is your parser using? If it's StopAnalyzer then the dot could be a stop word and is thus ignored. Same thing if it's StandardAnalyzer which cleans up input (includes removing dots).




回答3:


(Repeating my answer from the dupe. One of these should be deleted).

The StandardAnalyzer specifically handles acronyms, and converts C.F.A. (for example) to cfa. This means you should be able to do the search, as long as you make sure you use the same analyzer for the indexing and for the query parsing.

I would suggest you run some more basic test cases to eliminate other factors. Try to user an ordinary QueryParser instead of a multi-field one.

Here's some code I wrote to play with the StandardAnalyzer:

StringReader testReader = new StringReader("C.F.A. C.F.A word");
StandardAnalyzer analyzer = new StandardAnalyzer();
TokenStream tokenStream = analyzer.tokenStream("title", testReader);
System.out.println(tokenStream.next());
System.out.println(tokenStream.next());
System.out.println(tokenStream.next());

The output for this, by the way was:

(cfa,0,6,type=<ACRONYM>)
(c.f.a,7,12,type=<HOST>)
(word,13,17,type=<ALPHANUM>)

Note, for example, that if the acronym doesn't end with a dot then the analyzer assumes it's an internet host name, so searching for "C.F.A" will not match "C.F.A." in the text.



来源:https://stackoverflow.com/questions/652956/using-multifieldqueryparser

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!