Question
I have the following code:
static class TaggerAnalyzer extends Analyzer {
    @Override
    protected TokenStreamComponents createComponents(String s, Reader reader) {
        SynonymMap.Builder builder = new SynonymMap.Builder(true);
        builder.add(new CharsRef("al"), new CharsRef("americanleague"), true);
        builder.add(new CharsRef("al"), new CharsRef("a.l."), true);
        builder.add(new CharsRef("nba"), new CharsRef("national" + SynonymMap.WORD_SEPARATOR + "basketball" + SynonymMap.WORD_SEPARATOR + "association"), true);
        SynonymMap mySynonymMap = null;
        try {
            mySynonymMap = builder.build();
        } catch (IOException e) {
            e.printStackTrace();
        }
        Tokenizer source = new ClassicTokenizer(Version.LUCENE_40, reader);
        TokenStream filter = new StandardFilter(Version.LUCENE_40, source);
        filter = new LowerCaseFilter(Version.LUCENE_40, filter);
        filter = new SynonymFilter(filter, mySynonymMap, true);
        return new TokenStreamComponents(source, filter);
    }
}
I'm running some tests, and everything went fine until I hit this scenario:
String title = "Very short title at a.l. bla bla";
Assert.assertTrue(TagUtil.evaluate(memoryIndex,"americanleague"));
Assert.assertTrue(TagUtil.evaluate(memoryIndex,"al"));
I expected both assertions to pass, but "americanleague" didn't match "a.l.", even though both "a.l." and "americanleague" are synonyms of "al".
What should I do? I don't want to add every combination to the map. Thanks.
Answer 1:
I believe you have your arguments to builder.add backwards. Try:
builder.add(new CharsRef("americanleague"), new CharsRef("al"), true);
builder.add(new CharsRef("a.l."), new CharsRef("al"), true);
builder.add(new CharsRef("national" + SynonymMap.WORD_SEPARATOR + "basketball" + SynonymMap.WORD_SEPARATOR + "association"), new CharsRef("nba"), true);
The SynonymFilter just maps from the first argument (input) to the second argument (output), not the other way around. So you have rules that translate "al" into two different synonyms, but none that do anything with an input of "a.l." or "americanleague".
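
To make the direction of the mapping concrete, here is a small stand-alone sketch. It is not Lucene code: OneWaySynonymModel is a hypothetical toy class that only imitates the "input maps to output, original kept" behaviour described above, and it assumes each term reaches the filter as a single token (it ignores tokenization entirely).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of one-directional synonym rules (not Lucene's implementation).
public class OneWaySynonymModel {
    // input token -> outputs emitted alongside the input
    private final Map<String, List<String>> rules = new HashMap<>();

    // Register a rule: whenever `input` is seen, also emit `output`.
    public void add(String input, String output) {
        rules.computeIfAbsent(input, k -> new ArrayList<>()).add(output);
    }

    // Return the original token followed by the outputs registered for it.
    public List<String> expand(String token) {
        List<String> result = new ArrayList<>();
        result.add(token);
        result.addAll(rules.getOrDefault(token, Collections.<String>emptyList()));
        return result;
    }

    public static void main(String[] args) {
        // Rules as written in the question: "al" is the input.
        OneWaySynonymModel original = new OneWaySynonymModel();
        original.add("al", "americanleague");
        original.add("al", "a.l.");
        System.out.println(original.expand("a.l."));  // [a.l.]

        // Rules as suggested in the answer: the variants are the inputs.
        OneWaySynonymModel reversed = new OneWaySynonymModel();
        reversed.add("americanleague", "al");
        reversed.add("a.l.", "al");
        System.out.println(reversed.expand("a.l."));            // [a.l., al]
        System.out.println(reversed.expand("americanleague"));  // [americanleague, al]
    }
}
```

In this model, the reversed rules make every variant emit the shared term "al", so one rule per variant is enough and no pairwise combinations are needed. Note that this only makes "americanleague" and "a.l." meet on "al" if the same rules are applied to both the indexed text and the query, which is an assumption about how TagUtil.evaluate works.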
Source: https://stackoverflow.com/questions/22078669/build-lucene-synonyms