Question
I've implemented Lucene.NET on my site, using it to index my products, which are theatre shows, tours and attractions around London.
I want to implement a "Did you mean?" feature for when users misspell product names, one that takes whole product titles into account and not just single words. For example, if the user typed:
Lodnon Eye
I would like to auto-suggest:
London Eye
I assume I need to have the analyzer index the titles as if they were a single entity, so that SpellChecker can nearest-match on the phrase as well as on the individual words.
How would I do this?
Answer 1:
I've just recently implemented a phrase autosuggest system in Lucene.NET.
Basically, the Java version of Lucene has a ShingleFilter in one of the contrib folders, which breaks a sentence down into all possible phrase combinations. Unfortunately, Lucene.NET's contrib filters aren't quite there yet, so we don't have a shingle filter.
However, a Lucene index written in Java can be read by Lucene.NET as long as the versions are the same. So what I did was the following:
I created a spell index in Lucene.NET using the SpellChecker.IndexDictionary method, as laid out in the "Did you mean" section of Jake Scott's link. Please note that this only creates a spelling index of single words, not phrases.
I then created a Java app that uses the shingle filter to create phrases from the text I'm searching and saves them in a temporary index.
I then wrote another method in .NET to open this temporary index and add each of the phrases as a document to my spelling index, which already contains the single words. The trick is to make sure the documents you're adding have the same form as the rest of the spell documents, so I ripped out the methods used in the SpellChecker code in the Lucene.NET project and edited those.
Once you've done that, you can call the SpellChecker.SuggestSimilar method, pass it a misspelled phrase, and it will return a valid suggestion.
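To make the shingle step above more concrete, here is a minimal plain-Java sketch of what a shingle filter conceptually produces: every run of consecutive words within a size range. It does not use Lucene's ShingleFilter at all; the class name ShingleSketch and the method shingles are hypothetical, and the lowercasing and whitespace splitting are assumptions standing in for whatever analyzer the real index uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: this is NOT Lucene's ShingleFilter.
class ShingleSketch {

    // Returns every run of minSize..maxSize consecutive words,
    // ordered by the position of the first word in the run.
    static List<String> shingles(String text, int minSize, int maxSize) {
        List<String> out = new ArrayList<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return out;
        }
        // Assumption: lowercase and split on whitespace, like a simple analyzer.
        String[] words = trimmed.toLowerCase(Locale.ROOT).split("\\s+");
        for (int start = 0; start < words.length; start++) {
            StringBuilder phrase = new StringBuilder();
            for (int size = 1; size <= maxSize && start + size <= words.length; size++) {
                if (size > 1) {
                    phrase.append(' ');
                }
                phrase.append(words[start + size - 1]);
                if (size >= minSize) {
                    out.add(phrase.toString());
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Each printed phrase would become one entry in the spelling index.
        for (String phrase : shingles("The London Eye", 2, 3)) {
            System.out.println(phrase);
        }
    }
}
```

For the title "The London Eye" with sizes 2 to 3, this yields "the london", "the london eye" and "london eye". In the approach described above, phrases like these would be added to the spelling index alongside the single words.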
Answer 2:
There is an excellent blog series here:
- Lucene.NET
- Introduction to Lucene
- Indexing basics
- Search basics
- Did you mean..
- Faceted Search
- Class Reference
I have also found another project called SimpleLucene, which you can use to maintain your Lucene indexes whenever you need to update or delete a document. Read about it here.
Answer 3:
This is probably not the best solution, and I would definitely use the answer suggested by spaceman, but here is another possible approach. Use the KeywordAnalyzer or the KeywordTokenizer on each title; this does not break the title down into separate tokens but keeps it as a single token. The SuggestSimilar method would then return whole titles as suggestions.
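As a rough illustration of why keeping each title as a single token helps, the sketch below treats every title as one string and picks the closest one to the user's input. It assumes plain Levenshtein edit distance as the similarity measure and a linear scan over a small in-memory list; a real spelling index does more than this, and the names TitleSuggestSketch, editDistance and suggest are hypothetical, not Lucene.NET APIs.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only: whole-title matching without Lucene.
class TitleSuggestSketch {

    // Classic Levenshtein distance using two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the title closest to what the user typed, or null if there are no titles.
    // Comparison is case-insensitive (an assumption for this sketch).
    static String suggest(String typed, List<String> titles) {
        String query = typed.toLowerCase(Locale.ROOT);
        String best = null;
        int bestDistance = Integer.MAX_VALUE;
        for (String title : titles) {
            int d = editDistance(query, title.toLowerCase(Locale.ROOT));
            if (d < bestDistance) {
                bestDistance = d;
                best = title;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> titles = List.of("London Eye", "Tower of London", "London Dungeon");
        System.out.println(suggest("Lodnon Eye", titles));
    }
}
```

Because the whole title is compared as one unit, "Lodnon Eye" is two edits away from "London Eye" and much further from the other titles, so the full title comes back as the suggestion rather than a single corrected word.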
Source: https://stackoverflow.com/questions/3118974/lucene-net-spellchecker-multi-word-phrase-based-auto-suggest