Lucene - Wildcards in phrases

小鲜肉

I am currently attempting to use Lucene to search data populated in an index.

I can match an exact phrase by enclosing it in quotation marks (e.g. "Processing Documents").

7 answers

刺人心

2021-01-04 11:57

Lucene 2.9 has ComplexPhraseQueryParser which can handle wildcards in phrases.

小鲜肉

2021-01-04 12:03

Use a SpanNearQuery with a slop of 0.

Unfortunately, there is no SpanWildcardQuery in Lucene.Net. You will either need to use SpanMultiTermQueryWrapper or, with a little effort, convert the Java version to C#.
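To make the idea concrete, here is a minimal plain-Java sketch of what a slop of 0 combined with a trailing wildcard means: the leading terms must sit in consecutive positions, in order, and the final position only has to start with the given prefix. It does not use Lucene at all; the class `PhrasePrefixSketch` and the method `matches` are hypothetical names chosen for illustration, and the token list stands in for an analyzed field.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: mimics an in-order phrase match with slop 0
// whose last term is a prefix wildcard (e.g. "processing document*").
final class PhrasePrefixSketch {

    // Returns true when `tokens` contains every term of `leading` in
    // consecutive positions, immediately followed by a token that
    // starts with `lastPrefix`.
    public static boolean matches(List<String> tokens, List<String> leading, String lastPrefix) {
        int phraseLen = leading.size() + 1;
        for (int start = 0; start + phraseLen <= tokens.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < leading.size() && ok; i++) {
                ok = tokens.get(start + i).equals(leading.get(i));
            }
            if (ok && tokens.get(start + leading.size()).startsWith(lastPrefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("start", "processing", "documents", "now");
        // "processing document*" matches: the two terms are adjacent and in order.
        System.out.println(matches(doc, Arrays.asList("processing"), "document")); // true
        // "start document*" does not: another token sits between the two terms.
        System.out.println(matches(doc, Arrays.asList("start"), "document")); // false
    }
}
```

In Lucene itself the same effect would come from wrapping the wildcard clause so it can take part in a span query, as described above; the sketch only shows the matching rule.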

予麋鹿

2021-01-04 12:04

Another alternative is to use n-grams, specifically the EdgeNGram filter: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.EdgeNGramFilterFactory

This indexes n-grams, that is, parts of words. With a minimum n-gram size of 5 and a maximum of 8, the term Documents would be indexed as: Docum, Docume, Documen, Document.
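As a rough illustration of which terms an edge n-gram filter emits, the following plain-Java sketch generates the leading-edge n-grams of a single term. It is not the Solr filter itself; `EdgeNGramSketch` and `edgeNGrams` are hypothetical names, and the sketch assumes grams are taken from the front of the term only.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: front-edge n-grams of one term.
final class EdgeNGramSketch {

    // Returns the prefixes of `term` whose length is between minGram and
    // maxGram (inclusive), capped at the length of the term itself.
    public static List<String> edgeNGrams(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int longest = Math.min(maxGram, term.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(term.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints [Docum, Docume, Documen, Document]
        System.out.println(edgeNGrams("Documents", 5, 8));
    }
}
```

A query term such as Docume then matches the document through an ordinary term lookup, with no wildcard involved.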

There is a tradeoff in index size and indexing time. One of the Solr books quotes, as a rough guide, that n-gram indexing takes 10 times longer, uses 5 times more disk space, and creates 6 times more distinct terms.

However, EdgeNGram will do better than that.

Make sure you don't submit wildcard characters in your queries. You aren't doing a wildcard search; you are matching a search term against n-grams (parts of words).

醉梦人生

2021-01-04 12:05
I was looking for the same thing, and I found that PrefixQuery can match something like "Processing Document*". For this to work, the field you search must be untokenized and its values must be stored in lowercase (an untokenized field is not analyzed, so the indexer will not lowercase the values for you). Here is the PrefixQuery code that worked for me:
```
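// Fragment from inside a search method; uses the legacy Lucene.Net 2.x API (Hits, FSDirectory.GetDirectory).
// Needs: using System.Collections.Generic; using Lucene.Net.Index; using Lucene.Net.Search; using Lucene.Net.Store;
List<SearchResult> results = new List<SearchResult>();
Lucene.Net.Store.Directory searchDir = FSDirectory.GetDirectory(this._indexLocation, false);
IndexSearcher searcher = new IndexSearcher(searchDir);

// The field is untokenized and stored in lowercase, so lowercase the input before building the prefix term.
BooleanQuery query = new BooleanQuery();
query.Add(new PrefixQuery(new Term(FILE_NAME_KEY, keyWords.ToLower())), BooleanClause.Occur.MUST);

// Run the query and copy the hits into the result list.
Hits hits = searcher.Search(query);
this.FillResults(hits, results);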
```

礼貌的吻别

2021-01-04 12:06

It seems that the default QueryParser cannot handle this. You can probably create a custom QueryParser for wildcards in phrases. If your example is representative, stemming may solve your problem. Please read the documentation for PorterStemFilter to see whether it fits.

不要未来只要你来

2021-01-04 12:10

What you're looking for is FuzzyQuery, which finds results containing similar words based on Levenshtein distance. Alternatively, you may want to consider using the slop of PhraseQuery (also available in MultiPhraseQuery) if the order of the words isn't significant.
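For readers unfamiliar with the measure, here is a minimal plain-Java sketch of Levenshtein (edit) distance, the number of single-character insertions, deletions, and substitutions needed to turn one string into another. It is a textbook dynamic-programming version, not Lucene's implementation; `LevenshteinSketch` and `distance` are hypothetical names.

```java
// Illustration only: classic two-row dynamic-programming edit distance.
final class LevenshteinSketch {

    // Returns the minimum number of single-character insertions,
    // deletions, or substitutions needed to turn `a` into `b`.
    public static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        // Turning the empty prefix of `a` into the first j chars of `b` takes j insertions.
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // deleting the first i chars of `a`
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("document", "documents")); // 1
        System.out.println(distance("kitten", "sitting"));     // 3
    }
}
```

A small distance, such as the 1 between document and documents, is the kind of near match a fuzzy search is meant to catch.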
