Question
I've found how to sort query results by a given field in a Lucene.Net index instead of by score; all it takes is a field that is indexed but not tokenized. However, what I haven't been able to figure out is how to sort that field while ignoring stop words such as "a" and "the", so that the following book titles, for example, would sort in ascending order like so:
- The Cat in the Hat
- Horton Hears a Who
Is such a thing possible, and if yes, how?
I'm using Lucene.Net 2.3.1.2.
Answer 1:
I wrap the results returned by Lucene in my own collection of custom objects. That lets me populate each object with extra context information (for example, using the highlighter class to pull out a snippet of the matches) and add paging. If you took a similar route, you could create a "result" class, give it something like a SortBy property, take whatever field you want to sort by, strip out any stop words, and save the stripped value in that property. Then just sort the collection on that property instead.
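As a rough illustration of that idea, here is a minimal, language-neutral sketch in Python rather than C#. The `SearchResult` class, the `sort_by` attribute, and the short `STOP_WORDS` list are all hypothetical names made up for this example; they are not part of Lucene.Net. It assumes the hits have already been pulled out of the index into memory.

```python
from dataclasses import dataclass, field

# Assumed stop-word list for illustration; use whatever list your analyzer uses.
STOP_WORDS = {"a", "an", "the"}


def strip_stop_words(text: str) -> str:
    """Lowercase the text and drop stop words, keeping the remaining word order."""
    words = text.lower().split()
    return " ".join(w for w in words if w not in STOP_WORDS)


@dataclass
class SearchResult:
    """Hypothetical wrapper around one hit returned by the search."""
    title: str
    sort_by: str = field(init=False)

    def __post_init__(self) -> None:
        # Compute the sort key once, when the result object is created.
        self.sort_by = strip_stop_words(self.title)


def sort_results(titles):
    """Wrap raw titles in result objects and sort them by the stripped key."""
    results = [SearchResult(t) for t in titles]
    return sorted(results, key=lambda r: r.sort_by)


sorted_titles = [
    r.title for r in sort_results(["Horton Hears a Who", "The Cat in the Hat"])
]
```

Note that this sorts in memory, so it only gives a correct global order if every hit is loaded before paging is applied.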
Answer 2:
When you create your index, create a field that only contains the words you wish to sort on, then when retrieving, sort on that field but display the full title.
Answer 3:
It's been a while since I used Lucene, but my guess would be to add an extra field just for sorting and store the value in it with the stop words already stripped. You can probably use the same analyzers to generate this value.
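One way to picture this, sketched in Python rather than C#: build the sort value yourself before indexing, then put it in a separate field that is indexed but not tokenized, while the full title goes in its own field for display. The names `make_sort_key`, `build_document`, `title_sort`, and the stop-word list below are assumptions for illustration only, and the plain dict merely stands in for a Lucene document.

```python
import re

# Assumed stop-word list; ideally the same list your analyzer uses.
STOP_WORDS = frozenset({"a", "an", "the"})


def make_sort_key(title: str) -> str:
    """Build a single-string sort key: lowercase, no punctuation, no stop words."""
    tokens = re.findall(r"[a-z0-9]+", title.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    # If the title consists only of stop words, fall back to all tokens
    # so the key is not empty.
    return " ".join(kept or tokens)


def build_document(title: str) -> dict:
    """Stand-in for an index document: one field to display, one to sort on."""
    return {
        "title": title,                      # shown to the user
        "title_sort": make_sort_key(title),  # would be indexed untokenized
    }


doc = build_document("The Cat in the Hat")
```

Because the stop words are stripped before the value ever reaches the index, the sort field itself never needs to be tokenized.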
Answer 4:
There seems to be a catch-22 in that you must tokenize a field with an analyzer in order to strip punctuation and stop words, but you can't sort on tokenized fields. How then to strip the stop words without tokenizing?
Answer 5:
For searching, the article "Search Lucene .NET index with sort option" may help with your problem.
Source: https://stackoverflow.com/questions/66041/how-to-sort-by-lucene-net-field-and-ignore-common-stop-words-such-as-a-and-th