Question
I am working in IR.
Can anyone guide me on how to implement a language model in Whoosh? I have already applied TF-IDF and BM25. I am new to IR.
For example, the simplest form of language model simply throws away all conditioning context and estimates each term independently. Such a model is called a unigram language model:
P_{uni}(t_1t_2t_3t_4) = P(t_1)P(t_2)P(t_3)P(t_4)
There are many more complex kinds of language models, such as bigram language models, which condition on the previous term,
P_{bi}(t_1t_2t_3t_4) = P(t_1)P(t_2\vert t_1)P(t_3\vert t_2)P(t_4\vert t_3)
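To make the two formulas concrete, here is a minimal sketch that estimates unigram and bigram probabilities from a token list by simple relative frequency and multiplies them as in the equations above. The function names (train_unigram, train_bigram, p_uni, p_bi) are hypothetical helpers for illustration, and no smoothing is applied, so any unseen term or term pair gives a probability of zero.

```python
from collections import Counter


def train_unigram(tokens):
    """Estimate P(t) as the relative frequency of each term."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()}


def train_bigram(tokens):
    """Estimate P(t_i | t_{i-1}) as count(t_{i-1}, t_i) / count(t_{i-1})."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    # Only terms that have a successor can act as the conditioning term
    prev_counts = Counter(tokens[:-1])
    return {
        (prev, cur): count / prev_counts[prev]
        for (prev, cur), count in pair_counts.items()
    }


def p_uni(sequence, uni):
    """P_uni(t_1 ... t_n) = product of P(t_i); unseen terms give 0."""
    prob = 1.0
    for term in sequence:
        prob *= uni.get(term, 0.0)
    return prob


def p_bi(sequence, uni, bi):
    """P_bi(t_1 ... t_n) = P(t_1) * product of P(t_i | t_{i-1})."""
    if not sequence:
        return 1.0
    prob = uni.get(sequence[0], 0.0)
    for prev, cur in zip(sequence, sequence[1:]):
        prob *= bi.get((prev, cur), 0.0)
    return prob
```

In a retrieval setting a model like this would normally be estimated per document and smoothed with collection statistics, since a single unseen query term would otherwise zero out the whole product.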
Answer 1:
Take a look at Whoosh's scoring module and use BM25F (lines 276 to 332) as a reference for building your own weighting and scoring models. You need to create a Weighting Model and a Scorer. Assuming you want to call your model Unigram, the main steps would be:
1. Implement your own Unigram weighting model class and inherit from scoring.WeightingModel: class Unigram(WeightingModel)
2. Implement the methods required by the base class, the main one being scorer(), which returns a reference to your Scorer class (next). This class is called when you create your searcher and defines the Weighting Model the searcher will use.
3. Implement a UnigramScorer class and inherit from scoring.WeightLengthScorer: class UnigramScorer(WeightLengthScorer)
4. Implement the __init__ and _score methods. __init__ takes the field name and value and is called once for each term in your query when you call searcher.search(). _score is called for each matching document in your results. It takes a weight and length and returns a score for a given field.
5. When you create your searcher at search time, specify your custom language model using the weighting parameter: ix.searcher(weighting = Unigram)
Source: https://stackoverflow.com/questions/47944961/language-modal-through-whoosh-in-information-retrieval