Language Model through Whoosh in Information Retrieval

Question

I am working in IR.

Can anyone guide me on how to implement a language model in Whoosh? I have already applied TF-IDF and BM25. I am new to IR.

For example, the simplest form of language model throws away all conditioning context and estimates each term independently. Such a model is called a unigram language model:

P_{uni}(t_1t_2t_3t_4) = P(t_1)P(t_2)P(t_3)P(t_4)
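
As a rough illustration of the unigram formula, here is a minimal sketch that estimates each P(t) by maximum likelihood (term count divided by total tokens) from a toy corpus. The function name, the corpus, and the lack of smoothing are assumptions made for the example only.

```python
from collections import Counter


def unigram_probability(terms, corpus_tokens):
    """Return P_uni(t_1 ... t_n) using maximum-likelihood term estimates."""
    counts = Counter(corpus_tokens)
    total = len(corpus_tokens)
    prob = 1.0
    for term in terms:
        # Each term is treated independently: P(t) = count(t) / total tokens.
        prob *= counts[term] / total
    return prob


# Toy corpus, invented for illustration.
corpus = "the cat sat on the mat".split()
p = unigram_probability(["the", "cat"], corpus)  # 2/6 * 1/6 = 1/18
```

Without smoothing, any term that never occurs in the corpus drives the whole product to zero, which is why retrieval systems normally smooth these estimates.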

There are many more complex kinds of language models, such as bigram language models, which condition on the previous term:

P_{bi}(t_1t_2t_3t_4) = P(t_1)P(t_2\vert t_1)P(t_3\vert t_2)P(t_4\vert t_3)
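
A comparable sketch for the bigram case, again using unsmoothed maximum-likelihood estimates, with P(t_k | t_{k-1}) approximated as count(t_{k-1} t_k) / count(t_{k-1}). The function name and the toy corpus are hypothetical.

```python
from collections import Counter


def bigram_probability(terms, corpus_tokens):
    """Return P_bi(t_1 ... t_n) using maximum-likelihood estimates."""
    if not terms:
        return 1.0
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    # The first term has no context, so it uses the unigram estimate.
    prob = unigrams[terms[0]] / len(corpus_tokens)
    for prev, cur in zip(terms, terms[1:]):
        if unigrams[prev] == 0:
            return 0.0  # unseen context: the estimate is undefined, treat as 0
        # P(cur | prev) = count(prev cur) / count(prev)
        prob *= bigrams[(prev, cur)] / unigrams[prev]
    return prob


# Toy corpus, invented for illustration.
corpus = "the cat sat on the mat".split()
p = bigram_probability(["the", "cat", "sat"], corpus)  # 2/6 * 1/2 * 1/1 = 1/6
```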

Answer 1:

Take a look at Whoosh's scoring module and use BM25F (lines 276 to 332) as a reference for building your own weighting and scoring models. You need to create a Weighting Model and a Scorer. Assuming you want to call your model Unigram, the main steps would be:

Implement your own Unigram weighting model class and inherit from scoring.WeightingModel:

class Unigram(scoring.WeightingModel):
    ...

Implement the methods required by the base class, the main one being scorer(), which returns an instance of your scorer class (described next). The weighting model is what you pass in when you create your searcher, and it defines how the searcher scores documents.
Implement a UnigramScorer class and inherit from scoring.WeightLengthScorer:

class UnigramScorer(scoring.WeightLengthScorer):
    ...

Implement the __init__ and _score methods. __init__ takes the field name and term text and is called once for each term in your query when you call searcher.search(). _score is called for each matching document in your results. It takes the term's weight in the document and the length of the field in that document, and returns a score for that field.
When you create your searcher at search time, specify your custom language model using the weighting parameter:

ix.searcher(weighting = Unigram)

Source: https://stackoverflow.com/questions/47944961/language-modal-through-whoosh-in-information-retrieval

Tags

python

information-retrieval

whoosh