Language Model through Whoosh in Information Retrieval

Submitted by 浪尽此生 on 2019-12-08 08:06:00

Question


I am working in IR.

Can anyone guide me on how to implement a language model in Whoosh? I have already applied TF-IDF and BM25. I am new to IR.

For example, the simplest form of language model throws away all conditioning context and estimates each term independently. Such a model is called a unigram language model:

P_{uni}(t_1t_2t_3t_4) = P(t_1)P(t_2)P(t_3)P(t_4)

There are many more complex kinds of language models, such as bigram language models, which condition on the previous term,

P_{bi}(t_1t_2t_3t_4) = P(t_1)P(t_2\vert t_1)P(t_3\vert t_2)P(t_4\vert t_3)
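
To make the two formulas concrete, here is a minimal sketch in plain Python that estimates both models from a toy token list by maximum likelihood (relative frequencies). The function names (`train_unigram`, `train_bigram`, `unigram_prob`, `bigram_prob`) are made up for illustration and are not part of Whoosh. No smoothing is applied, so any unseen term or term pair drives the whole product to zero; a retrieval system would normally smooth these estimates.

```python
from collections import Counter


def train_unigram(tokens):
    """Estimate P(t) as the relative frequency of t in the token list."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


def train_bigram(tokens):
    """Estimate P(t | prev) as count(prev, t) / count(prev as a left context)."""
    pair_counts = Counter(zip(tokens, tokens[1:]))
    prev_counts = Counter(tokens[:-1])
    return {
        (prev, term): count / prev_counts[prev]
        for (prev, term), count in pair_counts.items()
    }


def unigram_prob(unigram, seq):
    """P_uni(t1..tn) = product of P(ti); unseen terms contribute 0."""
    prob = 1.0
    for term in seq:
        prob *= unigram.get(term, 0.0)
    return prob


def bigram_prob(unigram, bigram, seq):
    """P_bi(t1..tn) = P(t1) * product of P(ti | t(i-1))."""
    if not seq:
        return 1.0
    prob = unigram.get(seq[0], 0.0)
    for prev, term in zip(seq, seq[1:]):
        prob *= bigram.get((prev, term), 0.0)
    return prob


if __name__ == "__main__":
    tokens = "a b a c".split()
    uni = train_unigram(tokens)
    bi = train_bigram(tokens)
    print(unigram_prob(uni, ["a", "b"]))
    print(bigram_prob(uni, bi, ["a", "b", "a"]))
```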

Answer 1:


Take a look at Whoosh's scoring module and use BM25F (lines 276 to 332) as a reference for building your own weighting and scoring models. You need to create a Weighting Model and a Scorer. Assuming you want to call your model Unigram, the main steps would be:

  1. Implement your own Unigram weighting model class and inherit from scoring.WeightingModel:

    class Unigram(scoring.WeightingModel):
        ...  # scorer() and any other overrides go here

    Implement the methods required by the base class, the main one being scorer(), which returns an instance of your scorer class (next). The weighting model is what you pass in when you create your searcher, and it determines how that searcher scores documents.

  2. Implement a UnigramScorer class and inherit from scoring.WeightLengthScorer:

    class UnigramScorer(scoring.WeightLengthScorer):
        ...  # __init__ and _score go here

    Implement the __init__ and _score methods. __init__ takes the field name and the term text (BM25FScorer also receives the searcher) and is called once for each term in your query when you call searcher.search(). _score is called for each matching document in your results. It takes the term's weight in that document and the document's field length, and returns a score for the given field.

  3. When you create your searcher at search time, specify your custom language model using the weighting parameter:

    ix.searcher(weighting=Unigram())



Source: https://stackoverflow.com/questions/47944961/language-modal-through-whoosh-in-information-retrieval
