I have a question about how to evaluate whether information retrieval results are good, for example how to calculate the relevant document rank, recall, precision, AP, MAP, and so on.
Currently, the system can retrieve documents from the database once a user enters a query. The problem is that I do not know how to do the evaluation.
I have a public data set, the "Cranfield collection" (dataset link). It contains:
1. documents 2. queries 3. relevance assessments
            DOCS    QRYS   SIZE*
Cranfield   1,400   225    1.6
How can I use the "Cranfield collection" to do the evaluation, i.e. to calculate the relevant document rank, recall, precision, AP, MAP, and so on?
I am looking for ideas and direction, not asking how to code the program.
Document Ranking
Okapi BM25 (BM stands for Best Matching) is a ranking function used by search engines to rank matching documents according to their relevance to a given search query. It is based on the probabilistic retrieval framework. BM25 is a bag-of-words retrieval function that ranks a set of documents based on the query terms appearing in each document, regardless of the inter-relationship between the query terms within a document (e.g., their relative proximity). See the Wikipedia page for more details.
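To make the ranking function concrete, here is a minimal sketch of BM25 scoring over pre-tokenized documents. The function name `bm25_scores`, the toy documents, and the parameter values `k1=1.5` and `b=0.75` are assumptions for illustration, and the IDF form used (with `+ 1` inside the logarithm) is only one of several variants found in practice.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Return one BM25 score per tokenized document (k1, b are assumed defaults)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue  # terms absent from the document contribute nothing
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates via k1 and is normalized by document length via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical toy collection, already tokenized.
docs = [
    "boundary layer flow over a flat plate".split(),
    "heat transfer in supersonic flow".split(),
    "structural analysis of wings".split(),
]
scores = bm25_scores(["supersonic", "flow"], docs)
# Document indices ordered from highest to lowest score.
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

The ranked list of document ids that comes out of a function like this is the input to every evaluation measure discussed below.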
Precision and Recall
Precision measures "Of all the documents we retrieved, how many are actually relevant?".
Precision = No. of relevant documents retrieved / No. of total documents retrieved
Recall measures "Of all the documents that are actually relevant, how many did we retrieve?".
Recall = No. of relevant documents retrieved / No. of total relevant documents
Suppose a query "q" is submitted to an information retrieval system (e.g., a search engine) whose collection of 600 documents contains 100 documents relevant to "q", and the system retrieves 68 documents. Out of the 68 retrieved documents, 40 are relevant. So, in this case:
Precision = 40 / 68 = 58.8%
and Recall = 40 / 100 = 40%
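The same calculation can be sketched with sets of document ids. The helper name `precision_recall` and the id ranges are made up here, chosen only so that the counts match the example above (100 relevant, 68 retrieved, 40 in common).

```python
def precision_recall(retrieved, relevant):
    """Set-based precision and recall for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical ids: documents 0..99 are relevant, the system returns 60..127.
relevant = range(100)        # 100 relevant documents
retrieved = range(60, 128)   # 68 retrieved documents, 40 of them relevant
precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

With a test collection, `relevant` comes from the relevance assessments for the query and `retrieved` from your system's output.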
F-Score / F-measure is the weighted harmonic mean of precision and recall. The traditional F-measure or balanced F-score is:
F-Score = 2 * Precision * Recall / (Precision + Recall)
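As a small sketch, the weighted form can be written with a `beta` parameter, where `beta = 1` gives the balanced F-score above. The function name `f_beta` is an assumption, and the inputs reuse the precision and recall from the earlier example.

```python
def f_beta(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall; beta=1 is the balanced F-score."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero when nothing relevant was retrieved
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision = 40/68 and Recall = 40/100 from the example above.
f1 = f_beta(40 / 68, 40 / 100)
print(f"F1 = {f1:.3f}")
```

A `beta` above 1 weights recall more heavily, and a `beta` below 1 weights precision more heavily.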
Average Precision
You can think of it this way: you type something into Google and it shows you 10 results. Ideally, all of them would be relevant. If only some are relevant, say five of them, then it is much better if the relevant ones are shown first. It would be bad if the first five were irrelevant and the good ones only started from the sixth, wouldn't it? The AP score reflects this.
AvgPrec averages the precision values measured at each rank where a relevant document appears. As an example, here is the AvgPrec of two rankings:
Ranking#1: (1.0 + 0.67 + 0.75 + 0.8 + 0.83 + 0.6) / 6 = 0.78
Ranking#2: (0.5 + 0.4 + 0.5 + 0.57 + 0.56 + 0.6) / 6 = 0.52
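Here is a minimal sketch of that computation. The two relevance lists are an assumption: they are reconstructed from the precision values above (for instance 0.67 = 2/3 means the second relevant document sits at rank 3), not taken from the original illustration. The function name and the optional `total_relevant` argument are also made up for this sketch.

```python
def average_precision(relevance_flags, total_relevant=None):
    """Average of precision@rank over the ranks holding a relevant document.

    relevance_flags: 1/0 (or True/False) per rank, best rank first.
    total_relevant: number of relevant documents for the query; defaults to
    the number found in the ranking (assumes all of them were retrieved).
    """
    hits = 0
    precisions = []
    for rank, is_relevant in enumerate(relevance_flags, start=1):
        if is_relevant:
            hits += 1
            precisions.append(hits / rank)  # precision at this rank
    denom = total_relevant if total_relevant is not None else hits
    return sum(precisions) / denom if denom else 0.0

# Assumed positions: relevant at ranks 1,3,4,5,6,10 and at ranks 2,5,6,7,9,10.
ranking_1 = [1, 0, 1, 1, 1, 1, 0, 0, 0, 1]
ranking_2 = [0, 1, 0, 0, 1, 1, 1, 0, 1, 1]
ap_1 = average_precision(ranking_1)
ap_2 = average_precision(ranking_2)
print(f"{ap_1:.3f} {ap_2:.3f}")
```

Passing `total_relevant` matters when the system misses some relevant documents: those then count as zero precision instead of being ignored.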
Mean Average Precision (MAP)
MAP is the mean of average precision across multiple queries/rankings. As an illustration, here is the mean average precision for two queries:
For query 1, AvgPrec: (1.0+0.67+0.5+0.44+0.5) / 5 = 0.62
For query 2, AvgPrec: (0.5+0.4+0.43) / 3 = 0.44
So, MAP = (0.62 + 0.44) / 2 = 0.53
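A short sketch of the MAP calculation follows. As an assumption, the relevance lists are reconstructed from the precision values above (relevant documents at ranks 1, 3, 6, 9, 10 for query 1 and at ranks 2, 5, 7 for query 2), and the function names are made up for illustration.

```python
def average_precision(flags):
    """AP for one ranking given 1/0 relevance flags, best rank first."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(flags, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_average_precision(rankings):
    """Mean of the per-query AP values."""
    return sum(average_precision(flags) for flags in rankings) / len(rankings)

# Assumed relevance flags per query, inferred from the example's precision values.
query_1 = [1, 0, 1, 0, 0, 1, 0, 0, 1, 1]
query_2 = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0]
map_score = mean_average_precision([query_1, query_2])
print(f"MAP = {map_score:.2f}")
```

For a test collection such as Cranfield, you would build one such list per query by looking up each retrieved document in the relevance assessments for that query.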
Sometimes, people use precision@k and recall@k as performance measures of a retrieval system. You need a working retrieval system to run such tests. If you want to write your program in Java, you should consider Apache Lucene to build your index.
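For reference, precision@k and recall@k only look at the top k results. A minimal sketch, with made-up document ids and function names:

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are relevant."""
    top_k = ranked_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / k

def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    top_k = ranked_ids[:k]
    return sum(1 for doc_id in top_k if doc_id in relevant_ids) / len(relevant_ids)

# Hypothetical system output (best first) and relevance judgments for one query.
ranked = ["d3", "d7", "d1", "d9", "d4", "d2"]
relevant = {"d3", "d1", "d2", "d8"}
p_at_3 = precision_at_k(ranked, relevant, 3)  # d3 and d1 are relevant: 2/3
r_at_3 = recall_at_k(ranked, relevant, 3)     # 2 of the 4 relevant documents: 2/4
```

Note that this version of precision@k divides by k even when fewer than k documents were returned, which is one common convention.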
Calculating precision and recall is simple. Precision is the fraction of the documents you retrieved that are relevant. Recall is the fraction of all relevant documents that you retrieved.
For example, if a query has 20 relevant documents and you retrieved 25 documents, of which only 14 are relevant to the query, then Precision = 14/25 and Recall = 14/20.
Precision and recall are often combined into a single number, the F-Measure, which is the harmonic mean of precision and recall: F-Score = 2*Precision*Recall/(Precision+Recall).
Precision at a cutoff k, P(k), is the fraction of relevant documents among the first k retrieved documents. Assume you retrieved 25 documents and 8 of the first 10 are relevant. Then P(10) = 8/10.
If you add up P(k) at every rank k where a relevant document appears and divide by N, where N is the total number of relevant documents for the query in your data set, you get the AP for that query. Averaging AP over all queries gives MAP.
Source: https://stackoverflow.com/questions/40801196/some-ideas-and-direction-of-how-to-measure-ranking-ap-map-recall-for-ir-evalu