Text Summarization Evaluation - BLEU vs ROUGE

傲寒 2021-01-31 15:54

With the results of two different summary systems (sys1 and sys2) and the same reference summaries, I evaluated them with both BLEU and ROUGE. The problem is: All ROUGE scores o

3 answers

北荒 (original poster)

2021-01-31 16:19
Both ROUGE and BLEU are based on n-grams to measure the similarity between system summaries and human summaries. So why do the evaluation results differ like that? And what is the main difference between ROUGE and BLEU that explains this?

There is both a ROUGE-n precision and a ROUGE-n recall. The original ROUGE implementation from the paper that introduced ROUGE {1} computes both, as well as the resulting F1-score.

From http://text-analytics101.rxnlp.com/2017/01/how-rouge-works-for-evaluation-of.html (mirror):

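ROUGE recall = (number of overlapping words) / (total words in reference summary)

ROUGE precision = (number of overlapping words) / (total words in system summary)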
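A minimal Python sketch of ROUGE-n precision, recall and F1 for one system summary against one reference is shown below. It assumes lowercased whitespace tokenization and clipped n-gram counts (an n-gram is credited at most as often as it appears in both summaries). It does no stemming or stopword removal, so its numbers will not exactly match the original ROUGE package. The helper names `ngrams` and `rouge_n` and the example sentences are hypothetical, for illustration only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(system, reference, n=1):
    """ROUGE-n (precision, recall, F1) against a single reference summary.

    Sketch only: lowercased whitespace tokens, no stemming, no stopword removal.
    """
    sys_ngrams = ngrams(system.lower().split(), n)
    ref_ngrams = ngrams(reference.lower().split(), n)
    # Overlapping n-grams, clipped to the smaller count in either summary.
    overlap = sum((sys_ngrams & ref_ngrams).values())
    # Precision divides by the system summary's n-grams, recall by the reference's.
    precision = overlap / max(sum(sys_ngrams.values()), 1)
    recall = overlap / max(sum(ref_ngrams.values()), 1)
    # F1 is the harmonic mean of precision and recall.
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, `rouge_n("the cat was found under the bed", "the cat was under the bed")` gives a unigram recall of 6/6 = 1.0 and a precision of 6/7 ≈ 0.86: every reference word is recovered, but the system summary contains one extra word.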

(The original ROUGE implementation from the paper that introduced ROUGE {1} may perform a few additional steps, such as stemming.)

The ROUGE-n precision and recall are easy to interpret, unlike BLEU (see Interpreting ROUGE scores).

The difference between ROUGE-n precision and BLEU is that BLEU introduces a brevity penalty term and also computes the n-gram match for several n-gram sizes (unlike ROUGE-n, which uses a single chosen n-gram size). Stack Overflow does not support LaTeX, so I won't go into more formulas comparing against BLEU; {2} explains BLEU clearly.
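To make these two differences concrete, here is a rough sentence-level BLEU sketch in Python. It assumes a single reference, lowercased whitespace tokenization, uniform weights over n = 1..4 and no smoothing. Published BLEU implementations usually aggregate counts over a whole corpus and may smooth zero counts, so treat `simple_bleu` (a hypothetical name, not a library API) as an illustration rather than a reference implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(system, reference, max_n=4):
    """Unsmoothed, single-reference, sentence-level BLEU (illustrative sketch).

    Assumes lowercased whitespace tokenization and uniform weights 1/max_n.
    """
    sys_tokens = system.lower().split()
    ref_tokens = reference.lower().split()
    log_precisions = []
    for n in range(1, max_n + 1):
        sys_ngrams = ngrams(sys_tokens, n)
        # Clipped matches: each n-gram counts at most as often as in the reference.
        overlap = sum((sys_ngrams & ngrams(ref_tokens, n)).values())
        total = sum(sys_ngrams.values())
        if overlap == 0:
            # Without smoothing, a single zero precision makes BLEU zero.
            return 0.0
        log_precisions.append(math.log(overlap / total))
    c, r = len(sys_tokens), len(ref_tokens)
    # Brevity penalty: 1 if the output is longer than the reference,
    # otherwise exp(1 - r/c), which penalizes short high-precision outputs.
    bp = 1.0 if c > r else math.exp(1 - r / c)
    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return bp * math.exp(sum(log_precisions) / max_n)
```

With the reference "the cat was under the bed", the longer output "the cat was found under the bed" scores 0.0 under this sketch (it shares no 4-gram with the reference, and nothing is smoothed), while the short output "the cat was under" scores exp(-0.5) ≈ 0.61: all of its n-gram precisions are 1, and only the brevity penalty lowers it. Unigram ROUGE recall ranks the same pair the other way (6/6 versus 4/6), which is one way a system can win on ROUGE and lose on BLEU.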

References:
- {1} Lin, Chin-Yew. "ROUGE: A package for automatic evaluation of summaries." In Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, vol. 8. 2004. https://scholar.google.com/scholar?cluster=2397172516759442154&hl=en&as_sdt=0,5 ; http://anthology.aclweb.org/W/W04/W04-1013.pdf
- {2} Callison-Burch, Chris, Miles Osborne, and Philipp Koehn. "Re-evaluating the Role of Bleu in Machine Translation Research." In EACL, vol. 6, pp. 249-256. 2006. https://scholar.google.com/scholar?cluster=8900239586727494087&hl=en&as_sdt=0,5