Does stemming harm precision in text classification?

泄露秘密 提交于 2019-12-06 07:32:53

问题


I have read stemming harms precision but improves recall in text classification. How does that happen? When you stem you increase the number of matches between the query and the sample documents right?


回答1:


It's always the same, if you raise recall, your doing a generalisation. Because of that, you're losing precision. Stemming merge words together.

On the one hand, words which ought to be merged together (such as "adhere" and "adhesion") may remain distinct after stemming; on the other, words which are really distinct may be wrongly conflated (e.g., "experiment" and "experience"). These are known as understemming errors and overstemming errors respectively.

Overstemming lowers precision and understemming lowers recall. So, since no stemming at all means no over- but max understemming errors, you have a low recall there and a high precision.

Btw, precision means how many of your found 'documents' are those you were looking for. Recall means how many of all 'documents', which were correct, you received.




回答2:


From the wikipedia entry on Query_expansion:

By stemming a user-entered term, more documents are matched, as the alternate word forms for a user entered term are matched as well, increasing the total recall. This comes at the expense of reducing the precision. By expanding a search query to search for the synonyms of a user entered term, the recall is also increased at the expense of precision. This is due to the nature of the equation of how precision is calculated, in that a larger recall implicitly causes a decrease in precision, given that factors of recall are part of the denominator. It is also inferred that a larger recall negatively impacts overall search result quality, given that many users do not want more results to comb through, regardless of the precision.



来源:https://stackoverflow.com/questions/10369479/does-stemming-harm-precision-in-text-classification

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!