TF-IDF for documents of different lengths
Question: I have searched the web for ways to normalize tf grades when the documents' lengths are very different (for example, when the document lengths vary from 500 words to 2500 words). The only normalization I have found divides the term frequency by the length of the document, which causes the length of the document to no longer have any meaning. This method, though, is a really bad one for normalizing tf. If anything, it causes the tf grades for each document to have a very large bias (unless