TF-IDF vectors can be generated at different levels of input tokens (words, characters, n-grams): which accuracy should be considered?


Question


Here you can see I am computing accuracy with four feature representations: Count Vectors, Word-Level TF-IDF, N-Gram TF-IDF, and Character-Level TF-IDF.

# Naive Bayes on Count Vectors
accuracy = train_model(classifier, xtrain_count, train_y, xvalid_count)
print("NB, Count Vectors: ", accuracy)

# Naive Bayes on Word Level TF-IDF Vectors
accuracy = train_model(classifier, xtrain_tfidf, train_y, xvalid_tfidf)
print("NB, WordLevel TF-IDF: ", accuracy)

# Naive Bayes on N-Gram Level TF-IDF Vectors
accuracy = train_model(classifier, xtrain_tfidf_ngram, train_y, xvalid_tfidf_ngram)
print("NB, N-Gram Vectors: ", accuracy)

# Naive Bayes on Character Level TF-IDF Vectors
accuracy = train_model(classifier, xtrain_tfidf_ngram_chars, train_y, xvalid_tfidf_ngram_chars)
print("NB, CharLevel Vectors: ", accuracy)

How should I interpret the results, given that I get a different accuracy at each level? Can anyone tell me how I can combine them into a single, collective result?


Answer 1:


You can stack the features together before training your classifiers.

from scipy.sparse import hstack
train_features = hstack([train_char_features, train_word_features])
test_features = hstack([test_char_features, test_word_features])
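
A minimal end-to-end sketch of this idea, assuming word- and character-level TfidfVectorizer features and a multinomial Naive Bayes classifier (train_x, valid_x, train_y, valid_y are assumed placeholders, not from the original answer):

from scipy.sparse import hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score

# Fit separate word-level and character-level TF-IDF vectorizers on the training text
word_vect = TfidfVectorizer(analyzer='word', ngram_range=(1, 1))
char_vect = TfidfVectorizer(analyzer='char', ngram_range=(2, 3))

train_word_features = word_vect.fit_transform(train_x)
train_char_features = char_vect.fit_transform(train_x)
valid_word_features = word_vect.transform(valid_x)
valid_char_features = char_vect.transform(valid_x)

# Concatenate the sparse matrices column-wise so each document gets both feature sets
train_features = hstack([train_char_features, train_word_features])
valid_features = hstack([valid_char_features, valid_word_features])

# Train a single classifier on the combined representation
clf = MultinomialNB()
clf.fit(train_features, train_y)
print("NB, combined word+char TF-IDF:", accuracy_score(valid_y, clf.predict(valid_features)))

With the stacked representation you train and evaluate once, so you get a single accuracy for the combined word+character features instead of four separate numbers.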


Source: https://stackoverflow.com/questions/62926155/tf-idf-vectors-can-be-generated-at-different-levels-of-input-tokens-words-char
