How to include words as numerical features in classification

眼角桃花 2021-02-06 11:40

What's the best method to use the words themselves as features in a machine learning algorithm?

The problem I have to extract word related feature from a particular p

3 answers
  •  别那么骄傲
    2021-02-06 12:41

    The standard approach is the "bag-of-words" representation: you have one feature per word, with value 1 if the word occurs in the document and 0 if it doesn't.
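
As a rough illustration of the representation described above, here is a minimal sketch in plain Python. The helper names `build_vocabulary` and `to_binary_vector`, the whitespace tokenization, and the toy corpus are all assumptions made for the example, not part of the original answer.

```python
def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index (sorted for determinism)."""
    words = sorted({word for doc in documents for word in doc.lower().split()})
    return {word: i for i, word in enumerate(words)}


def to_binary_vector(document, vocabulary):
    """Return a 0/1 list: 1 where the vocabulary word occurs in the document."""
    present = set(document.lower().split())
    # Dicts preserve insertion order, so columns follow the sorted vocabulary.
    return [1 if word in present else 0 for word in vocabulary]


# Hypothetical toy corpus, for illustration only.
docs = ["the cat sat", "the dog barked", "a cat and a dog"]
vocab = build_vocabulary(docs)
vectors = [to_binary_vector(d, vocab) for d in docs]
```

Each document becomes a vector with one column per vocabulary word, so the number of features equals the vocabulary size. Real text would likely need better tokenization than `split()`.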

    This gives lots of features, but if you have a simple learner like Naive Bayes, that's still OK.
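
To show why many 0/1 features are manageable for a simple learner, here is a hedged sketch of one Naive Bayes variant (Bernoulli, with add-one smoothing) in plain Python. The choice of variant, the function names `train_bernoulli_nb` and `predict`, and the spam/ham data are assumptions for illustration; the answer does not specify any of them.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(vectors, labels):
    """Estimate class priors and P(word present | class) with add-one smoothing."""
    n_features = len(vectors[0])
    class_counts = defaultdict(int)
    present_counts = defaultdict(lambda: [0] * n_features)
    for vec, label in zip(vectors, labels):
        class_counts[label] += 1
        for j, value in enumerate(vec):
            present_counts[label][j] += value
    model = {}
    for label, count in class_counts.items():
        prior = count / len(labels)
        # Add-one (Laplace) smoothing keeps probabilities away from 0 and 1.
        probs = [(present_counts[label][j] + 1) / (count + 2) for j in range(n_features)]
        model[label] = (prior, probs)
    return model


def predict(model, vec):
    """Return the class with the highest log posterior for a 0/1 feature vector."""
    best_label, best_score = None, -math.inf
    for label, (prior, probs) in model.items():
        score = math.log(prior)
        for value, p in zip(vec, probs):
            score += math.log(p if value else 1.0 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical data: columns stand for the words ["cheap", "meeting", "offer"].
train_vectors = [[1, 0, 1], [1, 0, 0], [0, 1, 0], [0, 1, 1]]
train_labels = ["spam", "spam", "ham", "ham"]
model = train_bernoulli_nb(train_vectors, train_labels)
prediction = predict(model, [1, 0, 1])
```

Training and prediction each touch every feature once per example, so the cost grows only linearly with the number of words.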

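    "Index in the dictionary" is a useless feature; I wouldn't use it. The index is an arbitrary number, so a learner would read an order or distance into it that has nothing to do with the words' meaning.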
