Is it possible to apply PCA to text classification?


南旧 2021-02-05 15:14

I'm trying to do classification with Python. I'm using the Naive Bayes MultinomialNB classifier for web pages (retrieving data from the web as text, then classifying this text: web c

3 answers

庸人自扰 (OP)

2021-02-05 15:43
Rather than converting a sparse matrix to dense (which is discouraged), I would use scikit-learn's TruncatedSVD, a PCA-like dimensionality reduction algorithm (using randomized SVD by default) that works directly on sparse data:
```
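from sklearn.decomposition import TruncatedSVD
# data: the sparse term-count or tf-idf matrix produced by the vectorizer
svd = TruncatedSVD(n_components=5, random_state=42)
data = svd.fit_transform(data)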
```
And, citing from the TruncatedSVD documentation:

In particular, truncated SVD works on term count/tf-idf matrices as returned by the vectorizers in sklearn.feature_extraction.text. In that context, it is known as latent semantic analysis (LSA).

which is exactly your use case.
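
For a fuller picture, here is a minimal sketch of how the vectorizer, TruncatedSVD and a classifier might be chained together. The toy corpus, the labels and `n_components=2` are made up for illustration and are not from the question. One caveat: the SVD output can contain negative values, which MultinomialNB rejects, so this sketch swaps in LogisticRegression as the final step; that is an assumption of the example, not something the answer prescribes.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical toy corpus standing in for the text scraped from web pages.
texts = [
    "football match goal team",
    "team wins the football league",
    "goal scored in the final match",
    "stock market shares fall",
    "bank raises interest rates",
    "market investors buy shares",
]
labels = ["sport", "sport", "sport", "finance", "finance", "finance"]

# tf-idf (sparse) -> TruncatedSVD (dense, low-dimensional) -> classifier.
# LogisticRegression is used here because the SVD components may be negative,
# which MultinomialNB does not accept.
pipeline = make_pipeline(
    TfidfVectorizer(),
    TruncatedSVD(n_components=2, random_state=42),
    LogisticRegression(),
)
pipeline.fit(texts, labels)

# Inspect the reduced representation produced by the first two steps.
reduced = pipeline[:-1].transform(texts)
print(reduced.shape)

# Classify a new, unseen document.
print(pipeline.predict(["football team scores a goal"]))
```

On real data you would likely want a much larger `n_components` (the scikit-learn documentation suggests around 100 for LSA) and a held-out test set to check whether the reduction actually helps.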