Is it possible to apply PCA to any text classification?

南旧 2021-02-05 15:14

I'm trying a classification with Python. I'm using the Naive Bayes MultinomialNB classifier for web pages (retrieving data from the web as text, then classifying this text: web c

3 answers
  •  庸人自扰
    2021-02-05 15:43

    Rather than converting a sparse matrix to dense (which is discouraged), I would use scikit-learn's TruncatedSVD, a PCA-like dimensionality reduction algorithm (using randomized SVD by default) that works directly on sparse data:

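    from sklearn.decomposition import TruncatedSVD
    # data is the sparse feature matrix; it is reduced to 5 components without densifying
    svd = TruncatedSVD(n_components=5, random_state=42)
    data = svd.fit_transform(data)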
    

    And, citing from the TruncatedSVD documentation:

    In particular, truncated SVD works on term count/tf-idf matrices as returned by the vectorizers in sklearn.feature_extraction.text. In that context, it is known as latent semantic analysis (LSA).

    which is exactly your use case.
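For illustration, here is a minimal end-to-end sketch of that idea: vectorize some text with tf-idf and reduce it with TruncatedSVD while the input stays sparse. The toy corpus `docs` and the choice of `n_components=3` are assumptions made up for this example, not taken from the question.

```python
from scipy.sparse import issparse
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for the scraped web-page text (hypothetical data).
docs = [
    "python machine learning classification",
    "naive bayes text classification python",
    "football match score goal",
    "goal keeper football league",
    "stock market shares price",
    "market price of shares falls",
]

# The vectorizer returns a SciPy sparse matrix; it is never converted to dense.
tfidf = TfidfVectorizer().fit_transform(docs)

# Keep n_components well below the vocabulary size (3 is an arbitrary choice here).
svd = TruncatedSVD(n_components=3, random_state=42)
reduced = svd.fit_transform(tfidf)  # dense array, one row per document

print(issparse(tfidf), tfidf.shape, reduced.shape)
```

One caveat if the reduced matrix is fed back into MultinomialNB: the SVD components can be negative, which that classifier rejects, so a different classifier would likely be needed on the reduced features.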
