ValueError: Found arrays with inconsistent numbers of samples [ 6 1786]

Asked by 渐次进展 on 2021-01-20 09:25

Here is my code:

from sklearn.svm import SVC
from sklearn.grid_search import GridSearchCV
from sklearn.cross_validation import KFold
from sklearn.feature_ext         


        
1 answer
  • 2021-01-20 10:04

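    I think you've mixed up your X and y here. The error means the arrays passed to fit have different numbers of samples (rows): one has 6, the other 1786. You want to transform your X (the raw documents) into a TF-IDF matrix with one row per document and train on that against y. See below: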

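    from sklearn.svm import SVC
    # GridSearchCV and KFold live in sklearn.model_selection; the old
    # sklearn.grid_search and sklearn.cross_validation modules were
    # deprecated in 0.18 and later removed
    from sklearn.model_selection import GridSearchCV, KFold
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn import datasets
    import numpy as np
    
    # Raw documents (X) and their integer class labels (y)
    newsgroups = datasets.fetch_20newsgroups(
        subset='all',
        categories=['alt.atheism', 'sci.space'],
    )
    X = newsgroups.data
    y = newsgroups.target
    
    # Turn the documents into a sparse TF-IDF matrix with one row per
    # document, so X_scaled and y have the same number of samples
    TD_IF = TfidfVectorizer()
    X_scaled = TD_IF.fit_transform(X)
    grid = {'C': np.power(10.0, np.arange(-1, 1))}  # C in [0.1, 1.0]
    # KFold only needs the number of splits; the data are passed via fit()
    cv = KFold(n_splits=5, shuffle=True, random_state=241)
    clf = SVC(kernel='linear', random_state=241)
    
    gs = GridSearchCV(estimator=clf, param_grid=grid, scoring='accuracy', cv=cv)
    gs.fit(X_scaled, y)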
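    To see why the sample counts must match, here is a minimal sketch on a tiny hypothetical corpus, so it runs without downloading 20 Newsgroups. The documents, labels, and the names `docs`, `labels`, `vectorizer` and `mismatch_raised` are invented for illustration, and it assumes a current scikit-learn that provides `sklearn.model_selection`. It shows that the TF-IDF matrix has one row per document, that a y of a different length makes `fit` raise a ValueError, and that the grid search runs once X and y line up.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV, KFold
from sklearn.svm import SVC

# Hypothetical toy corpus standing in for newsgroups.data: one string per document
docs = [
    "the rocket reached orbit",
    "nasa launched a new probe",
    "the shuttle returned to earth",
    "god and belief are discussed",
    "atheism rejects belief in god",
    "religion and faith debate",
]
labels = [1, 1, 1, 0, 0, 0]  # hypothetical labels: 1 = space, 0 = atheism

vectorizer = TfidfVectorizer()
X_tfidf = vectorizer.fit_transform(docs)  # sparse matrix, one row per document

# fit() requires X and y to have the same number of rows (samples)
print("X rows:", X_tfidf.shape[0], "y length:", len(labels))

# Passing a y of the wrong length reproduces the kind of error in the question
mismatch_raised = False
try:
    SVC(kernel="linear").fit(X_tfidf, labels[:2])
except ValueError as err:
    mismatch_raised = True
    print("mismatch:", err)

# With matching X and y, the grid search runs as in the answer above
cv = KFold(n_splits=3, shuffle=True, random_state=241)
gs = GridSearchCV(SVC(kernel="linear"), {"C": [0.1, 1.0]}, scoring="accuracy", cv=cv)
gs.fit(X_tfidf, labels)
print("best params:", gs.best_params_)
```

    With 3 folds over 6 documents, each training fold keeps at least one document of each class, so the linear SVC always has two classes to separate.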
    