TypeError: expected string or bytes-like object HashingVectorizer

Question

I get this error when fitting a pipeline on my dataset. Everything looks fine to me and I can't tell where the problem is. I'm a beginner, so could anyone tell me what I'm doing wrong or what I'm missing?

The problem seems to be in the data preprocessing part.

The error trace and the dataframe's head are attached as images below.

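import pandas as pd
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline

train = pd.read_csv('train.txt', sep='\t', dtype=str, header=None)
test = pd.read_csv('test.txt', sep='\t', dtype=str, header=None)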

X_train = train.iloc[:,1:]
y_train = train.iloc[:,0:1]

X_test = test.iloc[:,1:]
y_test = test.iloc[:,0:1]

TOKENS_ALPHANUMERIC = '[A-Za-z0-9]+(?=\\s+)'

steps = [('vectorizer',HashingVectorizer(TOKENS_ALPHANUMERIC,
                                                     norm=None, binary=False, lowercase=False,
                                                     ngram_range=(1,2))),
         ('clf',OneVsRestClassifier(LogisticRegression()))]

pipeline = Pipeline(steps)
pipeline.fit(X_train,y_train)
accuracy = pipeline.score(X_test,y_test)
print(accuracy)

[Image: stack trace] [Image: dataframe head]

Answer 1:

You need to define it like this:

# token_pattern takes the regex string; tokenizer would expect a callable instead
steps = [('vectorizer',HashingVectorizer(token_pattern=TOKENS_ALPHANUMERIC,
                                         norm=None, binary=False,
                                         lowercase=False,
                                         ngram_range=(1,2))),
         ('clf',OneVsRestClassifier(LogisticRegression()))]

When you do not specify the keyword, the value is bound to the first parameter of HashingVectorizer, which is input, rather than being used as the token pattern, and hence it was erroring.
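As a rough end-to-end check, here is a minimal sketch of the pipeline with the pattern passed by keyword. The toy DataFrame is made up and only stands in for train.txt, which is not shown in the question. It also assumes the text lives in a single column, so the features are selected as a 1-D Series (iloc[:, 1]) rather than a DataFrame (iloc[:, 1:]). Iterating over a DataFrame yields its column labels instead of the documents, which may well be what produces the "expected string or bytes-like object" message here, though the stack trace is not available to confirm that.

```python
import pandas as pd
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline

# Same pattern as in the question. The lookahead only keeps tokens that are
# followed by whitespace, so the last word of each document is dropped.
TOKENS_ALPHANUMERIC = r'[A-Za-z0-9]+(?=\s+)'

# Hypothetical toy data standing in for train.txt:
# column 0 holds the label, column 1 holds the text.
train = pd.DataFrame({
    0: ['pos', 'neg', 'pos', 'neg'],
    1: ['good movie indeed', 'bad movie indeed',
        'good plot overall', 'bad plot overall'],
})

# Iterating over a DataFrame yields column labels, not the row contents.
column_labels = list(train.iloc[:, 1:])

# Assumption: one text column, selected as a 1-D Series of strings.
X_train = train.iloc[:, 1]
y_train = train.iloc[:, 0]

pipeline = Pipeline([
    ('vectorizer', HashingVectorizer(token_pattern=TOKENS_ALPHANUMERIC,
                                     norm=None, binary=False,
                                     lowercase=False,
                                     ngram_range=(1, 2))),
    ('clf', OneVsRestClassifier(LogisticRegression())),
])

pipeline.fit(X_train, y_train)
predictions = pipeline.predict(X_train)
print(column_labels)
print(predictions)
```

If the real file has several text columns, they would need to be joined into one string per row before being passed to the vectorizer; that step is left out here because the layout of the data is not known.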

Source: https://stackoverflow.com/questions/50217863/typeerror-expected-string-or-bytes-like-object-hashingvectorizer

Tags

scikit-learn

nlp

countvectorizer
