ValueError: too many values to unpack (NLTK classifier)

Submitted by 荒凉一梦 on 2019-12-11 14:15:59

Question


I'm running a classification analysis with NLTK's Naive Bayes classifier. I load a TSV file containing records and labels.

But the classifier never gets trained because of an error. Here's my Python code:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('tweets.txt', delimiter ='\t', quoting = 3)

dataset.isnull().any()

dataset = dataset.fillna(method='ffill')

import re
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
corpus = []
for i in range(0,16004):
    tweet = re.sub('[^a-zA-Z]', ' ', dataset['tweet'][i])
    tweet = tweet.lower()
    tweet = tweet.split()
    ps = PorterStemmer()
    tweet = [ps.stem(word) for word in tweet if not word in 
    set(stopwords.words('english'))]
    tweet = ' '.join(tweet)
    corpus.append(tweet)

from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features = 10000)
X = cv.fit_transform(corpus).toarray()
y = dataset.iloc[:, 1].values




from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, 
random_state = 0)
train_set, test_set = X_train[500:], y_train[:500]

classifier = nltk.NaiveBayesClassifier.train(train_set)

The error is:

File "C:\Users\HSR\Anaconda2\lib\site-packages\nltk\classify\naivebayes.py", line 194, in train
for featureset, label in labeled_featuresets:

ValueError: too many values to unpack

Answer 1:


NLTK's classifiers don't work like scikit-learn estimators. train() takes the features and labels together in a single list rather than as separate X and y arrays.

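In your code, you pass only feature rows (a slice of X_train) to train(). NLTK then tries to unpack each row into a (featureset, label) pair, but each row is an array with one count per vocabulary term (up to 10000 here), hence the "too many values to unpack" error.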
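The failure can be reproduced without NLTK. What follows is a minimal sketch, not NLTK's actual implementation: `rows` is a made-up stand-in for X_train, and the hypothetical `unpack_pairs` only mirrors the `for featureset, label in labeled_featuresets` line shown in the traceback.

```python
# Hypothetical stand-in for X_train: each row is a bag-of-words count vector.
rows = [
    [0, 1, 0, 2],
    [1, 0, 0, 0],
]


def unpack_pairs(labeled_featuresets):
    """Mimic the loop from the traceback: every item must unpack into two names."""
    pairs = []
    for featureset, label in labeled_featuresets:
        pairs.append((featureset, label))
    return pairs


try:
    unpack_pairs(rows)  # each row has four values, not two
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The same function accepts a list such as `[({"good": 1}, "pos")]` without complaint, because every item there is a two-element tuple.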

NaiveBayesClassifier requires its input to be a list of tuples: each list element is one training sample, and each tuple holds that sample's feature dictionary and its label. Something like:

X = [({feature1:'val11', feature2:'val12' .... }, class1),
     ({feature1:'val21', feature2:'val22' .... }, class2), 
     ...
     ...                                                  ]

You need to change your input to this format.

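# X, y and cv come from the question's code; use X_train and y_train here
# instead if you want to train on the training split only.
# On scikit-learn 1.0 and later, call cv.get_feature_names_out() instead.
feature_names = cv.get_feature_names()
train_set = []
for i, single_sample in enumerate(X):
    # Map each vocabulary term to its count in this sample.
    single_feature_dict = {}
    for j, single_feature in enumerate(single_sample):
        single_feature_dict[feature_names[j]] = single_feature
    train_set.append((single_feature_dict, y[i]))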

Note: The above for loop can be shortened with a dict comprehension, but I'm not that fluent there.
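For reference, the dict-comprehension form could look like the sketch below. The small `feature_names`, `X` and `y` values are invented stand-ins for the vectorizer's vocabulary, the count matrix and the labels, so the snippet runs on its own.

```python
# Hypothetical stand-ins for the vocabulary, the count matrix and the labels.
feature_names = ["good", "bad", "movie"]
X = [
    [1, 0, 1],
    [0, 2, 1],
]
y = ["pos", "neg"]

# One (feature_dict, label) tuple per sample, built by pairing each row
# with its label and each count with its vocabulary term.
train_set = [
    ({name: count for name, count in zip(feature_names, row)}, label)
    for row, label in zip(X, y)
]

print(train_set)
```

Using zip avoids the index bookkeeping of the nested enumerate loops while producing the same list of tuples.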

Then you can do this:

nltk.NaiveBayesClassifier.train(train_set)
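Note that the question's `train_set, test_set = X_train[500:], y_train[:500]` slices features and labels separately, so neither piece holds labelled pairs. Once every sample is a (features, label) tuple, a held-out set can be cut from that same list. A sketch under assumptions: `split_labeled` is a hypothetical helper and the data is a toy example.

```python
def split_labeled(labeled_featuresets, n_test):
    """Return (train, test), holding out the first n_test pairs for testing."""
    test = labeled_featuresets[:n_test]
    train = labeled_featuresets[n_test:]
    return train, test


# Toy labelled data in the (feature_dict, label) format described above.
labeled = [
    ({"good": 1, "bad": 0}, "pos"),
    ({"good": 0, "bad": 1}, "neg"),
    ({"good": 2, "bad": 0}, "pos"),
    ({"good": 0, "bad": 3}, "neg"),
]

train_pairs, test_pairs = split_labeled(labeled, n_test=1)
print(len(train_pairs), len(test_pairs))
```

Both pieces can then go to NLTK: `nltk.NaiveBayesClassifier.train(train_pairs)` to fit, and `nltk.classify.accuracy(classifier, test_pairs)` to score the result. If the file is sorted by label, shuffling the list before splitting is probably wise.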


Source: https://stackoverflow.com/questions/49097979/valueerror-too-many-values-to-unpack-nltk-classifier
