I am using the function below to check which words are most frequent per category, and then to observe how some sentences are classified. The results are surprisingly wrong.
The order of the names in the cats variable differs from the order in newsgroups_train.target_names. The labels in target_names are sorted, see here.
Output of:
print(cats)
['sci.space','rec.autos','rec.motorcycles']
print(newsgroups_train.target_names)
['rec.autos', 'rec.motorcycles', 'sci.space']
You should change this line:
print(" - Predicted as: '{}'".format(cats[predicted]))
to
print(" - Predicted as: '{}'".format(newsgroups_train.target_names[predicted]))
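To see why the lookup list matters, here is a minimal sketch that reproduces the mismatch without downloading the dataset. It assumes target_names is simply the sorted category list, as the output above suggests. The `label_for` helper and the example value of `predicted` are hypothetical and only serve the illustration.

```python
# Category list in the order it was written by hand (taken from the output above).
cats = ['sci.space', 'rec.autos', 'rec.motorcycles']

# Stand-in for newsgroups_train.target_names: the same names, sorted.
# Assumption for illustration only, so that no dataset download is needed.
target_names = sorted(cats)


def label_for(predicted, names):
    """Map an integer class index to a name using the given list (hypothetical helper)."""
    return names[predicted]


# Suppose the classifier returns class index 0 for some sentence.
predicted = 0

wrong = label_for(predicted, cats)          # index applied to the hand-written order
right = label_for(predicted, target_names)  # index applied to the sorted label order

print(" - Looked up in cats:         '{}'".format(wrong))
print(" - Looked up in target_names: '{}'".format(right))
```

The classifier only ever sees integer labels, so those integers have to be translated back with the same list that defined them. Indexing into cats gives a different name for the same index whenever cats is not already sorted.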