naivebayes

NLTK Naive Bayes classifier giving unhashable type error

Submitted by Deadly on 2019-12-12 04:37:09

Question: Following is the code that I wrote using nltk and Python.

```python
import nltk
import random
from nltk.corpus import movie_reviews
#from sklearn.naive_bayes import GaussianNB

documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)
#print(documents[1:3])

all_words = []
for w in movie_reviews.words():
    all_words.append(w.lower())
all_words = nltk.FreqDist(all_words)
#print(all_words.most
```
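The "unhashable type" error in this kind of setup usually means a list ended up where a hashable value is required, such as a dict key or a set element. A minimal sketch (the `find_features` helper name is hypothetical, in the style common to `movie_reviews` tutorials):

```python
# Feature dicts for nltk's NaiveBayesClassifier must have hashable keys;
# strings are hashable, lists are not. Building features from a set of
# words keeps everything hashable.
def find_features(document_words, word_features):
    words = set(document_words)          # set elements must be hashable
    return {w: (w in words) for w in word_features}

feats = find_features(["good", "movie"], ["good", "bad", "movie"])
print(feats)   # {'good': True, 'bad': False, 'movie': True}
```

Passing a list itself as a dict key (for example `{all_words: True}` instead of per-word keys) is what raises `TypeError: unhashable type: 'list'`.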

In general, when does TF-IDF reduce accuracy?

Submitted by 狂风中的少年 on 2019-12-12 03:48:45

Question: I'm training a corpus of 200,000 reviews, labelled positive or negative, with a Naive Bayes model, and I noticed that applying TF-IDF actually reduced the accuracy (tested on a held-out set of 50,000 reviews) by about 2%. So I was wondering: does TF-IDF make any underlying assumptions about the data or the model it is paired with, i.e. are there cases where using it reduces accuracy?

Answer 1: The IDF component of TF*IDF can harm your classification accuracy in some cases. Let's suppose
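The mechanism the answer points at can be shown with a toy corpus (illustrative values only): IDF down-weights words that occur in many documents, even when those words are strongly class-indicative, which can hurt a Naive Bayes classifier that would otherwise exploit them.

```python
import math

# Toy corpus: "good" marks the positive class but is corpus-frequent,
# so IDF weights it *below* the rarer "bad".
docs = [["good", "movie"], ["good", "film"], ["bad", "movie"]]
N = len(docs)

def idf(term):
    df = sum(term in d for d in docs)   # document frequency
    return math.log(N / df)

print(round(idf("good"), 3))   # 0.405 -- appears in 2 of 3 docs
print(round(idf("bad"), 3))    # 1.099 -- appears in 1 of 3 docs
```

When the discriminative vocabulary is frequent rather than rare, raw term counts (or TF alone) can serve the classifier better than TF-IDF.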

naiveBayes and predict function not working in R

Submitted by 廉价感情. on 2019-12-12 03:27:57

Question: I am doing sentiment analysis on Twitter comments (in the Kazakh language) using the R script below: 3000 comments (1500 sad, 1500 happy) for the training set and 1000 mixed happy/sad comments for the test set. Everything works until the end, where the predicted values come out all happy, which is not right. I have checked every function and all work up to the naiveBayes call; I checked the classifier values and they are correct. I think either naiveBayes or predict is messing things
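One common cause of every prediction collapsing into a single class is the zero-frequency problem: a word unseen for one class zeroes out (or numerically flattens) that class's likelihood. e1071's naiveBayes exposes a `laplace` argument for exactly this reason. The mechanics can be sketched in Python (helper names are hypothetical):

```python
import math

def train_nb(docs, labels, vocab, laplace=1.0):
    # Per-class word likelihoods with additive (Laplace) smoothing.
    # With laplace=0, one unseen word gives probability 0 and knocks
    # a class out entirely -- predictions then collapse to one class.
    classes = sorted(set(labels))
    priors = {c: labels.count(c) / len(labels) for c in classes}
    cond = {}
    for c in classes:
        cdocs = [d for d, l in zip(docs, labels) if l == c]
        total = sum(len(d) for d in cdocs)
        for w in vocab:
            count = sum(d.count(w) for d in cdocs)
            cond[(w, c)] = (count + laplace) / (total + laplace * len(vocab))
    return priors, cond

def predict(doc, priors, cond):
    scores = {c: math.log(p) + sum(math.log(cond[(w, c)]) for w in doc)
              for c, p in priors.items()}
    return max(scores, key=scores.get)

docs = [["happy", "great"], ["sad", "awful"]]
labels = ["happy", "sad"]
vocab = ["happy", "great", "sad", "awful"]
priors, cond = train_nb(docs, labels, vocab)
print(predict(["awful"], priors, cond))   # sad
```

In R, also check that the class column is a factor in both training and test frames, with the same levels, before calling naiveBayes and predict.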

Look up BernoulliNB Probability in Dataframe

Submitted by 蹲街弑〆低调 on 2019-12-11 17:22:09

Question: I have some training data (TRAIN) and some test data (TEST). Each row of each dataframe contains an observed class (X) and some columns of binary features (Y). BernoulliNB predicts the probability of X given Y in the test data, based on the training data. I am trying to look up the probability of the observed class of each row in the test data (Pr). Edit: I used Antoine Zambelli's advice to fix the code:

```python
from sklearn.naive_bayes import BernoulliNB

BNB = BernoulliNB()

# Training Data
TRAIN = pd
```
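The lookup itself can be sketched independently of sklearn. `predict_proba` returns one row per sample with columns ordered like `classes_`; picking each row's observed-class column is a per-row index (the values below are stand-ins, not real model output):

```python
classes = ["A", "B"]                 # stand-in for BNB.classes_
proba = [[0.8, 0.2],
         [0.3, 0.7],
         [0.6, 0.4]]                 # stand-in for BNB.predict_proba(TEST)
observed = ["A", "B", "B"]           # observed class X of each test row

# Map each class label to its column, then pull the matching entry per row.
col = {c: i for i, c in enumerate(classes)}
pr = [row[col[c]] for row, c in zip(proba, observed)]
print(pr)   # [0.8, 0.7, 0.4]
```

With numpy the same idea is the fancy-indexing form `proba[np.arange(len(observed)), idx]`.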

Print out prediction with WEKA in Java

Submitted by 时光怂恿深爱的人放手 on 2019-12-11 16:16:39

Question: I am trying to make a prediction with Weka in Java, using the Naive Bayes classifier, with the following code:

```java
public class Run {
    public static void main(String[] args) throws Exception {
        ConverterUtils.DataSource source1 = new ConverterUtils.DataSource("./data/train.arff");
        Instances train = source1.getDataSet();
        // setting class attribute if the data format does not provide this information
        // For example, the XRFF format saves the class attribute information as well
        if (train
```

No module named NaiveBayes

Submitted by 痴心易碎 on 2019-12-11 14:17:31

Question: The code we are implementing is:

```python
from NaiveBayes import Pool
import os

DClasses = ["python", "java", "hadoop", "django", "datascience", "php"]

base = "learn/"
p = Pool()
for i in DClasses:
    p.learn(base + i, i)

base = "test/"
for i in DClasses:
    dir = os.listdir(base + i)
    for file in dir:
        res = p.Probability(base + i + "/" + file)
        print(i + ": " + file + ": " + str(res))
```

but we are getting an error: no module named NaiveBayes.
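`ModuleNotFoundError: No module named 'NaiveBayes'` means the interpreter cannot find a `NaiveBayes.py` anywhere on `sys.path` -- this module comes from a tutorial, not from pip, so the file has to sit next to the script (or its folder must be appended to `sys.path`). A quick stdlib check, without importing:

```python
import importlib.util

# find_spec returns None when the module is not locatable on sys.path;
# it becomes non-None once NaiveBayes.py is placed on the path.
spec = importlib.util.find_spec("NaiveBayes")
print(spec is None)
```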

ValueError: too many values to unpack (NLTK classifier)

Submitted by 荒凉一梦 on 2019-12-11 14:15:59

Question: I'm doing classification analysis using NLTK's Naive Bayes classifier. I insert a TSV file containing records and labels, but the file doesn't get trained due to an error. Here's my Python code:

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('tweets.txt', delimiter='\t', quoting=3)
dataset.isnull().any()
dataset = dataset.fillna(method='ffill')

import re
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
```
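The error in the title usually comes from the shape of the training rows: `NaiveBayesClassifier.train` expects an iterable of exactly-two-element `(feature_dict, label)` tuples, and iterating rows of any other width raises the unpacking error. A minimal sketch:

```python
# Two names on the left, two items per row: unpacks cleanly.
good = [({"love": True}, "pos"), ({"hate": True}, "neg")]
for feats, label in good:
    pass

# Three items per row: "too many values to unpack".
bad = [("love", True, "pos")]
err = None
try:
    for feats, label in bad:
        pass
except ValueError as exc:
    err = type(exc).__name__
print(err)   # ValueError
```

When reading from a TSV, zipping a whole row (text plus extra columns) into the pair is the usual culprit.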

R caret Naïve Bayes accuracy is null

Submitted by 岁酱吖の on 2019-12-11 13:28:56

Question: I have one dataset to train with SVM and Naïve Bayes. SVM works, but Naïve Bayes doesn't. The source code follows:

```r
library(tools)
library(caret)
library(doMC)
library(mlbench)
library(magrittr)

CORES <- 5  # Optional
registerDoMC(CORES)  # Optional

load("chat/rdas/2gram-entidades-erro.Rda")
set.seed(10)
split = 0.60
maFinal$resposta <- as.factor(maFinal$resposta)

data_train <- as.data.frame(unclass(maFinal[trainIndex, ]))
data_test <- maFinal[-trainIndex, ]
treegram25NotNull
```

Text classification with NaiveBayes

Submitted by ↘锁芯ラ on 2019-12-11 09:46:24

Question: I am trying to classify a series of example news texts by category. I have a huge dataset of news text with categories in a database; the machine should be trained on it and decide each news item's category.

```csharp
public static string[] Tokenize(string text)
{
    StringBuilder sb = new StringBuilder(text);
    char[] invalid = "!-;':'\",.?\n\r\t".ToCharArray();
    for (int i = 0; i < invalid.Length; i++)
        sb.Replace(invalid[i], ' ');
    return sb.ToString().Split(new[] { ' ' }, System.StringSplitOptions.RemoveEmptyEntries);
}

private
```
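The tokenizer above replaces the listed punctuation with spaces and drops empty entries. A rough Python mirror, useful for checking behaviour against the C# version (an approximation, not a byte-for-byte port):

```python
def tokenize(text):
    # Same idea as the C# Tokenize: punctuation -> spaces, then split
    # and discard the empty entries.
    invalid = set("!-;':\",.?\n\r\t")
    cleaned = "".join(" " if ch in invalid else ch for ch in text)
    return [t for t in cleaned.split(" ") if t]

print(tokenize("Good movie, great plot!"))   # ['Good', 'movie', 'great', 'plot']
```

These tokens then feed the per-class word counts that a Naive Bayes text classifier is trained on.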

Unknown label type error when Sklearn naive bayes used with floating point numbers

Submitted by 浪子不回头ぞ on 2019-12-11 03:36:26

Question: I am applying the Naive Bayes algorithm to data labelled with floating-point numbers. If my Y array consists of int values, the prediction comes out correctly. See the code below:

```python
import numpy as np
X = np.array([[0], [1]])
Y = np.array([1, 2])
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
clf.fit(X, Y)
print(clf.predict([[0]]))
```

Output is [1]. String values also work. See the code below:

```python
import numpy as np
X = np.array([[0], [1]])
Y = np.array(['A', 'B'])
```
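Float labels fail because sklearn's label validation reads them as a continuous regression target ("Unknown label type: continuous"), and classifiers require discrete classes. One workaround, a sketch rather than the only fix: if each float genuinely names a class, cast the labels to strings (or ints) before fitting.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[0], [1]])
Y = np.array([1.5, 2.5])        # float labels -> rejected by fit as-is

clf = GaussianNB()
clf.fit(X, Y.astype(str))       # classes become the strings '1.5' and '2.5'
print(clf.predict([[0]])[0])    # '1.5'
```

If the labels are truly continuous, a classifier is the wrong tool and a regressor should be used instead.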