naivebayes

NLTK Naive Bayes classifier giving unhashable type error

Submitted by Deadly on 2019-12-12 04:37:09

Question: Following is the code that I wrote using nltk and Python.

```python
import nltk
import random
from nltk.corpus import movie_reviews
#from sklearn.naive_bayes import GaussianNB

documents = [(list(movie_reviews.words(fileid)), category)
             for category in movie_reviews.categories()
             for fileid in movie_reviews.fileids(category)]
random.shuffle(documents)
#print(documents[1:3])

all_words = []
for w in movie_reviews.words():
    all_words.append(w.lower())
all_words = nltk.FreqDist(all_words)
#print(all_words.most
```
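The "unhashable type" error in this kind of setup usually means a list ended up where a hashable value is required, such as a dict key or a set element. A minimal sketch (the `find_features` helper name is hypothetical, in the style common to `movie_reviews` tutorials):

```python
# Feature dicts for nltk's NaiveBayesClassifier must have hashable keys;
# strings are hashable, lists are not. Building features from a set of
# words keeps everything hashable.
def find_features(document_words, word_features):
    words = set(document_words)          # set elements must be hashable
    return {w: (w in words) for w in word_features}

feats = find_features(["good", "movie"], ["good", "bad", "movie"])
print(feats)   # {'good': True, 'bad': False, 'movie': True}
```

Passing a list itself as a dict key (for example `{all_words: True}` instead of per-word keys) is what raises `TypeError: unhashable type: 'list'`.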

In general, when does TF-IDF reduce accuracy?

Submitted by 狂风中的少年 on 2019-12-12 03:48:45

Question: I'm training a corpus of 200,000 reviews, labelled positive or negative, with a Naive Bayes model, and I noticed that applying TF-IDF actually reduced the accuracy (tested on a held-out set of 50,000 reviews) by about 2%. So I was wondering: does TF-IDF make any underlying assumptions about the data or the model it is paired with, i.e. are there cases where using it reduces accuracy?

Answer 1: The IDF component of TF*IDF can harm your classification accuracy in some cases. Let's suppose
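The mechanism the answer points at can be shown with a toy corpus (illustrative values only): IDF down-weights words that occur in many documents, even when those words are strongly class-indicative, which can hurt a Naive Bayes classifier that would otherwise exploit them.

```python
import math

# Toy corpus: "good" marks the positive class but is corpus-frequent,
# so IDF weights it *below* the rarer "bad".
docs = [["good", "movie"], ["good", "film"], ["bad", "movie"]]
N = len(docs)

def idf(term):
    df = sum(term in d for d in docs)   # document frequency
    return math.log(N / df)

print(round(idf("good"), 3))   # 0.405 -- appears in 2 of 3 docs
print(round(idf("bad"), 3))    # 1.099 -- appears in 1 of 3 docs
```

When the discriminative vocabulary is frequent rather than rare, raw term counts (or TF alone) can serve the classifier better than TF-IDF.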

naiveBayes and predict function not working in R

Submitted by 廉价感情. on 2019-12-12 03:27:57

Question: I am doing sentiment analysis on Twitter comments (in the Kazakh language) using the R script below: 3000 comments (1500 sad, 1500 happy) for the training set and 1000 mixed happy/sad comments for the test set. Everything works until the end, where the predicted values come out all happy, which is not right. I have checked every function and all work up to the naiveBayes call; I checked the classifier values and they are correct. I think either naiveBayes or predict is messing things
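One common cause of every prediction collapsing into a single class is the zero-frequency problem: a word unseen for one class zeroes out (or numerically flattens) that class's likelihood. e1071's naiveBayes exposes a `laplace` argument for exactly this reason. The mechanics can be sketched in Python (helper names are hypothetical):

```python
import math

def train_nb(docs, labels, vocab, laplace=1.0):
    # Per-class word likelihoods with additive (Laplace) smoothing.
    # With laplace=0, one unseen word gives probability 0 and knocks
    # a class out entirely -- predictions then collapse to one class.
    classes = sorted(set(labels))
    priors = {c: labels.count(c) / len(labels) for c in classes}
    cond = {}
    for c in classes:
        cdocs = [d for d, l in zip(docs, labels) if l == c]
        total = sum(len(d) for d in cdocs)
        for w in vocab:
            count = sum(d.count(w) for d in cdocs)
            cond[(w, c)] = (count + laplace) / (total + laplace * len(vocab))
    return priors, cond

def predict(doc, priors, cond):
    scores = {c: math.log(p) + sum(math.log(cond[(w, c)]) for w in doc)
              for c, p in priors.items()}
    return max(scores, key=scores.get)

docs = [["happy", "great"], ["sad", "awful"]]
labels = ["happy", "sad"]
vocab = ["happy", "great", "sad", "awful"]
priors, cond = train_nb(docs, labels, vocab)
print(predict(["awful"], priors, cond))   # sad
```

In R, also check that the class column is a factor in both training and test frames, with the same levels, before calling naiveBayes and predict.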

Look up BernoulliNB Probability in Dataframe

Submitted by 蹲街弑〆低调 on 2019-12-11 17:22:09

Question: I have some training data (TRAIN) and some test data (TEST). Each row of each dataframe contains an observed class (X) and some columns of binary features (Y). BernoulliNB predicts the probability of X given Y in the test data, based on the training data. I am trying to look up the probability of the observed class of each row in the test data (Pr). Edit: I used Antoine Zambelli's advice to fix the code:

```python
from sklearn.naive_bayes import BernoulliNB

BNB = BernoulliNB()

# Training Data
TRAIN = pd
```
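The lookup itself can be sketched independently of sklearn. `predict_proba` returns one row per sample with columns ordered like `classes_`; picking each row's observed-class column is a per-row index (the values below are stand-ins, not real model output):

```python
classes = ["A", "B"]                 # stand-in for BNB.classes_
proba = [[0.8, 0.2],
         [0.3, 0.7],
         [0.6, 0.4]]                 # stand-in for BNB.predict_proba(TEST)
observed = ["A", "B", "B"]           # observed class X of each test row

# Map each class label to its column, then pull the matching entry per row.
col = {c: i for i, c in enumerate(classes)}
pr = [row[col[c]] for row, c in zip(proba, observed)]
print(pr)   # [0.8, 0.7, 0.4]
```

With numpy the same idea is the fancy-indexing form `proba[np.arange(len(observed)), idx]`.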

Print out prediction with WEKA in Java

Submitted by 时光怂恿深爱的人放手 on 2019-12-11 16:16:39

Question: I am trying to make a prediction with Weka in Java, using the Naive Bayes classifier, with the following code:

```java
public class Run {
    public static void main(String[] args) throws Exception {
        ConverterUtils.DataSource source1 = new ConverterUtils.DataSource("./data/train.arff");
        Instances train = source1.getDataSet();
        // setting class attribute if the data format does not provide this information
        // For example, the XRFF format saves the class attribute information as well
        if (train
```

No module named NaiveBayes

Submitted by 痴心易碎 on 2019-12-11 14:17:31

Question: The code we are implementing is:

```python
from NaiveBayes import Pool
import os

DClasses = ["python", "java", "hadoop", "django", "datascience", "php"]

base = "learn/"
p = Pool()
for i in DClasses:
    p.learn(base + i, i)

base = "test/"
for i in DClasses:
    dir = os.listdir(base + i)
    for file in dir:
        res = p.Probability(base + i + "/" + file)
        print(i + ": " + file + ": " + str(res))
```

but we are getting an error: no module named NaiveBayes.
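`ModuleNotFoundError: No module named 'NaiveBayes'` means the interpreter cannot find a `NaiveBayes.py` anywhere on `sys.path` -- this module comes from a tutorial, not from pip, so the file has to sit next to the script (or its folder must be appended to `sys.path`). A quick stdlib check, without importing:

```python
import importlib.util

# find_spec returns None when the module is not locatable on sys.path;
# it becomes non-None once NaiveBayes.py is placed on the path.
spec = importlib.util.find_spec("NaiveBayes")
print(spec is None)
```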

ValueError: too many values to unpack (NLTK classifier)

Submitted by 荒凉一梦 on 2019-12-11 14:15:59

Question: I'm doing classification analysis using NLTK's Naive Bayes classifier. I insert a TSV file containing records and labels, but the file doesn't get trained due to an error. Here's my Python code:

```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

dataset = pd.read_csv('tweets.txt', delimiter='\t', quoting=3)
dataset.isnull().any()
dataset = dataset.fillna(method='ffill')

import re
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
```
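The error in the title usually comes from the shape of the training rows: `NaiveBayesClassifier.train` expects an iterable of exactly-two-element `(feature_dict, label)` tuples, and iterating rows of any other width raises the unpacking error. A minimal sketch:

```python
# Two names on the left, two items per row: unpacks cleanly.
good = [({"love": True}, "pos"), ({"hate": True}, "neg")]
for feats, label in good:
    pass

# Three items per row: "too many values to unpack".
bad = [("love", True, "pos")]
err = None
try:
    for feats, label in bad:
        pass
except ValueError as exc:
    err = type(exc).__name__
print(err)   # ValueError
```

When reading from a TSV, zipping a whole row (text plus extra columns) into the pair is the usual culprit.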

R caret Naïve Bayes accuracy is null

Submitted by 岁酱吖の on 2019-12-11 13:28:56

Question: I have one dataset to train with SVM and Naïve Bayes. SVM works, but Naïve Bayes doesn't. The source code follows:

```r
library(tools)
library(caret)
library(doMC)
library(mlbench)
library(magrittr)

CORES <- 5  # Optional
registerDoMC(CORES)  # Optional

load("chat/rdas/2gram-entidades-erro.Rda")
set.seed(10)
split = 0.60
maFinal$resposta <- as.factor(maFinal$resposta)

data_train <- as.data.frame(unclass(maFinal[trainIndex, ]))
data_test <- maFinal[-trainIndex, ]
treegram25NotNull
```

Text classification with NaiveBayes

Submitted by ↘锁芯ラ on 2019-12-11 09:46:24

Question: I am trying to classify a series of example news texts by category. I have a huge dataset of news text with categories in a database; the machine should be trained on it and decide each news item's category.

```csharp
public static string[] Tokenize(string text)
{
    StringBuilder sb = new StringBuilder(text);
    char[] invalid = "!-;':'\",.?\n\r\t".ToCharArray();
    for (int i = 0; i < invalid.Length; i++)
        sb.Replace(invalid[i], ' ');
    return sb.ToString().Split(new[] { ' ' }, System.StringSplitOptions.RemoveEmptyEntries);
}

private
```
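The tokenizer above replaces the listed punctuation with spaces and drops empty entries. A rough Python mirror, useful for checking behaviour against the C# version (an approximation, not a byte-for-byte port):

```python
def tokenize(text):
    # Same idea as the C# Tokenize: punctuation -> spaces, then split
    # and discard the empty entries.
    invalid = set("!-;':\",.?\n\r\t")
    cleaned = "".join(" " if ch in invalid else ch for ch in text)
    return [t for t in cleaned.split(" ") if t]

print(tokenize("Good movie, great plot!"))   # ['Good', 'movie', 'great', 'plot']
```

These tokens then feed the per-class word counts that a Naive Bayes text classifier is trained on.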

Unknown label type error when Sklearn naive bayes used with floating point numbers

Submitted by 浪子不回头ぞ on 2019-12-11 03:36:26

Question: I am applying the Naive Bayes algorithm to data labelled with floating-point numbers. If my Y array consists of int values, the prediction comes out correctly. See the code below:

```python
import numpy as np
X = np.array([[0], [1]])
Y = np.array([1, 2])
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
clf.fit(X, Y)
print(clf.predict([[0]]))
```

Output is [1]. String values also work. See the code below:

```python
import numpy as np
X = np.array([[0], [1]])
Y = np.array(['A', 'B'])
```
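Float labels fail because sklearn's label validation reads them as a continuous regression target ("Unknown label type: continuous"), and classifiers require discrete classes. One workaround, a sketch rather than the only fix: if each float genuinely names a class, cast the labels to strings (or ints) before fitting.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

X = np.array([[0], [1]])
Y = np.array([1.5, 2.5])        # float labels -> rejected by fit as-is

clf = GaussianNB()
clf.fit(X, Y.astype(str))       # classes become the strings '1.5' and '2.5'
print(clf.predict([[0]])[0])    # '1.5'
```

If the labels are truly continuous, a classifier is the wrong tool and a regressor should be used instead.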