naivebayes

Naive classifier in MATLAB

Submitted by 好久不见 on 2019-12-05 19:52:07

When testing the naive Bayes classifier in MATLAB, I get different results even though I trained and tested on the same sample data. I was wondering whether my code is correct, and if someone could help explain why this happens?

%% dimensionality reduction
columns = 6;
[U, S, V] = svds(fulldata, columns);

%% randomly select dataset
rows = 1000;
columns = 6;

% pick random rows
indX = randperm(size(fulldata, 1));
indX = indX(1:rows)';

% pick random columns
indY = randperm(size(fulldata, 2));
indY = indY(1:columns);

% filter data
data = U(indX, indY);

%% apply normalization method to every cell
data = zscore(data);
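The varying results most likely come from the unseeded randperm call: each run samples a different 1000 rows, so the classifier is trained on different data every time. A minimal Python sketch (a stand-in for the asker's MATLAB code, with hypothetical data) showing that fixing the RNG seed makes the sample, and hence everything downstream, reproducible:

```python
import numpy as np

def sample_rows(data, n_rows, seed=None):
    """Randomly sample n_rows distinct rows; repeatable only when seed is fixed."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(data.shape[0])[:n_rows]
    return data[idx]

full = np.arange(100).reshape(20, 5)
a = sample_rows(full, 8, seed=42)
b = sample_rows(full, 8, seed=42)
print(np.array_equal(a, b))  # same seed, same sample -> True
```

In MATLAB the equivalent fix is to call rng(42) (or any fixed seed) before randperm.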

Simple text classification using naive bayes (weka) in java

Submitted by 穿精又带淫゛_ on 2019-12-05 07:32:22

I am trying to do text classification with naive Bayes using the Weka library in my Java code, but I think the result of the classification is not correct and I don't know what the problem is. I use an ARFF file for the input. This is my training data:

@relation hamspam
@attribute text string
@attribute class {spam,ham}
@data
'good',ham
'good',ham
'very good',ham
'bad',spam
'very bad',spam
'very bad, very bad',spam
'good good bad',ham

This is my testing data:

@relation test
@attribute text string
@attribute class {spam,ham}
@data
'good bad very bad',?
'good bad very bad',?
'good',?
'good very good',?
'bad',?
'very good'
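One way to sanity-check the Weka output is to reproduce the task in a different toolkit. Below is a minimal sketch of the same spam/ham training set in Python with scikit-learn (a stand-in, not the asker's Weka code), showing the labels a multinomial naive Bayes classifier should produce:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Training data mirroring the ARFF file above
texts = ['good', 'good', 'very good', 'bad', 'very bad',
         'very bad, very bad', 'good good bad']
labels = ['ham', 'ham', 'ham', 'spam', 'spam', 'spam', 'ham']

vec = CountVectorizer()
clf = MultinomialNB().fit(vec.fit_transform(texts), labels)

print(clf.predict(vec.transform(['very good', 'bad'])))  # -> ['ham' 'spam']
```

If Weka's StringToWordVector + NaiveBayesMultinomial disagree with these labels on such clear-cut inputs, the problem is likely in how the string attribute is being converted before classification.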

Warnings while using the Naive Bayes Classifier in the Caret Package

Submitted by 女生的网名这么多〃 on 2019-12-04 22:02:16

I am attempting to run a supervised machine learning classifier known as naive Bayes in the caret package. My data is called LDA.scores, and has two categorical factors called "V4" and "G8", and 12 predictor variables. The code that I am using was adapted by a kind person on Stack Overflow from code supplied by myself (see link below). The code does work; however, only 9 predictors were used instead of the 12 predictors in the dataset. When I tried to train the naive Bayes model with the total dataset [2:13], the code failed. My next step was to systematically run the code with a subset of
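One common reason a naive Bayes fit fails when extra predictors are added (an assumption about the asker's error, since the message is not shown) is a predictor with near-zero variance within a class. A minimal Python/scikit-learn sketch with hypothetical data, not the asker's LDA.scores, showing how GaussianNB's var_smoothing keeps the fit from failing on such a predictor:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Two predictors; the second is constant within class 0, the kind of
# degenerate feature that makes a plain Gaussian naive Bayes fit fail.
X = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 7.0], [4.0, 8.0]])
y = np.array([0, 0, 1, 1])

# var_smoothing adds a small constant to every per-class variance,
# so the model still fits even with a zero-variance predictor.
clf = GaussianNB(var_smoothing=1e-9).fit(X, y)
print(clf.predict([[1.5, 5.0]]))  # -> [0]
```

In R, caret's nearZeroVar() performs the analogous diagnosis, flagging predictors that should be dropped before training.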

Naive-bayes multinomial text classifier using Data frame in Scala Spark

Submitted by 让人想犯罪 __ on 2019-12-04 12:29:55

I am trying to build a NaiveBayes classifier, loading the data from a database as a DataFrame containing (label, text). Here is a sample of the data (multinomial label):

+-----+--------------------+
|label|             feature|
+-----+--------------------+
|    1|combusting prepar...|
|    1|adhesives for ind...|
|    1|                    |
|    1| salt for preserving|
|    1|auxiliary fluids ...|

I have used the following transformations for tokenization, stop words, n-grams, and hashTF:

val selectedData = df.select("label", "feature")
// Tokenize RDD
val tokenizer = new Tokenizer().setInputCol("feature").setOutputCol("words")
val regexTokenizer = new RegexTokenizer()
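The same tokenize / stop-word / n-gram / hashing / naive Bayes pipeline can be mirrored in scikit-learn for quick experimentation outside Spark. A minimal sketch (hypothetical labels, the abbreviated feature strings expanded by guesswork); note alternate_sign=False, since MultinomialNB rejects the negative values that signed hashing would otherwise produce:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB

# HashingVectorizer plays the role of Spark's Tokenizer + StopWordsRemover
# + NGram + HashingTF in one step; alternate_sign=False keeps counts >= 0.
pipe = Pipeline([
    ('hash', HashingVectorizer(ngram_range=(1, 2), stop_words='english',
                               alternate_sign=False, n_features=2**12)),
    ('nb', MultinomialNB()),
])

texts = ['combusting preparations', 'adhesives for industry',
         'salt for preserving', 'auxiliary fluids']
labels = [1, 1, 2, 2]
pipe.fit(texts, labels)
print(pipe.predict(['salt for preserving']))  # -> [2]
```

Note the empty-text row in the sample above: in the real Spark pipeline, blank documents should be filtered out before HashingTF, since they contribute no features.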

Naive Bayes: the within-class variance in each feature of TRAINING must be positive

Submitted by 柔情痞子 on 2019-12-04 04:03:24

When trying to fit naive Bayes:

training_data = sample;
target_class = K8;
% train model
nb = NaiveBayes.fit(training_data, target_class);
% prediction
y = nb.predict(cluster3);

I get an error:

??? Error using ==> NaiveBayes.fit>gaussianFit at 535
The within-class variance in each feature of TRAINING must be positive.
The within-class variance in features 2 5 6 in class normal are not positive.

Error in ==> NaiveBayes.fit at 498
obj = gaussianFit(obj, training, gindex);

Can anyone shed light on this and how to solve it? Note that I have read a similar post here, but I am not sure what to do.
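The error means features 2, 5, and 6 are constant within the class "normal", so their Gaussian variance is zero and the per-class normal distribution cannot be fit. A minimal Python sketch (a stand-in for the MATLAB check, with hypothetical data) that finds such features so they can be dropped before fitting:

```python
import numpy as np

def zero_variance_within_class(X, y, tol=1e-12):
    """Indices of features whose variance is ~0 inside some class;
    a Gaussian naive Bayes model cannot fit these."""
    bad = set()
    for cls in np.unique(y):
        variances = X[y == cls].var(axis=0)
        bad.update(int(i) for i in np.where(variances < tol)[0])
    return sorted(bad)

X = np.array([[1.0, 5.0],
              [2.0, 5.0],
              [3.0, 7.0],
              [4.0, 8.0]])
y = np.array(['normal', 'normal', 'attack', 'attack'])
print(zero_variance_within_class(X, y))  # feature 1 is constant in 'normal' -> [1]
```

In MATLAB, the usual fixes are to remove the flagged columns from training_data, or to switch from the default Gaussian model to a kernel density estimate, which tolerates degenerate features.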

is it possible Apply PCA on any Text Classification?

Submitted by 早过忘川 on 2019-12-03 11:44:30

I'm trying a classification with Python. I'm using the naive Bayes MultinomialNB classifier for web pages (retrieving data from the web as text, then classifying that text: web classification). Now I'm trying to apply PCA on this data, but Python is giving some errors. My code for classification with naive Bayes:

from sklearn.decomposition import PCA
from sklearn.decomposition import RandomizedPCA
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

vectorizer = CountVectorizer()
classifier = MultinomialNB(alpha=.01)
x_train = vectorizer.fit_transform(temizdata)
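There are two common failure modes when applying PCA to text counts (an assumption about the asker's errors, which are not shown): sklearn's PCA requires a dense array while CountVectorizer produces a sparse matrix, and the reduced features contain negative values, which MultinomialNB rejects. A minimal sketch with hypothetical toy texts, using TruncatedSVD, which accepts sparse input directly, paired with GaussianNB, which tolerates negative inputs:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

texts = ['spam spam offer', 'cheap offer now',
         'meeting at noon', 'project meeting notes']
labels = ['spam', 'spam', 'ham', 'ham']

# TruncatedSVD works on sparse matrices directly (PCA would need a dense
# array), and since its output can be negative, GaussianNB replaces
# MultinomialNB after the reduction.
pipe = make_pipeline(CountVectorizer(),
                     TruncatedSVD(n_components=2, random_state=42),
                     GaussianNB())
pipe.fit(texts, labels)
print(pipe.predict(['cheap spam offer']))
```

This CountVectorizer + TruncatedSVD combination is exactly latent semantic analysis (LSA), the standard way to do PCA-style dimensionality reduction on text.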

Object Oriented Bayesian Spam Filtering?

Submitted by 余生颓废 on 2019-12-03 05:38:26

Question: I was wondering if there is any good and clean object-oriented (OOP) implementation of Bayesian filtering for spam and text classification? This is just for learning purposes.

Answer 1: I definitely recommend Weka, which is open-source data mining software written in Java: Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing,
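Since the question is explicitly for learning purposes, a from-scratch class can be more instructive than a library. A minimal object-oriented sketch in Python (not Weka, and not production code): a multinomial naive Bayes spam filter with Laplace smoothing in about thirty lines:

```python
import math
from collections import defaultdict

class NaiveBayesSpamFilter:
    """Minimal OOP Bayesian spam filter, for learning purposes only."""

    def __init__(self):
        self.word_counts = {'spam': defaultdict(int), 'ham': defaultdict(int)}
        self.doc_counts = {'spam': 0, 'ham': 0}

    def train(self, text, label):
        self.doc_counts[label] += 1
        for word in text.lower().split():
            self.word_counts[label][word] += 1

    def _log_score(self, text, label):
        # log prior
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        n_words = sum(self.word_counts[label].values())
        vocab = set(self.word_counts['spam']) | set(self.word_counts['ham'])
        for word in text.lower().split():
            # Laplace smoothing so unseen words don't zero out the score
            count = self.word_counts[label].get(word, 0)
            score += math.log((count + 1) / (n_words + len(vocab)))
        return score

    def classify(self, text):
        return max(('spam', 'ham'), key=lambda lb: self._log_score(text, lb))

f = NaiveBayesSpamFilter()
f.train('buy cheap pills now', 'spam')
f.train('cheap offer click now', 'spam')
f.train('lunch meeting tomorrow', 'ham')
f.train('project notes attached', 'ham')
print(f.classify('cheap pills offer'))  # -> spam
```

Working in log space avoids floating-point underflow when multiplying many small word probabilities, which is the classic pitfall in naive implementations.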

How can I use sklearn.naive_bayes with (multiple) categorical features?

Submitted by 走远了吗. on 2019-12-02 20:31:11

I want to learn a naive Bayes model for a problem where the class is boolean (takes on one of two values). Some of the features are boolean, but other features are categorical and can take on a small number of values (~5). If all my features were boolean, I would want to use sklearn.naive_bayes.BernoulliNB. It seems clear that sklearn.naive_bayes.MultinomialNB is not what I want. One solution is to split up my categorical features into boolean features. For instance, if a variable "X" takes on the values "red", "green", and "blue", I can have three variables: "X is red", "X is green", "X is blue".
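The splitting described above is one-hot encoding, and sklearn can do it mechanically. A minimal sketch with hypothetical data (one boolean feature plus one "red"/"green"/"blue" categorical feature) showing the encode-then-BernoulliNB approach:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.naive_bayes import BernoulliNB

# One boolean feature and one categorical feature per sample.
X_bool = np.array([[1], [0], [0], [1]])
X_cat = np.array([['red'], ['green'], ['blue'], ['red']])
y = np.array([1, 0, 0, 1])

# One-hot encode the categorical column into booleans; BernoulliNB then
# treats every column as an independent Bernoulli feature.
enc = OneHotEncoder()
X = np.hstack([X_bool, enc.fit_transform(X_cat).toarray()])
clf = BernoulliNB().fit(X, y)

query = np.hstack([[[1]], enc.transform([['red']]).toarray()])
print(clf.predict(query))  # -> [1]
```

As an alternative, newer scikit-learn versions (0.22+) provide sklearn.naive_bayes.CategoricalNB, which models categorical features directly without the one-hot split.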

Why did NLTK NaiveBayes classifier misclassify one record?

Submitted by 孤街浪徒 on 2019-12-02 13:32:11

Question: This is the first time I am building a sentiment analysis machine learning model, using the NLTK NaiveBayesClassifier in Python. I know it is too simple a model, but it is just a first step for me and I will try tokenized sentences next time. The real issue with my current model is this: I have clearly labeled the word 'bad' as negative in the training data set (as you can see from the 'negative_vocab' variable). However, when I ran the NaiveBayesClassifier on each sentence (lower case)
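A likely explanation for the single misclassified record is that naive Bayes combines evidence from all word features, so labeling 'bad' as negative does not guarantee a negative prediction for a whole sentence. A minimal pure-Python sketch with hypothetical likelihoods (not the asker's trained values) showing how surrounding positive words can outvote 'bad':

```python
import math

# Hypothetical per-word likelihoods, as a trained model might hold them
likelihood = {
    'pos': {'good': 0.40, 'awesome': 0.30, 'bad': 0.05},
    'neg': {'good': 0.10, 'awesome': 0.05, 'bad': 0.50},
}
prior = {'pos': 0.5, 'neg': 0.5}

def classify(words):
    scores = {}
    for label in prior:
        s = math.log(prior[label])
        for w in words:
            # small default probability stands in for smoothing
            s += math.log(likelihood[label].get(w, 0.01))
        scores[label] = s
    return max(scores, key=scores.get)

# 'bad' alone is negative, but surrounded by strong positive words the
# combined evidence flips the overall prediction.
print(classify(['bad']))                     # -> neg
print(classify(['good', 'awesome', 'bad']))  # -> pos
```

So a sentence containing 'bad' can legitimately come out positive; the fix is more negative training examples or features that capture negation, not relabeling the one word.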