naive classifier matlab

Submitted by 余生长醉 on 2019-12-10 10:11:32

Question


When testing the naive Bayes classifier in MATLAB, I get different results even though I trained and tested on the same sample data. I was wondering whether my code is correct, and whether someone could help explain why this happens.

%% dimensionality reduction 
columns = 6;
[U,S,V]=svds(fulldata,columns);

%% randomly select dataset
rows = 1000;
columns = 6;

%# pick random rows
indX = randperm( size(fulldata,1) );
indX = indX(1:rows)';

%# pick random columns (index into U, which has only `columns` columns after svds)
indY = randperm( size(U,2) );
indY = indY(1:columns);

%# filter data
data = U(indX,indY);

%% standardize each column (zscore gives zero mean and unit variance per column)
data = zscore(data);

%create a training set the same as datasample
training_data = data;

%match the class labels to the corresponding rows
target_class = classlabels(indX,:)

%classify the same data sample to check if naive bayes works
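%compare against target_class (test_class was never defined above);
%predicted_class avoids shadowing the built-in function `class`
predicted_class = classify(data, training_data, target_class, 'diaglinear');
confusionmat(target_class, predicted_class)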

Here is an example:

Notice that it got ipsweep, teardrop and back mixed up with normal traffic. I haven't gotten to the stage of classifying unseen data yet; I just wanted to test whether it would classify the same data correctly.

The confusion matrix output:

ans =

   537     0     0     0     0     0     0     1     0
     0   224     0     0     0     1     0     1     0
     0     0    91    79     0    17    24     4     0
     0     0     0     8     0     0     2     0     0
     0     0     0     0     3     0     0     0     0
     0     0     0     0     0     1     0     0     0
     0     0     0     0     0     0     2     0     0
     0     0     0     0     0     0     0     3     0
     0     0     0     0     0     1     0     0     1

I have no clue what this actually is, and I probably got something wrong in my code, but I thought I would test it to see what it outputs.


Answer 1:


You are using a classifier on data of reduced dimensionality. A classifier is meant to generalize, so it will not fit every training point exactly. In the dimensionality reduction stage you are also losing information, which further reduces classification performance.
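To get a feel for how much information a rank-6 reduction discards, one can compare the squared singular values that are kept with those that are dropped. The following is only a minimal sketch on synthetic data: the matrix `X` and its size are assumptions standing in for `fulldata`, not the asker's data.

```matlab
% Synthetic stand-in for fulldata (assumption: 200 samples, 10 features).
rng(0);                               % fix the seed so the run is repeatable
X = randn(200, 10);

k = 6;                                % components kept, as in the question
[U, S, V] = svd(X, 'econ');           % economy-size SVD: S is 10-by-10
s = diag(S);                          % singular values, largest first

% Share of the total squared singular values kept by the rank-k truncation.
retained = sum(s(1:k).^2) / sum(s.^2);

% Rebuild the rank-k approximation and measure what was lost.
Xk = U(:, 1:k) * S(1:k, 1:k) * V(:, 1:k)';
lost = norm(X - Xk, 'fro')^2;         % equals sum(s(k+1:end).^2) up to rounding

fprintf('kept %.1f%% of the energy, squared error %.2f\n', 100 * retained, lost);
```

If `retained` is well below 1 on the real data, some of the class-separating structure may have been thrown away before the classifier ever sees it.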

Don't expect perfect performance even on the training set; that would be a bad case of over-fitting.
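A small experiment can make this concrete. The sketch below is a hand-written one-feature illustration, not the asker's pipeline: it assumes two classes with unit variance and means 0 and 1, and it uses the fact that a linear rule with equal class sizes and a shared variance reduces, in one dimension, to a threshold halfway between the class means.

```matlab
% Two overlapping 1-D classes (assumption: unit variance, means 0 and 1).
rng(1);                               % fix the seed so the run is repeatable
n = 500;                              % samples per class
X = [randn(n, 1); randn(n, 1) + 1];
y = [ones(n, 1); 2 * ones(n, 1)];

% With equal class sizes and a shared variance, the linear rule in 1-D
% is a threshold halfway between the two class means.
thr = (mean(X(y == 1)) + mean(X(y == 2))) / 2;
pred = 1 + (X > thr);                 % 1 below the threshold, 2 above

% Resubstitution error: error on the very data used for fitting.
resubErr = mean(pred ~= y);
fprintf('training-set error: %.3f\n', resubErr);
```

Because the two classes overlap, no single threshold separates them, so the error stays well above zero even though the rule is evaluated on its own training data.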

As for the use of the confusion matrix: rows are the true classes and columns are the predicted classes, so C(3,4)=79 simply means that 79 data points whose true class is 3 were classified as class 4. The complete matrix says that your classifier works well for classes 1 and 2 but has problems with class 3. The remaining classes have almost no data, so it is difficult to judge how well the classifier works for them.
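The same reading can be done numerically. This sketch copies the matrix from the question into `C` and computes the number of true samples per class, the fraction of each class that was classified correctly, and the overall accuracy; the variable names are only illustrative.

```matlab
% Confusion matrix from the question: rows = true class, columns = predicted.
C = [537   0   0   0   0   0   0   1   0
       0 224   0   0   0   1   0   1   0
       0   0  91  79   0  17  24   4   0
       0   0   0   8   0   0   2   0   0
       0   0   0   0   3   0   0   0   0
       0   0   0   0   0   1   0   0   0
       0   0   0   0   0   0   2   0   0
       0   0   0   0   0   0   0   3   0
       0   0   0   0   0   1   0   0   1];

support  = sum(C, 2);                 % true samples per class (row sums)
recall   = diag(C) ./ support;        % fraction of each class predicted correctly
accuracy = trace(C) / sum(C(:));      % correct predictions over all samples

disp([support, recall]);              % one row per class
fprintf('overall accuracy: %.3f\n', accuracy);   % prints 0.870
```

Class 3 has 215 samples but only 91 on the diagonal (about 42%), while classes 4 to 9 have between 1 and 10 samples each, which is why their per-class figures say very little.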



Source: https://stackoverflow.com/questions/11554938/naive-classifier-matlab
