weka

Training and test data have a different number of attributes, which gives the error “Train and test set are not compatible”

≯℡__Kan透↙ submitted on 2019-12-25 03:02:46
Question: I use WEKA for text classification. I have a training data set to which I apply the StringToWordVector and NumericToNominal filters, and a test data set to which I applied the same filters. When I try to apply my model to the test data, it gives the following error: "Train and test set are not compatible". I searched for a solution: the error occurs because the number of attributes differs between the two sets, and it will always differ because the texts in the two sets are different. How can I solve this error?
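The mismatch described above arises when the word dictionary is built separately for each set. A common remedy is to build the dictionary from the training set only and reuse it for the test set; in Weka this is usually done by initialising the filter on the training data and applying the same filter object to the test data, or by wrapping filter and classifier in a FilteredClassifier. The plain-Java sketch below only illustrates the idea and does not use Weka; the class name `FixedVocabularyVectorizer` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; it is not part of the Weka API.
class FixedVocabularyVectorizer {
    // Maps each training-set word to its column position.
    private final Map<String, Integer> index = new LinkedHashMap<>();

    /** Builds the dictionary from the training documents only. */
    FixedVocabularyVectorizer(List<String> trainingDocs) {
        for (String doc : trainingDocs) {
            for (String token : tokenize(doc)) {
                index.putIfAbsent(token, index.size());
            }
        }
    }

    /** Number of attributes, identical for every vector this object produces. */
    int numAttributes() {
        return index.size();
    }

    /** Counts dictionary words in doc; words unseen during training are ignored. */
    int[] vectorize(String doc) {
        int[] counts = new int[index.size()];
        for (String token : tokenize(doc)) {
            Integer column = index.get(token);
            if (column != null) {
                counts[column]++;
            }
        }
        return counts;
    }

    // Lower-cases and splits on whitespace; a deliberately simple tokenizer.
    private static List<String> tokenize(String doc) {
        List<String> tokens = new ArrayList<>();
        for (String token : doc.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

Because every vector is produced from the same dictionary, training and test vectors always have the same number and order of attributes, regardless of which words appear in the test texts.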

How to match the attribute order of two Instances in Weka

£可爱£侵袭症+ submitted on 2019-12-25 02:12:32
Question: I have two Instances objects from the StringToWordVector filter's output in this format:

instances1
a b c
1 3 2
5 6 7

instances2
b c a
8 9 1
5 7 8

I want to match these attributes and produce merged instances in this format:

a b c
1 2 3
5 6 7
1 8 9
8 5 7

Answer 1: You can make use of the InputMappedClassifier. If you keep the original doc collection you have two other options described here.

Source: https://stackoverflow.com/questions/21067439/how-to-match-attributes-order-of-two-instances-in-weka
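To make the reordering itself concrete, here is a minimal plain-Java sketch that aligns rows from one header order to another by attribute name. It does not use Weka and is not what InputMappedClassifier does internally; the class `AttributeAligner` and its `align` method are hypothetical names, and it assumes both sets contain exactly the same attribute names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the Weka API.
class AttributeAligner {
    /**
     * Reorders every row so that its columns follow targetHeader
     * instead of sourceHeader. Attributes are matched by name.
     */
    static List<double[]> align(List<String> targetHeader,
                                List<String> sourceHeader,
                                List<double[]> rows) {
        // For each target column, find where that attribute sits in the source.
        int[] sourceIndex = new int[targetHeader.size()];
        for (int i = 0; i < targetHeader.size(); i++) {
            int idx = sourceHeader.indexOf(targetHeader.get(i));
            if (idx < 0) {
                throw new IllegalArgumentException(
                        "Missing attribute: " + targetHeader.get(i));
            }
            sourceIndex[i] = idx;
        }
        // Copy each row into the target column order.
        List<double[]> aligned = new ArrayList<>();
        for (double[] row : rows) {
            double[] out = new double[sourceIndex.length];
            for (int i = 0; i < out.length; i++) {
                out[i] = row[sourceIndex[i]];
            }
            aligned.add(out);
        }
        return aligned;
    }
}
```

Applied to the second set from the question (header b c a, rows 8 9 1 and 5 7 8) with target header a b c, this yields 1 8 9 and 8 5 7, which can then be appended to the first set.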

Weka: why does getMargin return all zeros?

和自甴很熟 submitted on 2019-12-25 00:23:13
Question: I am using the Weka Java API. I trained a BayesNet on an Instances object (data set) data.

    /** Initialization */
    Instances data = ...;
    BayesNet bn = new EditableBayesNet(data);
    SearchAlgorithm learner = new TAN();
    SimpleEstimator estimator = new SimpleEstimator();

    /** Training */
    bn.initStructure();
    learner.buildStructure(bn, data);
    estimator.estimateCPTs(bn);

getMargin returns the marginal distribution for a node. Ideally, assuming node A has 3 possible values, and its node index is 0. Then, bn

Output of RandomSubSpace classifier Weka API in Java

被刻印的时光 ゝ submitted on 2019-12-24 22:16:49
Question: I've built a RandomSubSpace classifier in the Weka Explorer and am now attempting to use it with the Weka Java API. However, when I run distributionForInstance() I get an array with 1.0 as the first value and 0.0 as all the rest. I am trying to get the numerical prediction, not the class. Is there a different function I should be using, or a different option on distributionForInstance? Code snippet below: Classifier cls = (Classifier) weka.core.SerializationHelper.read("2015-09-6 Random

How to set values for calculating Euclidean distance and correlation

主宰稳场 submitted on 2019-12-24 20:26:20
Question: Here is my word vector:

google test stackoverflow yahoo

I have assigned a value to each of these words as follows:

google : 1
test : 2
stackoverflow : 3
yahoo : 4

Here are some sample users and their words:

user1 google, test, stackoverflow
user2 test, google
user3 test, yahoo
user4 stackoverflow, yahoo
user5 stackoverflow, google
user6

To cater for users who do not have a value contained in the word vector, I assign '0'. Based on this, this corresponds to:

user1 1, 2, 3
user2 2, 1, 0
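As a reference point for the two measures named in the title, the sketch below computes the Euclidean distance and the Pearson correlation between two equal-length vectors in plain Java. It takes the zero-padded encoding from the question as given and does not judge whether that encoding is appropriate; the class `VectorSimilarity` is a hypothetical name.

```java
// Hypothetical helper for illustration; not taken from the original post.
class VectorSimilarity {
    /** Euclidean distance: square root of the summed squared differences. */
    static double euclidean(double[] x, double[] y) {
        requireSameLength(x, y);
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Pearson correlation; returns NaN if either vector has zero variance. */
    static double pearson(double[] x, double[] y) {
        requireSameLength(x, y);
        double meanX = mean(x);
        double meanY = mean(y);
        double cov = 0.0, varX = 0.0, varY = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    private static double mean(double[] v) {
        double sum = 0.0;
        for (double value : v) {
            sum += value;
        }
        return sum / v.length;
    }

    private static void requireSameLength(double[] x, double[] y) {
        if (x.length != y.length || x.length == 0) {
            throw new IllegalArgumentException("Vectors must be non-empty and of equal length");
        }
    }
}
```

For user1 (1, 2, 3) and user2 (2, 1, 0) from the question, the squared differences are 1, 1 and 9, so the Euclidean distance is the square root of 11, and the Pearson correlation is -1 because the centred vectors (-1, 0, 1) and (1, 0, -1) point in opposite directions.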

Writing the results of Weka classifier to file in Java

孤街浪徒 submitted on 2019-12-24 18:34:52
Question: I am generating decision trees in Weka in Java code as follows:

    J48 j48DecisionTree = new J48();
    Instances data = null;
    data = new Instances(new BufferedReader(new FileReader(dt.getArffFile())));
    data.setClassIndex(data.numAttributes() - 1);
    j48DecisionTree.buildClassifier(data);

Can I save the results of the Weka results buffer to a text file in the program, such that the following can be saved at run-time to a text file: === Stratified cross-validation === === Summary === Correctly
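One possible approach, sketched under assumptions: once the evaluation text is available as a String (in the Weka Java API, the Evaluation class offers toSummaryString() for this), it can be written to disk with the standard library. The sketch below covers only the file-writing half and takes the report text as a parameter; `ReportWriter` and `writeReport` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of the Weka API.
class ReportWriter {
    /** Writes the report text to target as UTF-8, replacing any existing file. */
    static void writeReport(Path target, String report) {
        try {
            Files.write(target, report.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            // Rethrow unchecked so callers are not forced to declare IOException.
            throw new UncheckedIOException(e);
        }
    }
}
```

A caller would pass the summary string produced after cross-validation together with a path such as `Paths.get("results.txt")`; the file name here is only an example.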

What splitting criterion does Random Tree in Weka 3.7.11 use for numerical attributes?

坚强是说给别人听的谎言 submitted on 2019-12-24 17:38:10
Question: I'm using RandomForest from Weka 3.7.11, which in turn bags Weka's RandomTree. My input attributes are numerical and the output attribute (label) is also numerical. When training the RandomTree, K attributes are chosen at random for each node of the tree. Several splits based on those attributes are attempted and the "best" one is chosen. How does Weka determine which split is best in this (numerical) case? For nominal attributes I believe Weka is using the information gain criterion which

How to save cluster assignments in output file using Weka clustering XMeans?

|▌冷眼眸甩不掉的悲伤 submitted on 2019-12-24 16:04:37
Question: Context: I want to use the Weka clustering algorithm XMeans. However, I cannot figure out how to obtain cluster assignments from the Weka GUI. At the moment I can only see a list of cluster IDs along with the percentage of entries assigned to each cluster. Question: Is there any way to save the cluster assignment for each entry in, e.g., CSV format?

Answer 1: Do everything in the "Preprocess Panel". This is one way to do this: Load Data File. Remove any Classification Attribute or Identifiers. Choose Preprocess /

How to get J48 size and number of leaves

怎甘沉沦 submitted on 2019-12-24 15:22:22
Question: If I build a J48 tree by:

    library(RWeka)
    fit <- J48(Species~., data=iris)

I get the following result:

    > fit
    J48 pruned tree
    ------------------

    Petal.Width <= 0.6: setosa (50.0)
    Petal.Width > 0.6
    |   Petal.Width <= 1.7
    |   |   Petal.Length <= 4.9: versicolor (48.0/1.0)
    |   |   Petal.Length > 4.9
    |   |   |   Petal.Width <= 1.5: virginica (3.0)
    |   |   |   Petal.Width > 1.5: versicolor (3.0/1.0)
    |   Petal.Width > 1.7: virginica (46.0/1.0)

    Number of Leaves : 5
    Size of the tree : 9

I would like to get the Number of
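One way to recover the two summary numbers is to parse them from the printed model text. The sketch below does this in plain Java with regular expressions, assuming the text contains the lines "Number of Leaves : N" and "Size of the tree : N" as in the output above; `J48SummaryParser` is a hypothetical name. The Weka Java class J48 also exposes measureNumLeaves() and measureTreeSize(), which may be a more direct route when the underlying Java object is accessible; that route is not shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; it parses printed text, not a model object.
class J48SummaryParser {
    private static final Pattern LEAVES =
            Pattern.compile("Number of Leaves\\s*:\\s*(\\d+)");
    private static final Pattern SIZE =
            Pattern.compile("Size of the tree\\s*:\\s*(\\d+)");

    /** Extracts the leaf count from the printed J48 model. */
    static int numberOfLeaves(String printedModel) {
        return extract(LEAVES, printedModel);
    }

    /** Extracts the total node count from the printed J48 model. */
    static int treeSize(String printedModel) {
        return extract(SIZE, printedModel);
    }

    // Returns the first captured integer, or fails loudly if the line is absent.
    private static int extract(Pattern pattern, String text) {
        Matcher matcher = pattern.matcher(text);
        if (!matcher.find()) {
            throw new IllegalArgumentException(
                    "Summary line not found: " + pattern.pattern());
        }
        return Integer.parseInt(matcher.group(1));
    }
}
```

Parsing printed output is brittle if the format changes between Weka versions, so this is best treated as a fallback.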