weka | 易学教程

Training and test data have a different number of attributes, which gives the error “Train and test set are not compatible”

Question: I use WEKA for text classification. I have a training data set to which I apply the StringToWordVector and NumericToNominal filters, and a test data set to which I applied the same filters. When I try to apply my model to the test data, I get the following error: “Train and test set are not compatible”. I searched for a solution: the error occurs because the number of attributes differs between the two sets, and it will always differ because the texts in the two sets are different. How can I solve this error?
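
The incompatibility described above arises when the word dictionary is built separately for each set. A common remedy is to build the dictionary from the training data only and reuse it for the test data; in Weka this is typically done by wrapping the filter and the classifier in a FilteredClassifier, or by running the filter in batch mode. The following is a minimal plain-Java sketch of that idea, not Weka code: the class SharedVocabulary and its methods are hypothetical names used for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration (not the Weka API): one dictionary, built from training text only. */
final class SharedVocabulary {
    // Maps each training word to a fixed column index.
    private final Map<String, Integer> index = new LinkedHashMap<>();

    /** Builds the word-to-column mapping from the training documents. */
    static SharedVocabulary fit(List<String> trainingDocs) {
        SharedVocabulary vocab = new SharedVocabulary();
        for (String doc : trainingDocs) {
            for (String token : tokenize(doc)) {
                vocab.index.putIfAbsent(token, vocab.index.size());
            }
        }
        return vocab;
    }

    /** Maps any document onto the training columns; words unseen in training are ignored. */
    int[] transform(String doc) {
        int[] counts = new int[index.size()];
        for (String token : tokenize(doc)) {
            Integer column = index.get(token);
            if (column != null) {
                counts[column]++;
            }
        }
        return counts;
    }

    /** Number of columns every transformed document will have. */
    int size() {
        return index.size();
    }

    // Lower-cases the text and splits on non-word characters (an assumed, simplistic tokenizer).
    private static List<String> tokenize(String doc) {
        List<String> tokens = new ArrayList<>();
        for (String t : doc.toLowerCase().split("\\W+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }
}
```

Because transform always returns an array of length size(), training and test vectors have the same attributes no matter which words the test texts contain.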

How to match the attribute order of two Instances in Weka

Question: I have two Instances objects from the StringToWordVector filter's output, in this format:

instances1
a b c
1 3 2
5 6 7

instances2
b c a
8 9 1
5 7 8

I want to match these attributes and produce a merged Instances object in this format:

a b c
1 2 3
5 6 7
1 8 9
8 5 7

Answer 1: You can make use of the InputMappedClassifier. If you keep the original document collection, you have two other options described here.

Source: https://stackoverflow.com/questions/21067439/how-to-match-attributes-order-of-two-instances-in-weka
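
The reordering asked about above comes down to looking up each target attribute name in the source header and copying the matching column. A minimal sketch in plain Java, not using the Weka API; AttributeAligner is a hypothetical helper, and the data are the instances2 rows from the question:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java illustration (not the Weka API) of aligning columns by attribute name. */
final class AttributeAligner {
    /** Reorders each row so its columns follow targetNames instead of sourceNames. */
    static List<double[]> align(List<String> sourceNames, List<double[]> rows, List<String> targetNames) {
        // For every target column, find where that attribute sits in the source header.
        int[] sourceColumn = new int[targetNames.size()];
        for (int i = 0; i < targetNames.size(); i++) {
            sourceColumn[i] = sourceNames.indexOf(targetNames.get(i));
            if (sourceColumn[i] < 0) {
                throw new IllegalArgumentException("Missing attribute: " + targetNames.get(i));
            }
        }
        List<double[]> aligned = new ArrayList<>();
        for (double[] row : rows) {
            double[] out = new double[sourceColumn.length];
            for (int i = 0; i < out.length; i++) {
                out[i] = row[sourceColumn[i]];
            }
            aligned.add(out);
        }
        return aligned;
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("a", "b", "c");
        List<String> second = Arrays.asList("b", "c", "a");
        List<double[]> rows2 = Arrays.asList(new double[]{8, 9, 1}, new double[]{5, 7, 8});
        for (double[] row : align(second, rows2, first)) {
            System.out.println(Arrays.toString(row)); // [1.0, 8.0, 9.0] then [8.0, 5.0, 7.0]
        }
    }
}
```

Once both sets share the header a b c, the rows can simply be appended to one list.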

Weka: why getMargin returns all zeros?

Question: I am using the Weka Java API. I trained a BayesNet on an Instances object (data set) data.

    /** Initialization */
    Instances data = ...;
    BayesNet bn = new EditableBayesNet(data);
    SearchAlgorithm learner = new TAN();
    SimpleEstimator estimator = new SimpleEstimator();

    /** Training */
    bn.initStructure();
    learner.buildStructure(bn, data);
    estimator.estimateCPTs(bn);

getMargin returns the marginal distribution for a node. Ideally, assuming node A has 3 possible values and its node index is 0, then bn

Output of RandomSubSpace classifier Weka API in Java

Question: I've built a RandomSubSpace classifier in the Weka Explorer and am now attempting to use it with the Weka Java API. However, when I run distributionForInstance() I get an array with 1.0 as the first value and 0.0 for all the rest. I am trying to get the numerical prediction, not the class. Is there a different function I should be using, or a different option on distributionForInstance()? Code snippet below: Classifier cls = (Classifier) weka.core.SerializationHelper.read("2015-09-6 Random

How to set values for calculating Euclidean distance and correlation

Question: Here is my word vector:

google test stackoverflow yahoo

I have assigned a value to each of these words as follows:

google: 1
test: 2
stackoverflow: 3
yahoo: 4

Here are some sample users and their words:

user1: google, test, stackoverflow
user2: test, google
user3: test, yahoo
user4: stackoverflow, yahoo
user5: stackoverflow, google
user6

To cater for users that do not have a value contained in the word vector, I assign '0'. Based on this, the users correspond to:

user1: 1, 2, 3
user2: 2, 1, 0
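
With the users encoded as equal-length numeric vectors, both measures are a few lines of arithmetic. A minimal sketch in plain Java, using the user1 and user2 vectors from the question; VectorSimilarity is a hypothetical helper name, and the correlation shown is the Pearson coefficient, which is an assumption about the kind of correlation intended:

```java
/** Plain-Java illustration of Euclidean distance and Pearson correlation between two vectors. */
final class VectorSimilarity {
    /** Square root of the summed squared differences. */
    static double euclidean(double[] x, double[] y) {
        requireSameLength(x, y);
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            double d = x[i] - y[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    /** Pearson correlation; the result is NaN when either vector is constant. */
    static double pearson(double[] x, double[] y) {
        requireSameLength(x, y);
        double meanX = mean(x);
        double meanY = mean(y);
        double cov = 0.0;
        double varX = 0.0;
        double varY = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    private static double mean(double[] v) {
        double sum = 0.0;
        for (double value : v) {
            sum += value;
        }
        return sum / v.length;
    }

    private static void requireSameLength(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
    }

    public static void main(String[] args) {
        double[] user1 = {1, 2, 3};
        double[] user2 = {2, 1, 0};
        System.out.println(euclidean(user1, user2)); // sqrt(11), about 3.317
        System.out.println(pearson(user1, user2));   // -1.0
    }
}
```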

Writing the results of Weka classifier to file in Java

Question: I am generating decision trees in Weka from Java code as follows:

    J48 j48DecisionTree = new J48();
    Instances data = null;
    data = new Instances(new BufferedReader(new FileReader(dt.getArffFile())));
    data.setClassIndex(data.numAttributes() - 1);
    j48DecisionTree.buildClassifier(data);

Can I save the contents of the Weka results buffer to a text file from within the program, so that the following can be saved at run time to a text file: === Stratified cross-validation === === Summary === Correctly
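
In the Weka Java API, the text shown in the Explorer's result buffer is available as strings, for example from Evaluation.toSummaryString() after a call to crossValidateModel. Once such a string is at hand, writing it out is ordinary Java file I/O. A minimal sketch, assuming the report has already been obtained as a String; ResultWriter is a hypothetical helper, not part of Weka:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Plain-Java helper (hypothetical, not part of Weka) for saving a report string. */
final class ResultWriter {
    /** Writes the report text to the target file as UTF-8, replacing any existing content. */
    static void save(String report, Path target) throws IOException {
        Files.write(target, report.getBytes(StandardCharsets.UTF_8));
    }
}
```

A call such as ResultWriter.save(summaryText, java.nio.file.Paths.get("results.txt")) would then store the summary; the variable summaryText and the file name are placeholders.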

What splitting criterion does Random Tree in Weka 3.7.11 use for numerical attributes?

Question: I'm using RandomForest from Weka 3.7.11, which in turn bags Weka's RandomTree. My input attributes are numerical and the output attribute (label) is also numerical. When training the RandomTree, K attributes are chosen at random for each node of the tree. Several splits based on those attributes are attempted and the "best" one is chosen. How does Weka determine which split is best in this (numerical) case? For nominal attributes I believe Weka is using the information gain criterion which
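
The question as quoted here is cut off and carries no answer, so the following is only the generic variance-reduction criterion commonly used for regression trees. Whether RandomTree in 3.7.11 computes exactly this is an assumption to check against the Weka source. VarianceSplit is a hypothetical name, and the sketch is plain Java rather than Weka code:

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of scoring a numeric split by reduction in squared error. */
final class VarianceSplit {
    /** Sum of squared deviations from the mean; 0 for an empty list. */
    static double sumSquaredError(List<Double> values) {
        if (values.isEmpty()) {
            return 0.0;
        }
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= values.size();
        double sse = 0.0;
        for (double v : values) {
            sse += (v - mean) * (v - mean);
        }
        return sse;
    }

    /** Squared-error reduction obtained by splitting on attribute <= threshold. */
    static double gain(double[] attribute, double[] target, double threshold) {
        List<Double> all = new ArrayList<>();
        List<Double> left = new ArrayList<>();
        List<Double> right = new ArrayList<>();
        for (int i = 0; i < attribute.length; i++) {
            all.add(target[i]);
            if (attribute[i] <= threshold) {
                left.add(target[i]);
            } else {
                right.add(target[i]);
            }
        }
        return sumSquaredError(all) - sumSquaredError(left) - sumSquaredError(right);
    }
}
```

Under this criterion, the candidate split with the largest gain among the K randomly chosen attributes would be the "best" one.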

How to save cluster assignments in output file using Weka clustering XMeans?

Question:

Context. I want to use the Weka clustering algorithm XMeans. However, I cannot figure out how to obtain cluster assignments from the Weka GUI. At the moment I can only see a list of cluster IDs along with the percentage of entries assigned to each cluster.

Question. Is there any way to save the cluster assignment for each entry in, e.g., CSV format?

Answer 1: Do everything in the "Preprocess" panel. This is one way to do it: Load the data file. Remove any classification attribute or identifiers. Choose Preprocess /

How to get J48 size and number of leaves

Question: If I build a J48 tree by:

    library(RWeka)
    fit <- J48(Species~., data=iris)

I get the following result:

    > fit
    J48 pruned tree
    ------------------

    Petal.Width <= 0.6: setosa (50.0)
    Petal.Width > 0.6
    |   Petal.Width <= 1.7
    |   |   Petal.Length <= 4.9: versicolor (48.0/1.0)
    |   |   Petal.Length > 4.9
    |   |   |   Petal.Width <= 1.5: virginica (3.0)
    |   |   |   Petal.Width > 1.5: versicolor (3.0/1.0)
    |   Petal.Width > 1.7: virginica (46.0/1.0)

    Number of Leaves : 5

    Size of the tree : 9

I would like to get the Number of
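
From Java, a J48 model reports these figures through measureNumLeaves() and measureTreeSize(). When only the printed model text is available, the two numbers can also be pulled out with a regular expression. A rough sketch in plain Java, assuming the text has the layout shown above; TreeSummaryParser is a hypothetical helper, not part of Weka or RWeka:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Plain-Java helper (hypothetical) that reads the summary lines of a printed J48 tree. */
final class TreeSummaryParser {
    private static final Pattern LEAVES = Pattern.compile("Number of Leaves\\s*:\\s*(\\d+)");
    private static final Pattern SIZE = Pattern.compile("Size of the tree\\s*:\\s*(\\d+)");

    /** Returns the value after "Number of Leaves :". */
    static int numberOfLeaves(String modelText) {
        return extract(LEAVES, modelText);
    }

    /** Returns the value after "Size of the tree :". */
    static int treeSize(String modelText) {
        return extract(SIZE, modelText);
    }

    // Finds the first match of the pattern and parses its captured digits.
    private static int extract(Pattern pattern, String modelText) {
        Matcher matcher = pattern.matcher(modelText);
        if (!matcher.find()) {
            throw new IllegalArgumentException("Summary line not found: " + pattern.pattern());
        }
        return Integer.parseInt(matcher.group(1));
    }
}
```

Parsing printed output is brittle compared with asking the model object directly, so this is best treated as a fallback.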