weka | 易学教程

How to import XML files in WEKA

I want to import a bunch of XML data into Weka. Is there a straightforward solution or a tutorial, or do I have to manually convert it to CSV or ARFF format?

There's no straightforward way to load instances into Weka from XML. Your only real options are CSV, ARFF, or a database, so you'll have to write some conversion code. I've used rarff in the past to build ARFF files using Ruby.

WEKA does not support XML files as input datasets. WEKA allows starting Classifiers and Experiments with the -xml option followed by a filename to retrieve the command line options from the XML file instead of the
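
The "conversion code" mentioned in the answer could look roughly like the sketch below. It is only an illustration: the XML layout (one `<record>` element per instance, one child element per attribute), the field names, and the function name `xml_to_arff` are assumptions made for the example, not something given in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical input layout: one <record> per instance,
# with one child element per attribute.
SAMPLE_XML = """
<records>
  <record><length>5.1</length><width>3.5</width><label>yes</label></record>
  <record><length>4.9</length><width>3.0</width><label>no</label></record>
</records>
"""


def xml_to_arff(xml_text, relation, numeric_fields, nominal_field):
    """Convert flat XML records into ARFF text (numeric fields plus one nominal class)."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for record in root.iter("record"):
        # "?" is the ARFF marker for a missing value.
        values = [record.findtext(name, default="?") for name in numeric_fields]
        values.append(record.findtext(nominal_field, default="?"))
        rows.append(values)

    # Nominal values are collected from the data, in order of first appearance.
    labels = []
    for row in rows:
        if row[-1] != "?" and row[-1] not in labels:
            labels.append(row[-1])

    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {name} numeric" for name in numeric_fields]
    lines.append(f"@attribute {nominal_field} {{{','.join(labels)}}}")
    lines += ["", "@data"]
    lines += [",".join(row) for row in rows]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(xml_to_arff(SAMPLE_XML, "sample", ["length", "width"], "label"))
```

Real XML is rarely this flat, so the part that walks the tree is the piece most likely to need adapting; the ARFF header and `@data` section stay the same.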

How to add weka features in a new algorithm?

I want to add a new algorithm to Weka that combines classification, clustering, association, and other features in a single algorithm. How should I write the code so that it includes all of these Weka features, and how do I add a tab to Weka for the new algorithm? I have already added a dummy algorithm to Weka and it works; now I want to add one that combines several of Weka's features. Thanks.

If you want to add a new algorithm to Weka, have a look at the Weka Manual ( http://www.cs.waikato.ac.nz/ml/weka/index.html ). In Part IV (Appendix) there is a chapter, Extending Weka, which contains the section Writing a new Classifier. Very basically

SMOTE oversampling and cross-validation

I am working on a binary classification problem in Weka with a highly imbalanced data set (90% in one category and 10% in the other). I first applied SMOTE ( http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume16/chawla02a-html/node6.html ) to the entire data set to even out the categories and then performed 10-fold cross-validation over the newly obtained data. I found (overly?) optimistic results, with F1 around 90%. Is this due to oversampling? Is it bad practice to perform cross-validation on data to which SMOTE has been applied? Are there any ways to solve this problem?

I think you should split
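
One common way to avoid the optimistic bias the question describes is to oversample only the training part of each fold, so that no test instance (or anything derived from one) is seen during training. The sketch below illustrates that ordering in plain Python. Note the assumptions: it uses random duplication of minority rows as a simple stand-in for SMOTE, it does not stratify the folds, and the function names are made up for this example rather than taken from Weka.

```python
import random
from collections import Counter


def oversample_minority(rows, labels, rng):
    """Balance classes by duplicating random minority rows (a stand-in for SMOTE)."""
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, count in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - count):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def cv_folds_with_oversampling(rows, labels, k=10, seed=0):
    """Yield (train_rows, train_labels, test_rows, test_labels) per fold.

    Only the training part is oversampled; the test part keeps the
    original class distribution.
    """
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        train_rows, train_labels = oversample_minority(
            [rows[i] for i in train_idx], [labels[i] for i in train_idx], rng
        )
        yield (train_rows, train_labels,
               [rows[i] for i in test_idx], [labels[i] for i in test_idx])


if __name__ == "__main__":
    # Toy data with the 90/10 imbalance described in the question.
    rows = [[float(i)] for i in range(100)]
    labels = ["neg"] * 90 + ["pos"] * 10
    for tr_rows, tr_labels, te_rows, te_labels in cv_folds_with_oversampling(rows, labels):
        print(Counter(tr_labels), Counter(te_labels))
```

Scores measured this way reflect the original 90/10 distribution in every test fold, which is usually what one wants to report.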

Unable to execute jar file despite having PATH and CLASSPATH set

My question is about including jar files in the path. It has two parts.

1) I am trying to execute the weka.jar file located at /home/andy/software/weka/weka.jar. The PATH variable points to this jar file (i.e. to /home/andy/software/weka/weka.jar) and so does CLASSPATH. However, when I try to run the jar using java -jar weka.jar, I get the error "Unable to access jarfile weka.jar". Any ideas what is going on? I am on Ubuntu Linux. I looked around on SO and it seems that I am not doing anything obviously wrong (since both PATH and CLASSPATH seem to be set correctly).

2) I would like to be able to

Where should I save my file in Android for local access?

Question: I have two datasets which are currently in the same folder as my Java files and on my PC. Currently, I am accessing them through my C: drive. Since this is an app, where should I save my .ARFF files and what path should I use instead? I have tried the raw folder, but nothing seems to work. Here's what I have so far...

Answer 1: Create a raw directory in your project; raw goes inside the res folder of an Android project. You can add asset files to the raw folder, such as music files, database files

Simple text classification using naive bayes (weka) in java

I am trying to do text classification with Naive Bayes using the Weka library in my Java code, but I think the classification result is not correct and I don't know what the problem is. I use ARFF files as input.

This is my training data:

@relation hamspam
@attribute text string
@attribute class {spam,ham}
@data
'good',ham
'good',ham
'very good',ham
'bad',spam
'very bad',spam
'very bad, very bad',spam
'good good bad',ham

This is my testing_data:

@relation test
@attribute text string
@attribute class {spam,ham}
@data
'good bad very bad',?
'good bad very bad',?
'good',?
'good very good',?
'bad',?
'very good'
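
When the output of a library looks wrong, it can help to work out by hand what a textbook classifier would predict on such a small data set. The sketch below is a plain-Python multinomial Naive Bayes with add-one smoothing, trained on the seven training rows from the question. It is not Weka's implementation, and the tokenizer (lowercase alphabetic words) is an assumption, so Weka's predictions may legitimately differ; treat it as a sanity check only.

```python
import math
import re
from collections import Counter, defaultdict

# Training rows copied from the question's ARFF file.
TRAIN = [
    ("good", "ham"), ("good", "ham"), ("very good", "ham"),
    ("bad", "spam"), ("very bad", "spam"), ("very bad, very bad", "spam"),
    ("good good bad", "ham"),
]


def tokenize(text):
    """Split text into lowercase alphabetic words (punctuation is dropped)."""
    return re.findall(r"[a-z]+", text.lower())


def train_nb(examples):
    """Count documents per class and word occurrences per class."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    for text, label in examples:
        class_docs[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_docs, word_counts, vocab


def classify(text, model):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        total_words = sum(word_counts[label].values())
        score = math.log(class_docs[label] / total_docs)
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        scores[label] = score
    return max(scores, key=scores.get)


if __name__ == "__main__":
    model = train_nb(TRAIN)
    for text in ["good bad very bad", "good", "good very good", "bad"]:
        print(repr(text), "->", classify(text, model))
```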

beginner question on investigating on samples in Weka

Question: I've just used Weka to train my SVM classifier under the "Classify" tab. Now I want to investigate which data samples are misclassified so that I can study their patterns, but I don't know where to find this in Weka. Could anyone give me some help, please? Thanks in advance.

Answer 1: You can enable the option from: You will get the following instance predictions:

=== Predictions on test split ===
inst# actual predicted error prediction
1 2:Iris-ver 2:Iris-ver 0.667
...
16 3:Iris-vir 2:Iris-ver

Using LIBSVM to predict authenticity of the user

I am planning on using LibSVM to predict user authenticity in web applications.

(1) Collect data on particular user behavior (e.g. login time, IP address, country, etc.)
(2) Use the collected data to train an SVM
(3) Use real-time data to compare and generate an output on the level of authenticity

Can someone tell me how I can do this with LibSVM? Can Weka be helpful for these types of problems?

The three steps you mention are an outline of the solution. In some more detail: Make sure you get plenty of labeled data, i.e. behavior logs annotated with authentic/non-authentic. (Without labeled data,

Creating an ARFF file from python output

gardai-plan-crackdown-on-troublemakers-at-protest-2438316.html': {'dail': 1, 'focus': 1, 'actions': 1, 'trade': 2, 'protest': 1, 'identify': 1, 'previous': 1, 'detectives': 1, 'republican': 1, 'group': 1, 'monitor': 1, 'clashes': 1, 'civil': 1, 'charge': 1, 'breaches': 1, 'travelling': 1, 'main': 1, 'disrupt': 1, 'real': 1, 'policing': 3, 'march': 6, 'finance': 1, 'drawn': 1, 'assistant': 1, 'protesters': 1, 'emphasised': 1, 'department': 1, 'traffic': 2, 'outbreak': 1, 'culprits': 1, 'proportionate': 1, 'instructions': 1, 'warned': 2, 'commanders': 1, 'michael': 2, 'exploit': 1, 'culminating'
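
A dictionary shaped like the one above (document name mapped to word counts) can be written out as ARFF with a few lines of Python. The sketch below is one possible approach, not the asker's code: it assumes every word is a plain alphabetic token that needs no quoting as an attribute name, it emits Weka's sparse row format (only non-zero cells, as zero-based attribute index and value), and it drops the document names. The function name and the two sample documents are invented for the example.

```python
def counts_to_sparse_arff(doc_counts, relation="corpus"):
    """Render {document: {word: count}} as sparse ARFF text with numeric attributes."""
    vocab = sorted({word for counts in doc_counts.values() for word in counts})
    index = {word: i for i, word in enumerate(vocab)}

    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute {word} numeric" for word in vocab]
    lines += ["", "@data"]
    for name in sorted(doc_counts):
        counts = doc_counts[name]
        pairs = sorted((index[w], c) for w, c in counts.items() if c)
        # Sparse rows list only non-zero cells as "<attribute index> <value>".
        lines.append("{" + ", ".join(f"{i} {c}" for i, c in pairs) + "}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical miniature version of the dictionary shown above.
    docs = {
        "a.html": {"march": 6, "policing": 3},
        "b.html": {"march": 1, "trade": 2},
    }
    print(counts_to_sparse_arff(docs))
```

A dense layout (one comma-separated value per attribute) would also work, but with a vocabulary of thousands of words most cells are zero, which is the case the sparse format is meant for.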

How to represent text for classification in weka?

Can you please tell me how to represent the attributes and the class for text classification in Weka? Which attribute should I use for classification: word frequency or just the word? What would a possible ARFF structure look like? Can you give me a few example lines of that structure? Thank you very much in advance.

One of the easiest alternatives is to start with an ARFF file for a two-class problem like:

@relation corpus
@attribute text string
@attribute class {pos,neg}
@data
'long text with words ... ',pos

The text is represented as a string attribute and the class is a nominal attribute with two values.
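
A file in this layout can be generated from (text, label) pairs with a short script. The sketch below assumes that each text fits on one line and that embedded single quotes and backslashes are escaped with a backslash, which is the convention Weka's ARFF reader is understood to accept; the helper names are made up for this example.

```python
def quote_arff_string(text):
    """Wrap text in single quotes, escaping backslashes and embedded single quotes."""
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def text_corpus_to_arff(examples, classes, relation="corpus"):
    """Build ARFF text with one string attribute and one nominal class attribute."""
    lines = [
        f"@relation {relation}",
        "",
        "@attribute text string",
        f"@attribute class {{{','.join(classes)}}}",
        "",
        "@data",
    ]
    lines += [f"{quote_arff_string(text)},{label}" for text, label in examples]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    examples = [("a long text with words", "pos"), ("it's not good", "neg")]
    print(text_corpus_to_arff(examples, ["pos", "neg"]))
```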