weka

How to import XML files in WEKA

余生颓废 posted on 2019-12-05 14:09:48
I want to import a bunch of XML data into Weka. Is there a straightforward solution or a tutorial, or do I have to manually convert it to CSV or ARFF format? There's no straightforward way to load instances into Weka from XML. Your only real options are CSV, ARFF or a database, so you'll have to write some conversion code. I've used rarff in the past to build ARFF files using Ruby. WEKA does not support XML files as input datasets. WEKA allows starting Classifiers and Experiments with the -xml option followed by a filename to retrieve the command line options from the XML file instead of the
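The "write some conversion code" step from the answer above could look roughly like the following Python sketch. It assumes a flat, hypothetical XML layout (a root element whose children are records, each with one child element per attribute); the function name `xml_to_arff`, the default relation name, and the rule "numeric if every value parses as a float, otherwise nominal" are illustrative assumptions, not part of Weka.

```python
import xml.etree.ElementTree as ET


def _is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def _quote(value):
    """Quote a value if it contains whitespace or ARFF-special characters."""
    if value and not any(ch in value for ch in " ,'\"{}%\t"):
        return value
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def xml_to_arff(xml_text, relation="converted"):
    """Convert flat XML records (hypothetical layout) to ARFF text."""
    root = ET.fromstring(xml_text)
    # One dict per record: child tag -> stripped text.
    rows = [{c.tag: (c.text or "").strip() for c in rec} for rec in root]

    # Attribute names in order of first appearance.
    names = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)

    lines = ["@relation " + relation, ""]
    numeric = {}
    for name in names:
        values = [row[name] for row in rows if row.get(name)]
        numeric[name] = bool(values) and all(_is_number(v) for v in values)
        if numeric[name]:
            lines.append("@attribute %s numeric" % name)
        else:
            labels = ",".join(_quote(v) for v in sorted(set(values)))
            lines.append("@attribute %s {%s}" % (name, labels))

    lines += ["", "@data"]
    for row in rows:
        cells = []
        for name in names:
            value = row.get(name, "")
            if not value:
                cells.append("?")  # missing value marker in ARFF
            elif numeric[name]:
                cells.append(value)
            else:
                cells.append(_quote(value))
        lines.append(",".join(cells))
    return "\n".join(lines) + "\n"


SAMPLE = """<instances>
  <instance><sepal>5.1</sepal><species>setosa</species></instance>
  <instance><sepal>6.3</sepal><species>virginica</species></instance>
</instances>"""

if __name__ == "__main__":
    print(xml_to_arff(SAMPLE, relation="iris_sample"))
```

Nested or attribute-heavy XML would need its own flattening step before this kind of conversion applies.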

How to add weka features in a new algorithm?

风格不统一 posted on 2019-12-05 09:52:17
I want to add a new algorithm to Weka that combines classification, clustering, association, etc. in one algorithm. How should I write the code to include all of these Weka features and add a tab to Weka for this new algorithm? I have added a dummy algorithm to Weka and it works; now I want to add an algorithm that combines several Weka features. Thanks. If you want to add a new algorithm to Weka, have a look at the Weka Manual ( http://www.cs.waikato.ac.nz/ml/weka/index.html ). In Part IV (Appendix), there is a chapter Extending Weka, and within it the section Writing a new Classifier. Very basically

SMOTE oversampling and cross-validation

旧巷老猫 posted on 2019-12-05 09:04:47
I am working on a binary classification problem in Weka with a highly imbalanced data set (90% in one category and 10% in the other). I first applied SMOTE ( http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume16/chawla02a-html/node6.html ) to the entire data set to even out the categories and then performed 10-fold cross-validation over the newly obtained data. I found (overly?) optimistic results with F1 around 90%. Is this due to oversampling? Is it bad practice to perform cross-validation on data on which SMOTE is applied? Are there any ways to solve this problem? I think you should split
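The concern raised in the question (synthetic minority points leaking into the test folds) is commonly handled by oversampling inside each training fold only. The Python sketch below illustrates that idea under loud assumptions: it is not Weka code, the folds are not stratified, and the oversampling step is a simplified SMOTE-like interpolation between random minority pairs rather than real SMOTE (which interpolates towards one of the k nearest minority neighbours). All function names are hypothetical.

```python
import random


def oversample_minority(X, y, rng):
    """Simplified SMOTE-like step (assumption: random minority partner,
    not a k-nearest neighbour as in real SMOTE)."""
    counts = {label: y.count(label) for label in set(y)}
    minority = min(counts, key=counts.get)
    needed = max(counts.values()) - counts[minority]
    pool = [x for x, label in zip(X, y) if label == minority]
    new_X, new_y = list(X), list(y)
    for _ in range(needed):
        a, b = rng.choice(pool), rng.choice(pool)
        t = rng.random()
        # Synthetic point on the segment between two minority points.
        new_X.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        new_y.append(minority)
    return new_X, new_y


def folds_with_inner_oversampling(X, y, k=10, seed=0):
    """Yield (train_X, train_y, test_X, test_y) for each of k folds.

    Only the training part is oversampled; the test part contains
    original instances exclusively.
    """
    rng = random.Random(seed)
    indices = list(range(len(X)))
    rng.shuffle(indices)
    for fold in range(k):
        test_idx = set(indices[fold::k])
        train_X = [X[i] for i in indices if i not in test_idx]
        train_y = [y[i] for i in indices if i not in test_idx]
        test_X = [X[i] for i in sorted(test_idx)]
        test_y = [y[i] for i in sorted(test_idx)]
        train_X, train_y = oversample_minority(train_X, train_y, rng)
        yield train_X, train_y, test_X, test_y


if __name__ == "__main__":
    # Toy 90/10 imbalanced data set with one numeric feature.
    X = [[float(i)] for i in range(100)]
    y = ["neg"] * 90 + ["pos"] * 10
    for train_X, train_y, test_X, test_y in folds_with_inner_oversampling(X, y):
        print(len(train_X), train_y.count("pos"), len(test_X))
```

Within Weka itself, a similar effect is usually obtained by wrapping the filter and the classifier in a FilteredClassifier, so that the filter is fitted on the training data of each fold only.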

Unable to execute jar file despite having PATH and CLASSPATH set

被刻印的时光 ゝ posted on 2019-12-05 08:42:58
My question is about including jar files in the path. It has two parts. 1) I am trying to execute the weka.jar file located at /home/andy/software/weka/weka.jar. The PATH variable points to this jar file (i.e. to /home/andy/software/weka/weka.jar) and so does CLASSPATH. However, when I try to run the jar using java -jar weka.jar, I get the error "Unable to access jarfile weka.jar". Any ideas what is going on? I am on Ubuntu Linux. I looked around on SO and it seems that I am not doing anything obviously wrong (since both PATH and CLASSPATH seem to be set correctly). 2) I would like to be able to

Where should I save my file in Android for local access?

こ雲淡風輕ζ posted on 2019-12-05 07:42:56
Question: I have two datasets which are currently in the same folder as my Java files and on my PC. Currently, I am accessing them through my C drive. Since this is an app, where should I save my .ARFF files and what path should I use instead? I have tried the raw folder, but nothing seems to work. Here's what I have so far... Answer 1: Create a raw directory in your project; raw lives inside the res folder of an Android project. You can add asset files to the raw folder, such as music files, database files

Simple text classification using naive bayes (weka) in java

穿精又带淫゛_ posted on 2019-12-05 07:32:22
I am trying to do text classification with naive Bayes using the Weka library in my Java code, but I think the classification result is not correct and I don't know what the problem is. I use ARFF files for the input. This is my training data:
@relation hamspam
@attribute text string
@attribute class {spam,ham}
@data
'good',ham
'good',ham
'very good',ham
'bad',spam
'very bad',spam
'very bad, very bad',spam
'good good bad',ham
This is my testing data:
@relation test
@attribute text string
@attribute class {spam,ham}
@data
'good bad very bad',?
'good bad very bad',?
'good',?
'good very good',?
'bad',?
'very good'

Beginner question on investigating samples in Weka

走远了吗. posted on 2019-12-05 07:08:54
Question: I've just used Weka to train my SVM classifier under the "Classify" tab. Now I want to investigate which data samples are misclassified and study their pattern, but I don't know where to find this in Weka. Could anyone give me some help, please? Thanks in advance. Answer 1: You can enable the option from: You will get the following instance predictions:
=== Predictions on test split ===
inst# actual predicted error prediction
1 2:Iris-ver 2:Iris-ver 0.667
...
16 3:Iris-vir 2:Iris-ver

Using LIBSVM to predict authenticity of the user

余生颓废 posted on 2019-12-05 04:47:16
I am planning to use LibSVM to predict user authenticity in web applications. (1) Collect data on particular user behavior (e.g. login time, IP address, country, etc.) (2) Use the collected data to train an SVM (3) Use real-time data to compare and generate an output on the level of authenticity. Can someone tell me how I can do this with LibSVM? Can Weka be helpful for these types of problems? The three steps you mention are an outline of the solution. In some more detail: Make sure you get plenty of labeled data, i.e. behavior logs annotated with authentic/non-authentic. (Without labeled data,

Creating an ARFF file from python output

£可爱£侵袭症+ posted on 2019-12-05 04:07:44
gardai-plan-crackdown-on-troublemakers-at-protest-2438316.html': {'dail': 1, 'focus': 1, 'actions': 1, 'trade': 2, 'protest': 1, 'identify': 1, 'previous': 1, 'detectives': 1, 'republican': 1, 'group': 1, 'monitor': 1, 'clashes': 1, 'civil': 1, 'charge': 1, 'breaches': 1, 'travelling': 1, 'main': 1, 'disrupt': 1, 'real': 1, 'policing': 3, 'march': 6, 'finance': 1, 'drawn': 1, 'assistant': 1, 'protesters': 1, 'emphasised': 1, 'department': 1, 'traffic': 2, 'outbreak': 1, 'culprits': 1, 'proportionate': 1, 'instructions': 1, 'warned': 2, 'commanders': 1, 'michael': 2, 'exploit': 1, 'culminating'
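For a nested dictionary of the shape shown in the excerpt above (document name mapped to per-word counts), one possible way to produce an ARFF file is the sparse ARFF layout, where each data row lists only the non-zero attributes as `index value` pairs with 0-based indices. The sketch below is an assumption about what the asker wants, not a reconstruction of the original answer; the function name `counts_to_sparse_arff`, the relation name, and the tiny example dictionary are hypothetical, and document names are not written to the output.

```python
def _quote(name):
    """Quote an attribute name if it contains ARFF-special characters."""
    if name and not any(ch in name for ch in " ,'\"{}%\t"):
        return name
    return "'" + name.replace("\\", "\\\\").replace("'", "\\'") + "'"


def counts_to_sparse_arff(doc_counts, relation="word_counts"):
    """Build sparse ARFF text from {document: {word: count}}."""
    # One numeric attribute per distinct word, in alphabetical order.
    vocab = sorted({w for counts in doc_counts.values() for w in counts})
    index = {w: i for i, w in enumerate(vocab)}

    lines = ["@relation " + relation, ""]
    lines += ["@attribute %s numeric" % _quote(w) for w in vocab]
    lines += ["", "@data"]

    # One sparse row per document, documents in alphabetical order.
    for name in sorted(doc_counts):
        cells = sorted((index[w], c) for w, c in doc_counts[name].items() if c)
        lines.append("{" + ", ".join("%d %s" % cell for cell in cells) + "}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical miniature input in the same shape as the Python output.
    example = {
        "a.html": {"march": 6, "trade": 2},
        "b.html": {"trade": 1, "protest": 1},
    }
    print(counts_to_sparse_arff(example))
```

A class attribute would have to be appended separately if the file is meant for classification.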

How to represent text for classification in weka?

北城以北 posted on 2019-12-05 03:16:21
Can you please tell me how to represent attributes or the class for text classification in Weka? Which attributes can I use for classification: word frequency or just the word? What would a possible ARFF structure look like? Can you give me several example lines of that structure? Thank you very much in advance. One of the easiest alternatives is to start with an ARFF file for a two-class problem like:
@relation corpus
@attribute text string
@attribute class {pos,neg}
@data
'long text with words ... ',pos
The text is represented as a String type and the class is a nominal with two values.
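A file in that two-attribute layout can be generated from labeled texts with a few lines of Python. This is a minimal sketch: the function names and the sample texts are hypothetical, and the backslash escaping of quotes and control characters is an assumption about what the ARFF reader accepts, so it is worth checking against a real file. In Weka, the string attribute is then typically turned into word attributes with the StringToWordVector filter.

```python
def escape_arff_string(text):
    """Wrap text in single quotes, backslash-escaping awkward characters."""
    text = text.replace("\\", "\\\\").replace("'", "\\'")
    text = text.replace("\n", "\\n").replace("\r", "\\r").replace("\t", "\\t")
    return "'" + text + "'"


def texts_to_arff(samples, labels, relation="corpus"):
    """Build ARFF text with one string attribute and one nominal class.

    samples: iterable of (text, label) pairs
    labels:  list of allowed class values, e.g. ["pos", "neg"]
    """
    lines = [
        "@relation " + relation,
        "",
        "@attribute text string",
        "@attribute class {%s}" % ",".join(labels),
        "",
        "@data",
    ]
    for text, label in samples:
        if label not in labels:
            raise ValueError("unknown label: %r" % label)
        lines.append("%s,%s" % (escape_arff_string(text), label))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical sample texts.
    data = [("a really good film", "pos"), ("it's bad, very bad", "neg")]
    print(texts_to_arff(data, ["pos", "neg"]))
```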