weka

Which Weka and LibSVM .jar files to use in Java code for SVM classification

Anonymous (unverified) submitted on 2019-12-03 08:56:10
Question: If I use Weka Explorer to run some training data against testing data using SVM with a linear kernel, everything is fine. But I need to do this programmatically in my own Java code, and my current code looks like this: Instances train = new Instances(...); train.setClassIndex(train.numAttributes() - 1); Instances test = new Instances(...); ClassificationType classificationType = ClassificationTypeDAO.get(6); LibSVM libsvm = new LibSVM(); String options = (classificationType.getParameters()); String[] optionsArray =
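A minimal sketch of what the end-to-end programmatic run could look like, assuming weka.jar plus the LibSVM wrapper class (weka.classifiers.functions.LibSVM) and libsvm.jar are on the classpath; train.arff and test.arff are placeholder file names, and the asker's ClassificationType DAO is left out:

```java
import weka.classifiers.functions.LibSVM;
import weka.core.Instances;
import weka.core.Utils;
import weka.core.converters.ConverterUtils.DataSource;

public class LibSVMDemo {
    public static void main(String[] args) throws Exception {
        // Load and label train/test sets (file names are placeholders)
        Instances train = DataSource.read("train.arff");
        train.setClassIndex(train.numAttributes() - 1);
        Instances test = DataSource.read("test.arff");
        test.setClassIndex(test.numAttributes() - 1);

        LibSVM libsvm = new LibSVM();
        // -S 0 = C-SVC, -K 0 = linear kernel, mirroring the Explorer run
        libsvm.setOptions(Utils.splitOptions("-S 0 -K 0"));
        libsvm.buildClassifier(train);

        for (int i = 0; i < test.numInstances(); i++) {
            double pred = libsvm.classifyInstance(test.instance(i));
            System.out.println(test.classAttribute().value((int) pred));
        }
    }
}
```

Passing the parameter string through Utils.splitOptions, as above, avoids hand-splitting on spaces.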

Sentiment analysis with NLTK python for sentences using sample data or webservice?

こ雲淡風輕ζ submitted on 2019-12-03 08:49:20
Question: I am embarking upon an NLP project for sentiment analysis. I have successfully installed NLTK for Python (it seems like a great piece of software for this). However, I am having trouble understanding how it can be used to accomplish my task. Here is my task: I start with one long piece of data (let's say several hundred tweets on the subject of the UK election from their webservice). I would like to break this up into sentences (or info no longer than 100 or so chars) (I guess I can just do this in

Finding out wrongly classified instances when using WEKA

耗尽温柔 submitted on 2019-12-03 07:43:56
I am using the GUI version of WEKA and I am classifying using Naive Bayes. Can anyone please let me know how to find out which instances are misclassified? Go to the Classify tab in Weka Explorer, click More options..., check Output predictions, and click OK. Hope that helps. I faced this very same problem earlier and I handle it just fine now. What I do is the following: Make one String attribute that assigns each instance a unique ID. I have assigned the names of the documents to each of my instances. Generate the WEKA-supported .arff file. Whenever you have to run a classifier on this .arff data, you
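Outside the GUI, the same "output predictions" idea can be sketched in a few lines: compare each predicted class value against the actual one. This is a hedged sketch assuming weka.jar on the classpath; train.arff and test.arff are placeholder names:

```java
import weka.classifiers.bayes.NaiveBayes;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class Misclassified {
    public static void main(String[] args) throws Exception {
        Instances train = DataSource.read("train.arff"); // placeholder
        train.setClassIndex(train.numAttributes() - 1);
        Instances test = DataSource.read("test.arff");   // placeholder
        test.setClassIndex(test.numAttributes() - 1);

        NaiveBayes nb = new NaiveBayes();
        nb.buildClassifier(train);

        for (int i = 0; i < test.numInstances(); i++) {
            Instance inst = test.instance(i);
            double predicted = nb.classifyInstance(inst);
            double actual = inst.classValue();
            if (predicted != actual) {
                System.out.println("Instance " + i + " misclassified: predicted="
                        + test.classAttribute().value((int) predicted)
                        + ", actual=" + test.classAttribute().value((int) actual));
            }
        }
    }
}
```

With the unique-ID trick from the answer above, printing inst.stringValue(0) instead of the index i identifies each misclassified document by name.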

Weka simple K-means clustering assignments

∥☆過路亽.° submitted on 2019-12-03 07:24:27
Question: I have what feels like a simple problem, but I can't seem to find an answer. I'm pretty new to Weka, but I feel like I've done a bit of research on this (at least read through the first couple of pages of Google results) and come up dry. I am using Weka to run clustering using Simple K-Means. In the results list I have no problem visualizing my output ("Visualize cluster assignments") and it is clear both from my understanding of the K-Means algorithm and the output of Weka that each of my
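For getting the per-instance cluster assignments programmatically, SimpleKMeans exposes getAssignments(), which only works if instance order is preserved during training. A hedged sketch, assuming weka.jar on the classpath and data.arff as a placeholder file:

```java
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class KMeansAssignments {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data.arff"); // placeholder
        SimpleKMeans km = new SimpleKMeans();
        km.setNumClusters(3);
        km.setPreserveInstancesOrder(true); // required, or getAssignments() throws
        km.buildClusterer(data);

        int[] assignments = km.getAssignments();
        for (int i = 0; i < assignments.length; i++) {
            System.out.println("instance " + i + " -> cluster " + assignments[i]);
        }
    }
}
```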

.arff files with scikit-learn?

依然范特西╮ submitted on 2019-12-03 06:37:33
I would like to use an Attribute-Relation File Format file with scikit-learn to do some NLP task; is this possible? How can I use an .arff file with scikit-learn? I really recommend liac-arff. It doesn't load directly to numpy, but the conversion is simple: import arff, numpy as np; dataset = arff.load(open('mydataset.arff', 'rb')); data = np.array(dataset['data']) I found that scipy has a loader for arff files to load them as numpy record arrays. I am not 100% sure that those arrays are suitable for direct consumption by scikit-learn, but that should get you started. Following renatopp's answer: assume

Skip feature when classifying, but show feature in output

你。 submitted on 2019-12-03 05:00:25
I've created a dataset which contains +/- 13000 rows with +/- 50 features. I know how to output every classification result (prediction and actual), but I would like to be able to output some sort of ID with those results. So I've added an ID column to my dataset, but I don't know how to disregard the ID when classifying while still being able to output the ID with every prediction result. I do know how to select features to output with every prediction. Use FilteredClassifier. See this and this. Let's say the following are the attributes in bbcsport.arff that you want to remove, and they are in a file
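The FilteredClassifier approach can be sketched as follows: a Remove filter hides the ID column from the base classifier, while the original data keeps the ID so it can still be printed with each prediction. A hedged sketch assuming weka.jar on the classpath, with data.arff as a placeholder, the ID as a string attribute in the first column, and Naive Bayes standing in for whatever base classifier is actually used:

```java
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.meta.FilteredClassifier;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.unsupervised.attribute.Remove;

public class SkipIdDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data.arff"); // placeholder
        data.setClassIndex(data.numAttributes() - 1);

        Remove rm = new Remove();
        rm.setAttributeIndices("1"); // hide the ID column from the classifier only

        FilteredClassifier fc = new FilteredClassifier();
        fc.setFilter(rm);
        fc.setClassifier(new NaiveBayes());
        fc.buildClassifier(data);

        for (int i = 0; i < data.numInstances(); i++) {
            double pred = fc.classifyInstance(data.instance(i));
            // the ID survives in the unfiltered data, so it can be printed
            System.out.println(data.instance(i).stringValue(0) + " -> "
                    + data.classAttribute().value((int) pred));
        }
    }
}
```

The filter is applied internally at both training and prediction time, so the caller never has to maintain two copies of the dataset.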

Weka's PCA is taking too long to run

萝らか妹 submitted on 2019-12-03 03:57:08
Question: I am trying to use Weka for feature selection using the PCA algorithm. My original feature space contains ~9000 attributes in 2700 samples. I tried to reduce the dimensionality of the data using the following code: AttributeSelection selector = new AttributeSelection(); PrincipalComponents pca = new PrincipalComponents(); Ranker ranker = new Ranker(); selector.setEvaluator(pca); selector.setSearch(ranker); Instances instances = SamplesManager.asWekaInstances(trainSet); try { selector
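With ~9000 attributes, Weka's PCA has to build and decompose a roughly 9000x9000 covariance matrix, which dominates the runtime. Capping the variance covered and the number of components kept won't change that asymptotic cost, but it does shrink the output and the ranking work. A hedged completion of the truncated snippet above, with the asker's SamplesManager replaced by a placeholder loader:

```java
import weka.attributeSelection.AttributeSelection;
import weka.attributeSelection.PrincipalComponents;
import weka.attributeSelection.Ranker;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class PcaDemo {
    public static void main(String[] args) throws Exception {
        Instances instances = DataSource.read("train.arff"); // placeholder
        instances.setClassIndex(instances.numAttributes() - 1);

        AttributeSelection selector = new AttributeSelection();
        PrincipalComponents pca = new PrincipalComponents();
        pca.setVarianceCovered(0.95);    // stop accumulating components at 95% variance
        pca.setMaximumAttributeNames(5); // keep the generated attribute names short
        Ranker ranker = new Ranker();
        ranker.setNumToSelect(100);      // keep only the top-ranked components

        selector.setEvaluator(pca);
        selector.setSearch(ranker);
        selector.SelectAttributes(instances);

        Instances reduced = selector.reduceDimensionality(instances);
        System.out.println("Reduced to " + reduced.numAttributes() + " attributes");
    }
}
```

If even this is too slow, a cheaper supervised filter (e.g. InfoGainAttributeEval) to pre-prune the 9000 attributes before PCA is a common workaround.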

Getting Xmeans clusterer output programmatically in Weka

[亡魂溺海] submitted on 2019-12-03 03:38:27
When using KMeans in Weka, one can call getAssignments() on the resulting model to get the cluster assignment for each given instance. Here's a (truncated) Jython example: >>> import weka.clusterers.SimpleKMeans as kmeans >>> kmeans.buildClusterer(data) >>> assignments = kmeans.getAssignments() >>> assignments array('i', [14, 16, 0, 0, 0, 0, 16, ...]) The index of each cluster number corresponds to the instance. So, instance 0 is in cluster 14, instance 1 is in cluster 16, and so on. My question is: Is there something similar for XMeans? I've gone through the entire API here and don
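As far as I can tell, XMeans does not expose a getAssignments() method; but clusterInstance(), available on every weka.clusterers.Clusterer, can rebuild the same array. A hedged sketch, assuming the XMeans clusterer (shipped separately as a package in newer Weka versions) is on the classpath and data.arff is a placeholder:

```java
import java.util.Arrays;
import weka.clusterers.XMeans;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class XMeansAssignments {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("data.arff"); // placeholder
        XMeans xmeans = new XMeans();
        xmeans.buildClusterer(data);

        // No getAssignments() here: ask for each instance's cluster directly
        int[] assignments = new int[data.numInstances()];
        for (int i = 0; i < data.numInstances(); i++) {
            assignments[i] = xmeans.clusterInstance(data.instance(i));
        }
        System.out.println(Arrays.toString(assignments));
    }
}
```

Because this works through the generic Clusterer interface, the same loop applies unchanged to SimpleKMeans, EM, or any other Weka clusterer.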

How to interpret Weka Logistic Regression output?

泄露秘密 submitted on 2019-12-03 03:25:42
Please help interpret the results of logistic regression produced by weka.classifiers.functions.Logistic from the Weka library. I use the numeric weather data from the Weka examples: @relation weather @attribute outlook {sunny, overcast, rainy} @attribute temperature real @attribute humidity real @attribute windy {TRUE, FALSE} @attribute play {yes, no} @data sunny,85,85,FALSE,no sunny,80,90,TRUE,no overcast,83,86,FALSE,yes rainy,70,96,FALSE,yes rainy,68,80,FALSE,yes rainy,65,70,TRUE,no overcast,64,65,TRUE,yes sunny,72,95,FALSE,no sunny,69,70,FALSE,yes rainy,75,80,FALSE,yes sunny,75,70,TRUE,yes overcast,72,90,TRUE,yes
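As I understand Weka's Logistic output, the coefficient table has one column per class except the last (each class is modeled against the last one), and the odds-ratio table is simply exp() of each coefficient. To inspect those numbers programmatically rather than reading the printed report, something like this should work (weather.arff is a placeholder for the dataset above):

```java
import weka.classifiers.functions.Logistic;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

public class LogisticDemo {
    public static void main(String[] args) throws Exception {
        Instances data = DataSource.read("weather.arff"); // placeholder
        data.setClassIndex(data.numAttributes() - 1);

        Logistic logistic = new Logistic();
        logistic.buildClassifier(data);

        // toString() prints the same coefficient/odds-ratio tables as the Explorer
        System.out.println(logistic);

        // coefficients() returns the raw numbers: rows are intercept + predictors,
        // columns correspond to classes (all but the last)
        double[][] coef = logistic.coefficients();
        System.out.println("rows=" + coef.length + ", cols=" + coef[0].length);
    }
}
```

For a two-class problem like play={yes, no}, a positive coefficient on an attribute pushes the prediction toward the first class (yes) as that attribute grows.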

Adding an instance to Instances in weka

Anonymous (unverified) submitted on 2019-12-03 02:54:01
Question: I have a few arff files. I would like to read them sequentially and create a large dataset. Instances.add(Instance inst) doesn't add string values to the instances, hence the attempt to setDataset()... but even this fails. Is there a way to accomplish the intuitively correct thing for strings? ArffLoader arffLoader = new ArffLoader(); arffLoader.setFile(new File(fName)); Instances newData = arffLoader.getDataSet(); for (int i = 0; i < newData.numInstances(); i++) { Instance one = newData.instance(i); one.setDataset(data); data.add(one); }
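The underlying problem is that a string value is stored as an index into the owning header's string table, so an instance copied into another dataset points at the wrong (or missing) table entry. One way around this, sketched under the assumption of Weka 3.7+ (DenseInstance), is to register each string value in the target header via Attribute.addStringValue() and rebuild the value array; the .arff file names come in as command-line arguments:

```java
import java.io.File;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ArffLoader;

public class MergeArffs {
    public static void main(String[] args) throws Exception {
        Instances data = null;
        for (String fName : args) {
            ArffLoader loader = new ArffLoader();
            loader.setFile(new File(fName));
            Instances newData = loader.getDataSet();
            if (data == null) {
                data = new Instances(newData, 0); // copy the header only
            }
            for (int i = 0; i < newData.numInstances(); i++) {
                Instance one = newData.instance(i);
                double[] vals = new double[data.numAttributes()];
                for (int j = 0; j < data.numAttributes(); j++) {
                    if (data.attribute(j).isString() && !one.isMissing(j)) {
                        // register the string in the target header's own table
                        vals[j] = data.attribute(j).addStringValue(one.stringValue(j));
                    } else {
                        vals[j] = one.value(j);
                    }
                }
                data.add(new DenseInstance(1.0, vals));
            }
        }
        if (data != null) {
            System.out.println(data.numInstances() + " instances merged");
        }
    }
}
```

This assumes all files share an identical attribute layout; if they differ, the headers need to be reconciled first.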