weka

Using StringToWordVector in Weka with internal data structures

只谈情不闲聊 submitted on 2019-12-07 14:52:06
Question: I am trying to cluster documents using Weka. The process is part of a larger pipeline, and I really can't afford to write out arff files. I have all the documents and the bag of words in each document as a Map<String, Multiset<String>> structure, where the keys are document names and the Multiset<String> values are the bags of words in the documents. I have two questions, really: (1) My current approach ends up clustering terms, not documents: public final Instances
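A minimal sketch of the data layout, in plain Java with no Weka calls: to cluster documents rather than terms, each document should become one row (one instance) and each vocabulary term one column (one attribute). The `Map<String, Map<String, Integer>>` below stands in for the question's `Map<String, Multiset<String>>`, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

final class DocTermMatrix {
    // Sorted vocabulary: one column per distinct term across all documents.
    static List<String> vocabulary(Map<String, Map<String, Integer>> docs) {
        TreeSet<String> terms = new TreeSet<>();
        for (Map<String, Integer> bag : docs.values()) {
            terms.addAll(bag.keySet());
        }
        return new ArrayList<>(terms);
    }

    // One row per document, in the map's iteration order; each cell is a term count.
    static double[][] rows(Map<String, Map<String, Integer>> docs, List<String> vocab) {
        double[][] matrix = new double[docs.size()][vocab.size()];
        int r = 0;
        for (Map<String, Integer> bag : docs.values()) {
            for (int c = 0; c < vocab.size(); c++) {
                matrix[r][c] = bag.getOrDefault(vocab.get(c), 0);
            }
            r++;
        }
        return matrix;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> docs = new LinkedHashMap<>();
        docs.put("doc1", Map.of("weka", 2, "cluster", 1));
        docs.put("doc2", Map.of("cluster", 3));
        List<String> vocab = vocabulary(docs);
        double[][] m = rows(docs, vocab);
        System.out.println(vocab + " -> " + m.length + " rows x " + m[0].length + " columns");
    }
}
```

If the rows end up indexed by term instead (one row per word, one column per document), a clusterer will group terms, which matches the symptom described.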

java weka stringtowordvector is not counting word occurrences properly

六月ゝ 毕业季﹏ submitted on 2019-12-07 13:45:10
Question: I'm using the Weka machine learning library's Java API and I have the following code: String html = "repeat repeat repeat"; Attribute input = new Attribute("html",(FastVector) null); FastVector inputVec = new FastVector(); inputVec.addElement(input); Instances htmlInst = new Instances("html",inputVec,1); htmlInst.add(new Instance(1)); htmlInst.instance(0).setValue(0, html); StringToWordVector filter = new StringToWordVector(); filter.setUseStoplist(true); filter.setInputFormat(htmlInst);
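One possible cause, which the truncated snippet cannot confirm: unless `setOutputWordCounts(true)` is called on the filter before `setInputFormat`, StringToWordVector records only whether a word is present (0 or 1), not how often it occurs. The plain-Java sketch below illustrates the difference between the two modes; it does not call Weka, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class WordVectorModes {
    // Tokenize on whitespace and count each word.
    static Map<String, Integer> counts(String text) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                result.merge(token, 1, Integer::sum);
            }
        }
        return result;
    }

    // Collapse every count to 1: the word is present, whatever its frequency.
    static Map<String, Integer> presence(String text) {
        Map<String, Integer> result = counts(text);
        result.replaceAll((word, n) -> 1);
        return result;
    }

    public static void main(String[] args) {
        String html = "repeat repeat repeat";
        System.out.println("counts:   " + counts(html));
        System.out.println("presence: " + presence(html));
    }
}
```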

How to import XML files in WEKA

橙三吉。 submitted on 2019-12-07 07:50:58
Question: I want to import a bunch of XML data into Weka. Is there a straightforward solution or a tutorial, or do I have to manually convert it to a CSV or arff file? Answer 1: There's no straightforward way to load instances into Weka from XML. Your only real options are CSV, arff or a database, so you'll have to write some conversion code. I've used rarff in the past to build arff files using Ruby. Answer 2: WEKA does not support XML files as input datasets. WEKA allows to start Classifiers and Experiments
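Since the answers point toward writing conversion code, here is a minimal sketch of an XML-to-CSV converter using only the JDK's DOM parser. The XML layout (a `<record>` element per row whose child elements are the columns) is an assumption for illustration, as is the class name; values containing commas or quotes would need proper CSV escaping, which this sketch omits.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

final class XmlToCsv {
    // Hypothetical layout: each <record> element is one row and its child
    // elements are the columns. The first record supplies the header names.
    static String convert(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        NodeList records = doc.getElementsByTagName("record");
        StringBuilder csv = new StringBuilder();
        for (int i = 0; i < records.getLength(); i++) {
            List<String> names = new ArrayList<>();
            List<String> values = new ArrayList<>();
            NodeList children = records.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                if (child instanceof Element) { // skip whitespace text nodes
                    names.add(child.getNodeName());
                    values.add(child.getTextContent().trim());
                }
            }
            if (i == 0) {
                csv.append(String.join(",", names)).append('\n');
            }
            csv.append(String.join(",", values)).append('\n');
        }
        return csv.toString();
    }

    public static void main(String[] args) {
        String xml = "<records>"
                + "<record><height>1.8</height><label>tall</label></record>"
                + "<record><height>1.5</height><label>short</label></record>"
                + "</records>";
        System.out.print(convert(xml));
    }
}
```

The resulting CSV text can be saved to a file and opened in Weka like any other CSV dataset.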

How to add weka features in a new algorithm?

风格不统一 submitted on 2019-12-07 04:49:58
Question: I want to add a new algorithm to Weka that combines classification, clustering, association and other features in one algorithm. How should I write the code to include all these Weka features and add a tab to Weka for the new algorithm? I have added a dummy algorithm to Weka and it works; now I want to add an algorithm that combines several Weka features. Thanks. Answer 1: If you want to add a new algorithm to Weka, have a look at the Weka Manual ( http://www.cs.waikato.ac.nz/ml/weka/index.html ) In the part IV -

Creating an ARFF file from python output

霸气de小男生 submitted on 2019-12-06 22:26:05
Question: gardai-plan-crackdown-on-troublemakers-at-protest-2438316.html': {'dail': 1, 'focus': 1, 'actions': 1, 'trade': 2, 'protest': 1, 'identify': 1, 'previous': 1, 'detectives': 1, 'republican': 1, 'group': 1, 'monitor': 1, 'clashes': 1, 'civil': 1, 'charge': 1, 'breaches': 1, 'travelling': 1, 'main': 1, 'disrupt': 1, 'real': 1, 'policing': 3, 'march': 6, 'finance': 1, 'drawn': 1, 'assistant': 1, 'protesters': 1, 'emphasised': 1, 'department': 1, 'traffic': 2, 'outbreak': 1, 'culprits': 1,
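Per-document word counts of this shape map naturally onto a dense ARFF file: one numeric attribute per word and one data row per document. The sketch below is written in Java to match the other examples on this page rather than in the question's Python, and it assumes every word is a plain token that needs no quoting in ARFF; the class name and relation name are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

final class WordCountArff {
    // Render per-document word counts as ARFF text: a header with one
    // numeric attribute per word, then one comma-separated row per document.
    static String toArff(String relation, Map<String, Map<String, Integer>> docs) {
        TreeSet<String> words = new TreeSet<>();
        docs.values().forEach(bag -> words.addAll(bag.keySet()));

        StringBuilder arff = new StringBuilder("@relation " + relation + "\n\n");
        for (String word : words) {
            arff.append("@attribute ").append(word).append(" numeric\n");
        }
        arff.append("\n@data\n");
        for (Map<String, Integer> bag : docs.values()) {
            StringBuilder row = new StringBuilder();
            for (String word : words) {
                if (row.length() > 0) {
                    row.append(',');
                }
                row.append(bag.getOrDefault(word, 0)); // 0 when the word is absent
            }
            arff.append(row).append('\n');
        }
        return arff.toString();
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> docs = new LinkedHashMap<>();
        docs.put("article1", Map.of("march", 6, "policing", 3, "trade", 2));
        docs.put("article2", Map.of("trade", 1));
        System.out.print(toArff("news", docs));
    }
}
```

With a large vocabulary most cells are zero, so ARFF's sparse row syntax may be a better fit than the dense rows shown here.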

Unable to access training dataset for ML classification using Weka in Java

♀尐吖头ヾ submitted on 2019-12-06 16:16:44
Question: I am trying to classify an instance using Weka in Java (specifically in Android Studio). Initially, I saved a model from the desktop Weka GUI and tried to import it into my project directory. If I am correct, this won't work because the Weka JDKs are different on PC versus Android. Now I am trying to train a model on the Android device itself (as I see no other option) by importing the training dataset. Here is where I am running into problems. When I run "Test.java," I get this error saying that my

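Standardization and Normalization (Comprehensive)

假装没事ソ submitted on 2019-12-06 13:07:06
part1: [Repost] https://blog.csdn.net/weixin_40165004/article/details/89080968 Weka Data Preprocessing (1) In data mining we tend to focus only on the core mining algorithms, such as classification, clustering and association rules, and neglect the quality of the data being mined. Yet only high-quality data produces high-quality results; otherwise it is just "garbage in, garbage out". An important step in ensuring the quality of the data to be mined is data pre-processing, and in practice the data preparation stage often takes 60 to 80 percent of the time of the whole mining process. This article introduces the data preprocessing methods available in the Weka tool. Data preprocessing in Weka is also called data filtering, and the filters can be found in weka.filters. By the nature of the filtering algorithm, they are divided into supervised (SupervisedFilter) and unsupervised (UnsupervisedFilter). A supervised filter requires a class attribute to be set and takes the class attribute and its distribution in the dataset into account in order to determine the best number and size of bins; for an unsupervised filter the class attribute need not exist. These filtering algorithms can also be grouped into attribute-based and instance-based. Attribute-based methods mainly process columns, for example adding or removing columns, while instance-based methods mainly process rows, for example adding or removing rows. Data filtering mainly addresses the following familiar problems: handling missing values, standardization, normalization and discretization.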
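To make the two scaling operations named above concrete, here is a minimal plain-Java sketch of min-max normalization and z-score standardization over a single column. It does not use Weka's filters; the class name is hypothetical, and the use of the population standard deviation (dividing by n) is a choice made for this sketch.

```java
import java.util.Arrays;

final class Scaling {
    // Min-max normalization: rescale values linearly into [0, 1].
    static double[] normalize(double[] x) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : x) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = range == 0 ? 0.0 : (x[i] - min) / range; // constant column maps to 0
        }
        return out;
    }

    // Standardization (z-score): shift to zero mean and scale to unit variance,
    // using the population standard deviation (divide by n).
    static double[] standardize(double[] x) {
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= x.length;
        double variance = 0.0;
        for (double v : x) {
            variance += (v - mean) * (v - mean);
        }
        double sd = Math.sqrt(variance / x.length);
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = sd == 0 ? 0.0 : (x[i] - mean) / sd;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] column = {2.0, 4.0, 6.0};
        System.out.println(Arrays.toString(normalize(column)));
        System.out.println(Arrays.toString(standardize(column)));
    }
}
```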

Using an arff file for storing data

我怕爱的太早我们不能终老 submitted on 2019-12-06 12:45:13
I am using this example to create my .arff file for my Weka project. double[][] data = {{4058.0, 4059.0, 4060.0, 214.0, 1710.0, 2452.0, 2473.0, 2474.0, 2475.0, 2476.0, 2477.0, 2478.0, 2688.0, 2905.0, 2906.0, 2907.0, 2908.0, 2909.0, 2950.0, 2969.0, 2970.0, 3202.0, 3342.0, 3900.0, 4007.0, 4052.0, 4058.0, 4059.0, 4060.0}, {19.0, 20.0, 21.0, 31.0, 103.0, 136.0, 141.0, 142.0, 143.0, 144.0, 145.0, 146.0, 212.0, 243.0, 244.0, 245.0, 246.0, 247.0, 261.0, 270.0, 271.0, 294.0, 302.0, 340.0, 343.0, 354.0, 356.0, 357.0, 358.0}}; int numInstances = data[0].length; FastVector

What is the Stacking Algorithm in Weka? How does it actually work?

做~自己de王妃 submitted on 2019-12-06 10:50:49
Are the results of the base classifiers selected by a voting system, and what does the meta classifier actually receive as its input: the whole set of classifications or just the misclassified ones? It would be helpful if the whole mechanism could be explained with a simple example, as in the linked question "Majority vote algorithm in Weka.classifiers.meta.vote". Thanks in advance. Consider an ensemble of n members. Each of these members is trained on a given set of training data. The ensemble members may share the same classifier type (homogeneous) or use different classifiers (heterogeneous). Diversity is encouraged
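As a rough illustration of the general stacking idea (not of Weka's internal implementation), the sketch below builds the input that a meta classifier would see: for every instance, the vector of predictions made by the base classifiers. The base classifiers are hypothetical hand-written rules standing in for trained models, and predicted labels are used as meta features here, whereas real implementations may pass class probabilities instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.ToIntFunction;

final class StackingSketch {
    // Meta-level features for one instance: the label predicted by each
    // base classifier, in order.
    static int[] metaFeatures(List<ToIntFunction<double[]>> baseClassifiers, double[] instance) {
        int[] features = new int[baseClassifiers.size()];
        for (int i = 0; i < features.length; i++) {
            features[i] = baseClassifiers.get(i).applyAsInt(instance);
        }
        return features;
    }

    public static void main(String[] args) {
        // Hypothetical base classifiers; in practice these are trained models.
        List<ToIntFunction<double[]>> base = List.of(
                x -> x[0] > 0.5 ? 1 : 0,
                x -> x[1] > 0.5 ? 1 : 0,
                x -> x[0] + x[1] > 1.0 ? 1 : 0);
        double[][] data = {{0.9, 0.2}, {0.1, 0.8}, {0.7, 0.9}};
        for (double[] instance : data) {
            // Every instance is passed on, not only the misclassified ones.
            System.out.println(Arrays.toString(metaFeatures(base, instance)));
        }
    }
}
```

The meta classifier is then trained on these prediction vectors paired with the true labels, so it learns how to combine the base classifiers instead of taking a fixed majority vote.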

Is there a workaround to solve “Java heap space” memory error when the max heap value has been already specified?

喜夏-厌秋 submitted on 2019-12-06 07:07:04
Question: I'm running a WEKA classifier (J48 with an input .arff file composed of 3 fields; field 1 has ~27k distinct attributes, field 2 ~500k values) on a latest-generation MacBook Pro with 8GB RAM. I increased the Java heap space to the maximum possible using the -Xmx parameter: java -Xmx7G -cp weka-3-6-10/weka.jar weka.classifiers.trees.J48 -t myfiles/loc_linear.arff -i However, when I run the classifier (after about 10 minutes) I get the error " Exception in thread "main" java.lang
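Before looking for workarounds, it can help to confirm that the `-Xmx` setting actually reached the JVM. The small sketch below prints the maximum heap the running JVM will attempt to use, via the standard `Runtime.getRuntime().maxMemory()` call; the class name is hypothetical.

```java
final class HeapCheck {
    // Maximum heap the JVM will attempt to use, in mebibytes.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running it with the same flag as the classifier, for example `java -Xmx7G HeapCheck.java` on a JDK that supports single-file source launch, should report a value close to 7168 MiB if the flag is being honored.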