weka | 易学教程

Using StringToWordVector in Weka with internal data structures

Question: I am trying to cluster documents using Weka. The process is part of a larger pipeline, and I really can't afford to write out ARFF files. I have all the documents and the bag of words for each document in a Map<String, Multiset<String>> structure, where the keys are document names and the Multiset<String> values are the bags of words in the documents. I have two questions, really: (1) My current approach ends up clustering terms, not documents: public final Instances

Java Weka StringToWordVector is not counting word occurrences properly

Question: I'm using the Java API of the Weka machine learning library and I have the following code: String html = "repeat repeat repeat"; Attribute input = new Attribute("html",(FastVector) null); FastVector inputVec = new FastVector(); inputVec.addElement(input); Instances htmlInst = new Instances("html",inputVec,1); htmlInst.add(new Instance(1)); htmlInst.instance(0).setValue(0, html); StringToWordVector filter = new StringToWordVector(); filter.setUseStoplist(true); filter.setInputFormat(htmlInst);

How to import XML files in WEKA

Question: I want to import a bunch of XML data into Weka. Is there a straightforward solution or a tutorial, or do I have to manually convert it to a CSV or ARFF file? Answer 1: There's no straightforward way to load instances into Weka from XML. Your only real options are CSV, ARFF or a database, so you'll have to write some conversion code. I've used rarff in the past to build ARFF files using Ruby. Answer 2: WEKA does not support XML files as input datasets. WEKA allows to start Classifiers and Experiments
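The "conversion code" mentioned in the first answer could look roughly like the sketch below, which flattens XML into CSV text using only the JDK's built-in DOM parser. The class name `XmlToCsvSketch`, the `recordTag` parameter, and the assumed layout (one element per record, with one child element per field) are all hypothetical. The XML in the question may be shaped differently, and values containing commas or quotes would need proper CSV escaping.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper: assumes each <recordTag> element holds one child element per field.
final class XmlToCsvSketch {

    static String toCsv(String xml, String recordTag) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }

        NodeList records = doc.getElementsByTagName(recordTag);
        StringBuilder csv = new StringBuilder();
        for (int i = 0; i < records.getLength(); i++) {
            List<String> names = new ArrayList<>();
            List<String> values = new ArrayList<>();
            NodeList children = records.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                // Skip whitespace text nodes; keep only field elements.
                if (child.getNodeType() == Node.ELEMENT_NODE) {
                    names.add(child.getNodeName());
                    values.add(child.getTextContent().trim());
                }
            }
            // The header row is taken from the field names of the first record.
            if (i == 0) {
                csv.append(String.join(",", names)).append('\n');
            }
            csv.append(String.join(",", values)).append('\n');
        }
        return csv.toString();
    }
}
```

The resulting text can be written to a `.csv` file and opened in Weka like any other CSV dataset.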

How to add weka features in a new algorithm?

Question: I want to add a new algorithm to Weka that combines classification, clustering, association, etc. in one algorithm. How should I write the code so that it includes all of these Weka features, and how do I add a tab to Weka for the new algorithm? I have already added a dummy algorithm to Weka and it works; now I want to add an algorithm that combines several Weka features. Thanks. Answer 1: If you want to add a new algorithm in Weka, have a look at the Weka Manual ( http://www.cs.waikato.ac.nz/ml/weka/index.html ) In the part IV -

Creating an ARFF file from python output

Question: gardai-plan-crackdown-on-troublemakers-at-protest-2438316.html': {'dail': 1, 'focus': 1, 'actions': 1, 'trade': 2, 'protest': 1, 'identify': 1, 'previous': 1, 'detectives': 1, 'republican': 1, 'group': 1, 'monitor': 1, 'clashes': 1, 'civil': 1, 'charge': 1, 'breaches': 1, 'travelling': 1, 'main': 1, 'disrupt': 1, 'real': 1, 'policing': 3, 'march': 6, 'finance': 1, 'drawn': 1, 'assistant': 1, 'protesters': 1, 'emphasised': 1, 'department': 1, 'traffic': 2, 'outbreak': 1, 'culprits': 1,

Unable to access training dataset for ML classification using Weka in Java

Question: I am trying to classify an instance using Weka in Java (specifically in Android Studio). Initially, I saved a model from the desktop Weka GUI and tried to import it into my project directory. If I am correct, this won't work because the Weka JDKs are different on PC versus Android. Now I am trying to train a model on the Android device itself (as I see no other option) by importing the training dataset. This is where I am running into problems. When I run "Test.java," I get this error saying that my

Standardization and Normalization (Comprehensive)

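Part 1: [Reposted] https://blog.csdn.net/weixin_40165004/article/details/89080968 Weka Data Preprocessing (1). In data mining we tend to focus only on the mining algorithms themselves, such as classification, clustering, and association rules, and neglect the quality of the data being mined. Yet only high-quality data can produce high-quality mining results; otherwise it is simply "garbage in, garbage out". An important step in ensuring data quality is data preprocessing, and in practice the data preparation stage often takes 60 to 80 percent of the time of the whole mining process. This article introduces the data preprocessing methods available in Weka. Data preprocessing in Weka is also called data filtering, and the filters can be found in weka.filters. Depending on the nature of the filtering algorithm, filters are either supervised (SupervisedFilter) or unsupervised (UnsupervisedFilter). For the former, a class attribute must be set, and the filter takes the class attribute and its distribution in the dataset into account to determine the best number and size of bins; for the latter, a class attribute need not exist. These filtering algorithms can also be grouped into attribute-based and instance-based ones. Attribute-based methods mainly operate on columns, for example adding or removing columns, while instance-based methods mainly operate on rows, for example adding or removing rows. Data filtering mainly addresses the following familiar problems: handling missing values, standardization, normalization, and discretization.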
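To make two of the filtering tasks named above concrete, here is a minimal plain-Java sketch of normalization (rescaling a numeric column to the range [0, 1]) and standardization (shifting to zero mean and scaling to unit variance). It does not use Weka. The class name `ScalingSketch` is hypothetical, and the choice of the population standard deviation (dividing by n) is an assumption, so results may differ slightly from what a particular Weka filter produces.

```java
import java.util.Arrays;

// Hypothetical helper illustrating the two rescaling schemes on a single numeric column.
final class ScalingSketch {

    // Min-max normalization: maps the smallest value to 0 and the largest to 1.
    static double[] normalize(double[] column) {
        double min = Arrays.stream(column).min().orElse(0.0);
        double max = Arrays.stream(column).max().orElse(0.0);
        double range = max - min;
        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            // A constant column has no spread, so map it to 0 rather than divide by zero.
            out[i] = range == 0.0 ? 0.0 : (column[i] - min) / range;
        }
        return out;
    }

    // Standardization (z-score): subtract the mean, divide by the standard deviation.
    // Assumption: population standard deviation (divide by n, not n - 1).
    static double[] standardize(double[] column) {
        double mean = Arrays.stream(column).average().orElse(0.0);
        double sumSquares = 0.0;
        for (double v : column) {
            sumSquares += (v - mean) * (v - mean);
        }
        double std = column.length == 0 ? 0.0 : Math.sqrt(sumSquares / column.length);
        double[] out = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            out[i] = std == 0.0 ? 0.0 : (column[i] - mean) / std;
        }
        return out;
    }
}
```

Normalization keeps the values bounded but is sensitive to outliers, since a single extreme value compresses everything else. Standardization has no fixed bounds but preserves relative distances from the mean.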

Using an ARFF file for storing data

I am using this example to create the .arff file for my Weka project. double[][] data = {{4058.0, 4059.0, 4060.0, 214.0, 1710.0, 2452.0, 2473.0, 2474.0, 2475.0, 2476.0, 2477.0, 2478.0, 2688.0, 2905.0, 2906.0, 2907.0, 2908.0, 2909.0, 2950.0, 2969.0, 2970.0, 3202.0, 3342.0, 3900.0, 4007.0, 4052.0, 4058.0, 4059.0, 4060.0}, {19.0, 20.0, 21.0, 31.0, 103.0, 136.0, 141.0, 142.0, 143.0, 144.0, 145.0, 146.0, 212.0, 243.0, 244.0, 245.0, 246.0, 247.0, 261.0, 270.0, 271.0, 294.0, 302.0, 340.0, 343.0, 354.0, 356.0, 357.0, 358.0}}; int numInstances = data[0].length; FastVector
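For reference, the ARFF text that such code is meant to produce can also be assembled by hand. The sketch below is not the example the question refers to and does not use the Weka API. It follows the same layout as the snippet, where `data[a][i]` is the value of attribute `a` for instance `i`, so `data[0].length` is the number of instances. The class name `ArffSketch`, the relation name, and the attribute names are hypothetical, and all attributes are assumed to be numeric.

```java
// Hypothetical helper: builds ARFF text for purely numeric data without Weka.
final class ArffSketch {

    // data[a][i] holds the value of attribute a for instance i,
    // matching the layout implied by "numInstances = data[0].length".
    static String toArff(String relation, String[] attributeNames, double[][] data) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String name : attributeNames) {
            // Assumption: names contain no spaces; otherwise they would need quoting.
            sb.append("@attribute ").append(name).append(" numeric\n");
        }
        sb.append("\n@data\n");

        int numInstances = data[0].length;
        for (int i = 0; i < numInstances; i++) {
            // One comma-separated row per instance, one column per attribute.
            for (int a = 0; a < data.length; a++) {
                if (a > 0) {
                    sb.append(',');
                }
                sb.append(data[a][i]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }
}
```

Calling `ArffSketch.toArff("demo", new String[] {"x", "y"}, data)` with the two-row array from the question would yield one `@attribute` line per row of `data` and 29 data rows, one per instance.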

What is the Stacking algorithm in Weka? How does it actually work?

Are the results of the base classifiers selected by a voting system, and what does the meta classifier actually receive as its input: the whole classifier or just the misclassified ones? It would be helpful if the whole mechanism could be explained with a simple example, like the one in this link: Majority vote algorithm in Weka.classifiers.meta.vote. Thanks in advance. Consider an ensemble of n members. Each of these members is trained on a given set of training data. The ensemble members may share the same classifier type (homogeneous) or use different classifiers (heterogeneous). Diversity is encouraged

Is there a workaround to solve “Java heap space” memory error when the max heap value has been already specified?

Question: I'm running a WEKA classifier (J48 with an input .arff file composed of 3 fields, field 1 has ~27k distinct attributes, field 2 ~ 500k values) on a latest-generation MacBook Pro with 8GB RAM. I increased the Java heap space to the maximum possible using the -Xmx parameter: java -Xmx7G -cp weka-3-6-10/weka.jar weka.classifiers.trees.J48 -t myfiles/loc_linear.arff -i However, when I run the classifier (after about 10 minutes) I get the error " Exception in thread "main" java.lang