Question
I have a function that runs the k-means algorithm using WEKA.jar. The function works and prints the list of instances to my console, but I want to show a specific attribute from the k-means clustering.
This is my code so far:
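//importing required dependencies
import java.io.File;
import weka.clusterers.ClusterEvaluation;
import weka.clusterers.SimpleKMeans;
import weka.core.EuclideanDistance;
import weka.core.Instance;
import weka.core.Instances;
import weka.experiment.InstanceQuery;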
public class KMeans {
/*get connection strings from database manager*/
private DatabaseManager datman = new DatabaseManager();
private String username = datman.getUsername(); //get username
private String password = datman.getPassword(); //get password
public void doProcess(){
int n = 3;
String queries = "SELECT idms_kodebarang, aksesoris, bahan, `QTY-SA-1`,`QTY-SA-2`,`QTY-SA-3`,`QTY-SA-4`,`harga` FROM mt_karakterproduk";
try {
InstanceQuery query = new InstanceQuery();
File reader = new File("DatabaseUtils.props");
query.setUsername(username);
query.setPassword(password);
query.setQuery(queries);
query.initialize(reader);
query.setSparseData(true);
Instances Data = query.retrieveInstances();
String[] options = weka.core.Utils.splitOptions("-I 100");
SimpleKMeans kmeans = new SimpleKMeans();
kmeans.setSeed(10);
kmeans.setOptions(options);
//this is the important parameter to set
kmeans.setNumClusters(n);
kmeans.setPreserveInstancesOrder(true);
kmeans.buildClusterer(Data);
EuclideanDistance Dist = (EuclideanDistance)kmeans.getDistanceFunction();
Instances instances = kmeans.getClusterCentroids();
//create cluster information print result
ClusterEvaluation eval = new ClusterEvaluation();
eval.setClusterer(kmeans);
for ( int i = 0; i < instances.numInstances(); i++ ) {
// for each cluster center
Instance inst = instances.instance( i );
Double dist1 = Dist.distance(instances.firstInstance(), Data.instance(i));
// as you mentioned, you only had 1 attribute
// but you can iterate through the different attributes
double value = inst.value( 0 );
java.lang.System.out.println( "Value for centroid " + i + ": " + value + " ::: " +dist1);
}
java.lang.System.out.printf("Cluster Results \n =================== \n "+eval.clusterResultsToString());
//this array returns the cluster number for each instance
//the array has as many elements as the number of instances
int[] assignments = kmeans.getAssignments();
int i = 0;
for(int clusternum : assignments){
java.lang.System.out.printf("Instance %d - > cluster %d \n", i, clusternum);
i++;
}
} catch (Exception e) {
java.lang.System.out.println("Error On KMeans Analysis Exception : " + e.toString());
}
}
}
The result only shows a list like this:
- INFO: Instance 0 - > cluster 2
- INFO: Instance 2 - > cluster 2
- INFO: Instance 4 - > cluster 1
- INFO: Instance 6 - > cluster 2
- INFO: Instance 8 - > cluster 2
- INFO: Instance 10 - > cluster 1
- INFO: Instance 12 - > cluster 2
- INFO: Instance 14 - > cluster 0
- INFO: Instance 16 - > cluster 1
- INFO: Instance 18 - > cluster 1
- INFO: Instance 20 - > cluster 1
- INFO: Instance 22 - > cluster 1
- INFO: Instance 24 - > cluster 0
- INFO: Instance 26 - > cluster 0
- INFO: Instance 28 - > cluster 1
- INFO: Instance 30 - > cluster 1 ... etc..
I need the result to include not only the instance index but also specific attributes from the database, like this output from the Weka application:
Cluster centroids:
Cluster#
Attribute Full Data 0 1 2
(32) (8) (15) (9)
=============================================================================
idms_kodebarang E501245FF3 E613104F E501247FF3 E501245FF3
E501245FF3 1 ( 3%) 0 ( 0%) 0 ( 0%) 1 ( 11%)
E501247FF3 1 ( 3%) 0 ( 0%) 1 ( 6%) 0 ( 0%)
E820707F$KB 1 ( 3%) 0 ( 0%) 0 ( 0%) 1 ( 11%)
E820705F$KB 1 ( 3%) 0 ( 0%) 0 ( 0%) 1 ( 11%)
E5016B57FF 1 ( 3%) 0 ( 0%) 1 ( 6%) 0 ( 0%)
E5016B59FF 1 ( 3%) 0 ( 0%) 1 ( 6%) 0 ( 0%)
E820701F$KB 1 ( 3%) 0 ( 0%) 0 ( 0%) 1 ( 11%)
E613104F 1 ( 3%) 1 ( 12%) 0 ( 0%) 0 ( 0%)
E820708F$KB 1 ( 3%) 0 ( 0%) 0 ( 0%) 1 ( 11%)
E521210F6 1 ( 3%) 0 ( 0%) 1 ( 6%) 0 ( 0%)
E5216B10F6 1 ( 3%) 0 ( 0%) 1 ( 6%) 0 ( 0%)
E501245C$3KB 1 ( 3%) 0 ( 0%) 0 ( 0%) 1 ( 11%)
E501247C$3KB 1 ( 3%) 0 ( 0%) 0 ( 0%) 1 ( 11%)
E501238FF3 1 ( 3%) 0 ( 0%) 1 ( 6%) 0 ( 0%)
E701601F 1 ( 3%) 1 ( 12%) 0 ( 0%) 0 ( 0%)
E613105F 1 ( 3%) 1 ( 12%) 0 ( 0%) 0 ( 0%)
E600201FC 1 ( 3%) 0 ( 0%) 1 ( 6%) 0 ( 0%)
E600105C 1 ( 3%) 0 ( 0%) 1 ( 6%) 0 ( 0%)
E620201C 1 ( 3%) 0 ( 0%) 1 ( 6%) 0 ( 0%)
E5016B57C$KB 1 ( 3%) 0 ( 0%) 0 ( 0%) 1 ( 11%)
E620501H 1 ( 3%) 0 ( 0%) 1 ( 6%) 0 ( 0%)
E5016B59C$KB 1 ( 3%) 0 ( 0%) 0 ( 0%) 1 ( 11%)
E800601F 1 ( 3%) 0 ( 0%) 1 ( 6%) 0 ( 0%)
E880201H 1 ( 3%) 1 ( 12%) 0 ( 0%) 0 ( 0%)
E931301F 1 ( 3%) 1 ( 12%) 0 ( 0%) 0 ( 0%)
G932201F$ 1 ( 3%) 1 ( 12%) 0 ( 0%) 0 ( 0%)
E840104FC 1 ( 3%) 1 ( 12%) 0 ( 0%) 0 ( 0%)
E600300F 1 ( 3%) 0 ( 0%) 1 ( 6%) 0 ( 0%)
E701104F 1 ( 3%) 0 ( 0%) 1 ( 6%) 0 ( 0%)
E5016B50FF 1 ( 3%) 0 ( 0%) 1 ( 6%) 0 ( 0%)
E702201F 1 ( 3%) 0 ( 0%) 1 ( 6%) 0 ( 0%)
E502415H6 1 ( 3%) 1 ( 12%) 0 ( 0%) 0 ( 0%)
How can I achieve this?
Thanks in advance.
Answer 1:
Not sure if this is still relevant, but I hope it helps someone with a similar problem. I am working with the Weka K-Means clustering API too, and the ClusterEvaluation class should give you the output in the form you want. I tried it on the Iris dataset and got these results:
Weka Tool K-Means Cluster (set numOfClusters = 2)
=== Run information ===
Scheme: weka.clusterers.SimpleKMeans -init 0 -max-candidates 100 -periodic-pruning 10000 -min-density 2.0 -t1 -1.25 -t2 -1.0 -N 2 -A "weka.core.EuclideanDistance -R first-last" -I 500 -num-slots 1 -S 10
Relation: iris
Instances: 150
Attributes: 5
sepallength
sepalwidth
petallength
petalwidth
class
Test mode: evaluate on training data
=== Clustering model (full training set) ===
kMeans
======
Number of iterations: 7
Within cluster sum of squared errors: 62.1436882815797
Initial starting points (random):
Cluster 0: 6.1,2.9,4.7,1.4,Iris-versicolor
Cluster 1: 6.2,2.9,4.3,1.3,Iris-versicolor
Missing values globally replaced with mean/mode
Final cluster centroids:
Cluster#
Attribute Full Data 0 1
(150.0) (100.0) (50.0)
==================================================================
sepallength 5.8433 6.262 5.006
sepalwidth 3.054 2.872 3.418
petallength 3.7587 4.906 1.464
petalwidth 1.1987 1.676 0.244
class Iris-setosa Iris-versicolor Iris-setosa
Time taken to build model (full training data) : 0.02 seconds
=== Model and evaluation on training set ===
Clustered Instances
0 100 ( 67%)
1 50 ( 33%)
My own clusterer, written against the Weka API, produced this result for the same dataset using the ClusterEvaluation class:
Cluster Evaluation results:
kMeans
======
Number of iterations: 7
Within cluster sum of squared errors: 62.14368828157972
Initial starting points (random):
Cluster 0: 6.1,2.9,4.7,1.4,Iris-versicolor
Cluster 1: 6.2,2.9,4.3,1.3,Iris-versicolor
Missing values globally replaced with mean/mode
Final cluster centroids:
Cluster#
Attribute Full Data 0 1
(150.0) (100.0) (50.0)
==================================================================
sepallength 5.8433 6.262 5.006
sepalwidth 3.054 2.872 3.418
petallength 3.7587 4.906 1.464
petalwidth 1.1987 1.676 0.244
class Iris-setosa Iris-versicolor Iris-setosa
Clustered Instances
0 100 ( 67%)
1 50 ( 33%)
I got the above output with the following code:
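import java.io.BufferedReader;
import java.io.FileReader;
import weka.clusterers.ClusterEvaluation;
import weka.clusterers.SimpleKMeans;
import weka.core.Instances;
// load the dataset (Instances has no constructor taking a file name, so wrap the file in a Reader)
Instances instances = new Instances(new BufferedReader(new FileReader("iris.arff")));
SimpleKMeans simpleKMeans = new SimpleKMeans();
// build clusterer
simpleKMeans.setPreserveInstancesOrder(true);
simpleKMeans.setNumClusters(2);
simpleKMeans.buildClusterer(instances);
// evaluate on the training data so the results string is populated
ClusterEvaluation eval = new ClusterEvaluation();
eval.setClusterer(simpleKMeans);
eval.evaluateClusterer(instances);
System.out.println("Cluster Evaluation: "+eval.clusterResultsToString());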
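The final print line prints the desired output. Note that eval.evaluateClusterer(instances) has to be called before clusterResultsToString(); the code in the question sets the clusterer but never calls it. Hope this helps someone.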
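If the goal is to show an attribute value (for example idms_kodebarang) next to each instance's cluster number, one possible approach is to pair the array from kmeans.getAssignments() with the attribute values of the same instances. The sketch below shows only that pairing step in plain Java, so it runs without Weka on the classpath. The class name ClusterAttributeDemo, the helper groupByCluster, and the sample arrays are hypothetical; in real code the values would presumably come from data.instance(i).stringValue(0) (or value(...) for numeric attributes) and the assignments from kmeans.getAssignments(), which requires setPreserveInstancesOrder(true).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, not part of Weka.
class ClusterAttributeDemo {

    // Groups one attribute value per instance by the cluster index assigned to it.
    // values[i] and assignments[i] must describe the same instance.
    static Map<Integer, List<String>> groupByCluster(String[] values, int[] assignments) {
        if (values.length != assignments.length) {
            throw new IllegalArgumentException("values and assignments must have the same length");
        }
        Map<Integer, List<String>> clusters = new TreeMap<>();
        for (int i = 0; i < assignments.length; i++) {
            clusters.computeIfAbsent(assignments[i], k -> new ArrayList<>()).add(values[i]);
        }
        return clusters;
    }

    public static void main(String[] args) {
        // Illustrative data only. With Weka these arrays would be filled from
        // data.instance(i).stringValue(0) and kmeans.getAssignments().
        String[] codes = {"E501245FF3", "E613104F", "E501247FF3", "E701601F"};
        int[] assignments = {2, 0, 1, 0};

        // One line per instance: index, attribute value, cluster number.
        for (int i = 0; i < codes.length; i++) {
            System.out.printf("Instance %d (%s) -> cluster %d%n", i, codes[i], assignments[i]);
        }

        // One line per cluster: all attribute values that fell into it.
        groupByCluster(codes, assignments)
            .forEach((cluster, members) -> System.out.println("Cluster " + cluster + ": " + members));
    }
}
```

The assignment indices follow the order of the instances passed to buildClusterer, so the attribute values have to be read from that same Instances object.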
Source: https://stackoverflow.com/questions/21014916/getting-database-attribute-from-kmeans-clustering-weka