I am using this example to create the .arff file for my Weka project.
double[][] data = {
    {4058.0, 4059.0, 4060.0, 214.0, 1710.0, 2452.0, 2473.0, 2474.0, 2475.0, 2476.0, 2477.0, 2478.0, 2688.0, 2905.0, 2906.0, 2907.0, 2908.0, 2909.0, 2950.0, 2969.0, 2970.0, 3202.0, 3342.0, 3900.0, 4007.0, 4052.0, 4058.0, 4059.0, 4060.0},
    {19.0, 20.0, 21.0, 31.0, 103.0, 136.0, 141.0, 142.0, 143.0, 144.0, 145.0, 146.0, 212.0, 243.0, 244.0, 245.0, 246.0, 247.0, 261.0, 270.0, 271.0, 294.0, 302.0, 340.0, 343.0, 354.0, 356.0, 357.0, 358.0}
};
int numInstances = data[0].length;
FastVector atts = new FastVector();
ArrayList<Instance> instances = new ArrayList<Instance>();
for (int dim = 0; dim < 2; dim++) {
    // Create new attribute / dimension
    Attribute current = new Attribute("Attribute" + dim, dim);
    // Create an instance for each data object
    if (dim == 0) {
        for (int obj = 0; obj < numInstances; obj++) {
            instances.add(new SparseInstance(0));
        }
    }
    // Fill the value of dimension "dim" into each object
    for (int obj = 0; obj < numInstances; obj++) {
        instances.get(obj).setValue(current, data[dim][obj]);
        System.out.println(instances.get(obj));
    }
    // Add attribute to total attributes
    atts.addElement(current);
}
// Create new dataset
Instances newDataset = new Instances("Dataset", atts, instances.size());
// Fill in data objects
for (Instance inst : instances) {
    newDataset.add(inst);
}
BufferedWriter writer = new BufferedWriter(new FileWriter("test.arff"));
writer.write(newDataset.toString());
writer.flush();
writer.close();
I've noticed that the output puts the elements of each row of the array into the columns of the .arff file. I want each whole row of the array to become one row of the .arff file. How can I do that? In my case, the last column of each row holds the label of that row's data.
The expected result for my arff file:
4058.0, 4059.0, 4060.0, 214.0, 1710.0, 2452.0, 2473.0, 2474.0, 2475.0, 2476.0, 2477.0, 2478.0, 2688.0, 2905.0, 2906.0, 2907.0, 2908.0, 2909.0, 2950.0, 2969.0, 2970.0, 3202.0, 3342.0, 3900.0, 4007.0, 4052.0, 4058.0, 4059.0, 4060.0, 1 // for example, the first row
19.0, 20.0, 21.0, 31.0, 103.0, 136.0, 141.0, 142.0, 143.0, 144.0, 145.0, 146.0, 212.0, 243.0, 244.0, 245.0, 246.0, 247.0, 261.0, 270.0, 271.0, 294.0, 302.0, 340.0, 343.0, 354.0, 356.0, 357.0, 358.0, 0 // the second row
The code in the example treats each column in the table as an instance (so there are 29 instances, each with two attributes). It sounds like you want to treat each row as an instance (giving two instances, each with 29 attributes):
double[][] data = {
    {4058.0, 4059.0, ... }, /* first instance */
    {19.0, 20.0, ... }      /* second instance */
};
int numAtts = data[0].length;
FastVector atts = new FastVector(numAtts);
for (int att = 0; att < numAtts; att++)
{
    atts.addElement(new Attribute("Attribute" + att, att));
}
int numInstances = data.length;
Instances dataset = new Instances("Dataset", atts, numInstances);
for (int inst = 0; inst < numInstances; inst++)
{
    dataset.add(new Instance(1.0, data[inst]));
}
BufferedWriter writer = new BufferedWriter(new FileWriter("test.arff"));
writer.write(dataset.toString());
writer.flush();
writer.close();
I replaced SparseInstance with Instance, since almost all of the attribute values are non-zero. Note that in Weka 3.7 Instance has become an interface and DenseInstance should be used instead. Also, FastVector has been deprecated in favour of Java's ArrayList.
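If the label from the expected result should become a real class attribute, the ARFF text can also be produced by hand, without the Weka object API. The following is a minimal stdlib-only sketch, assuming the labels are supplied in a separate array (the `data` array above does not contain them) and that the only class values are `0` and `1`. The class name `ArffRowWriter`, the method `toArff`, and the attribute names `Attribute0`, `Attribute1`, ... and `class` are hypothetical choices for illustration, not Weka API. Each row of the array becomes one line in the `@data` section, with its label appended last.

```java
public class ArffRowWriter {

    /**
     * Builds ARFF text in which each row of {@code data} becomes one instance.
     * Assumption: labels are passed separately (one per row) and every label
     * is one of {@code classValues}. Attribute names are hypothetical.
     */
    static String toArff(String relation, double[][] data,
                         String[] labels, String[] classValues) {
        if (data.length != labels.length) {
            throw new IllegalArgumentException("need exactly one label per row");
        }
        int numAtts = data[0].length;
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        // One numeric attribute per column of the array.
        for (int att = 0; att < numAtts; att++) {
            sb.append("@attribute Attribute").append(att).append(" numeric\n");
        }
        // Nominal class attribute listing every allowed label value.
        sb.append("@attribute class {")
          .append(String.join(",", classValues)).append("}\n\n");
        sb.append("@data\n");
        for (int row = 0; row < data.length; row++) {
            if (data[row].length != numAtts) {
                throw new IllegalArgumentException("row " + row + " has the wrong length");
            }
            // One line per row: all values of the row, then its label.
            for (double v : data[row]) {
                sb.append(v).append(',');
            }
            sb.append(labels[row]).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Shortened versions of the two rows from the question.
        double[][] data = {
            {4058.0, 4059.0, 4060.0},
            {19.0, 20.0, 21.0}
        };
        String arff = toArff("Dataset", data,
                             new String[] {"1", "0"},   // label of each row
                             new String[] {"0", "1"});  // declared class values
        System.out.print(arff);
    }
}
```

Running `main` prints the header (three numeric attributes plus `@attribute class {0,1}`) followed by the data lines `4058.0,4059.0,4060.0,1` and `19.0,20.0,21.0,0`. When such a file is loaded into Weka, the last attribute still has to be marked as the class, for example with `data.setClassIndex(data.numAttributes() - 1)`. To save the text to disk, `Files.writeString` (Java 11+) is one option.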
Source: https://stackoverflow.com/questions/21723013/using-a-arff-file-for-storing-data