Apologies as I'm a complete novice when it comes to Weka.
I have 100 instances and each instance has 400 attributes, most of which have a single value. However, some attributes have multiple values because they contain a time component. I was wondering whether Weka can analyse multiple values for one attribute and, if so, how I should separate these values so that Weka can read them (e.g. commas, semi-colons?).
Many thanks for your help.
Weka natively works with a format called ARFF, short for Attribute-Relation File Format. An ARFF file has three clearly separated parts:
1. Header. This defines the name of the relation. Its format is:
@relation <relation-name>
where <relation-name> is a string. If the name contains spaces, it must be enclosed in quotation marks.
2. Attribute declarations. This section declares each attribute in the file together with its type. The syntax is:
@attribute <attribute-name> <type>
where <attribute-name> is a string with the same restriction as above.
Weka accepts several types:
a) NUMERIC. Real numbers.
b) INTEGER. Whole numbers, which Weka treats as numeric.
c) DATE. Dates. The type keyword is followed by a quoted format string made up of separator characters (such as hyphens, colons, or spaces) and time units: dd day, MM month, yyyy year, HH hours, mm minutes, ss seconds.
d) STRING. Free text, with the same quoting restriction as the names above.
e) NOMINAL (enumerated). The type is written as the list of possible values (names or character strings) that the attribute can take, separated by commas and enclosed in braces. For example, an attribute describing the weather could be declared as:
@attribute weather {sunny, rainy, cloudy}
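As a rough illustration of the header and attribute declarations described above, here is a minimal Python sketch that builds the @relation and @attribute lines from a list of (name, type) pairs. The helper names quote_name and arff_header are made up for this example and are not part of Weka. The sketch assumes names contain no quote characters of their own.

```python
def quote_name(name):
    """Quote a name if it contains spaces (assumes no embedded quotes)."""
    return "'" + name + "'" if " " in name else name


def arff_header(relation, attributes):
    """Return the @relation and @attribute lines as one string.

    attributes is a list of (name, type) pairs. The type is either a
    string such as "NUMERIC" or 'DATE "dd-MM-yyyy HH:mm"', or a list of
    allowed values for a nominal attribute.
    """
    lines = ["@relation " + quote_name(relation), ""]
    for name, attr_type in attributes:
        if isinstance(attr_type, (list, tuple)):
            # Nominal attribute: comma-separated values inside braces.
            attr_type = "{" + ",".join(attr_type) + "}"
        lines.append("@attribute " + quote_name(name) + " " + attr_type)
    return "\n".join(lines)


header = arff_header("MyTest", [
    ("nombre", "STRING"),
    ("ojo_izquierdo", ["Bien", "Mal"]),
    ("dimension", "NUMERIC"),
    ("fecha_analisis", 'DATE "dd-MM-yyyy HH:mm"'),
])
print(header)
```

With 400 attributes, generating the declarations this way is likely less error-prone than typing them by hand.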
3. Data section. This begins with @data and lists the instances that make up the relation, one per line, with the attribute values separated by commas.
Although this is the "full" format, the data can also be written in a shorter, sparse form. If many of the values in a dataset are 0, the zero entries can be omitted: each row is enclosed in braces, and each remaining value is preceded by the index of its attribute (counting from 0, in ascending order).
An example of this is as follows:
{3 3, 14 1}
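To make the sparse form concrete, here is a small Python sketch that turns a dense row of numbers into a sparse data line. The function name to_sparse is hypothetical, and the sketch assumes attribute indices are 0-based and written in ascending order. It only handles numeric values.

```python
def to_sparse(row):
    """Convert a dense list of numbers to a sparse ARFF data line.

    Only non-zero entries are kept, each written as "<index> <value>",
    with 0-based attribute indices in ascending order.
    """
    pairs = [f"{i} {v}" for i, v in enumerate(row) if v != 0]
    return "{" + ", ".join(pairs) + "}"


# A row with 15 attributes where only attributes 3 and 14 are non-zero.
dense = [0] * 15
dense[3] = 3
dense[14] = 1
print(to_sparse(dense))  # {3 3, 14 1}
```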
An unknown value is written as a question mark ("?"). Comments start with the % character.
So you can combine several kinds of values to construct your dataset, for example:
% Test Weka.
@relation MyTest

@attribute nombre STRING
@attribute ojo_izquierdo {Bien,Mal}
@attribute dimension NUMERIC
@attribute fecha_analisis DATE "dd-MM-yyyy HH:mm"

@data
Antonio,Bien,38.43,"12-04-2003 12:23"
'Maria Jose',?,34.53,"14-05-2003 13:45"
Juan,Bien,43,"01-01-2004 08:04"
Maria,?,?,"03-04-2003 11:03"
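If the data rows come from a script rather than being typed by hand, a sketch along these lines could produce them. The helper names format_value and data_line are invented for this example. The sketch assumes that None stands for an unknown value and that values never contain quote characters themselves. It uses double quotes for values with spaces, which Weka should accept just like single quotes.

```python
def format_value(value):
    """Format one value for the @data section."""
    if value is None:
        return "?"  # unknown value
    text = str(value)
    if " " in text or "," in text:
        # Quote values containing spaces or commas
        # (assumes the value itself contains no double quotes).
        return '"' + text + '"'
    return text


def data_line(row):
    """Join the formatted values of one instance with commas."""
    return ",".join(format_value(v) for v in row)


rows = [
    ["Antonio", "Bien", 38.43, "12-04-2003 12:23"],
    ["Maria Jose", None, 34.53, "14-05-2003 13:45"],
    ["Maria", None, None, "03-04-2003 11:03"],
]
for row in rows:
    print(data_line(row))
```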