I\'m testing out pybrain following the basic classification tutorial here and a different take on it with some more realistic data here. However I receive this error when applyi
The implementation of splitWithProportion
changed between PyBrain versions 0.3.2 and 0.3.3., introducing this bug that breaks polymorphism.
As of now, the library hasn't been updated since January 2015, so using some kind of workaround is the only course of action at the moment.
You can check the responsible commit here: https://github.com/pybrain/pybrain/commit/2f02b8d9e4e9d6edbc135a355ab387048a00f1af
I have the same issue and think I fixed it: See this pull request.
(Python 2.7.6, PyBrain 0.3.3, OS X 10.9.5)
I had the same problem. I added the following code to make it work on my machine.
tstdata_temp, trndata_temp = alldata.splitWithProportion(0.25)
tstdata = ClassificationDataSet(2, 1, nb_classes=3)
for n in xrange(0, tstdata_temp.getLength()):
tstdata.addSample( tstdata_temp.getSample(n)[0], tstdata_temp.getSample(n)[1] )
trndata = ClassificationDataSet(2, 1, nb_classes=3)
for n in xrange(0, trndata_temp.getLength()):
trndata.addSample( trndata_temp.getSample(n)[0], trndata_temp.getSample(n)[1] )
This converts tstdata
and trndata
back to the ClassificationDataSet
type.
I tried the suggested workaround from Muhammed Miah, but I still was tripped up when running the tutorial at the line:
print( trndata['input'][0], trndata['target'][0], trndata['class'][0])
trndata['class'] was an empty array, so index [0] threw a fault.
I was able to workaround by making my own function ConvertToOneOfMany:
def ConvertToOneOfMany(d,nb_classes,bounds=(0,1)):
d2 = ClassificationDataSet(d.indim, d.outdim, nb_classes=nb_classes)
for n in range(d.getLength()):
d2.addSample( d.getSample(n)[0], d.getSample(n)[1] )
oldtarg=d.getField('target')
newtarg=np.zeros([len(d),nb_classes],dtype='Int32')+bounds[0]
for i in range(len(d)):
newtarg[i,int(oldtarg[i])]=bounds[1]
d2.setField('class',oldtarg)
d2.setField('target',newtarg)
return(d2)
The simplest workaround that I found was to do first the splitWithProportion(), update the number of classes and then do the _convertToOneOfMany().
tstdata, trndata = alldata.splitWithProportion( 0.25 )
tstdata.nClasses = alldata.nClasses
trndata.nClasses = alldata.nClasses
tstdata._convertToOneOfMany(bounds=[0, 1])
trndata._convertToOneOfMany(bounds=[0, 1])
And with the update of nClasses of both testdata and trndata, it is guarantee that you don't get different dimensions in the target fields.
I was geting errors either if I did first _convertToOneOfMany and second splitWithProportion or the other way around when working with a ClassificationDataSet. So, I suggested and update in the splitWithProportion function. You can see the whole code in this pullRequest.
So, I did the following without getting an error:
from pybrain.datasets import ClassificationDataSet
ds = ClassificationDataSet(4096, 1 , nb_classes=40)
for k in range(400):
ds.addSample(k,k%4)
print(type(ds))
# <class 'pybrain.datasets.classification.ClassificationDataSet'>
tstdata, trndata = ds.splitWithProportion(0.25)
print(type(trndata))
# <class 'pybrain.datasets.classification.ClassificationDataSet'>
print(type(tstdata))
# <class 'pybrain.datasets.classification.ClassificationDataSet'>
trndata._convertToOneOfMany()
tstdata._convertToOneOfMany()
The only difference I see between my code and yours is your use of X. Perhaps you can confirm that my code works on your machine, and if so then we could look into what about X if confusing things?