I'm working on the following code for performing Random Forest Classification on train and test sets:
from sklearn.ensemble import RandomForestClassifier
fr
You have too many columns in one of your rows. For example:
>>> import numpy as np
>>> from StringIO import StringIO
>>> s = """
... 1 2 3 4
... 1 2 3 4 5
... """
>>> np.genfromtxt(StringIO(s),delimiter=" ")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.6/site-packages/numpy/lib/npyio.py", line 1654, in genfromtxt
    raise ValueError(errmsg)
ValueError: Some errors were detected !
    Line #2 (got 5 columns instead of 4)
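To find out which rows are the culprits before calling genfromtxt, you could count the fields on each line yourself. This is only a minimal sketch: the helper name find_ragged_rows and the sample data are made up for illustration, and it assumes the most common column count is the correct one.

```python
from collections import Counter
from io import StringIO


def find_ragged_rows(lines, delimiter=" "):
    """Return (expected_column_count, [(line_number, column_count), ...]).

    Hypothetical helper: treats the most common column count as the
    expected one and reports every non-empty line that deviates from it.
    """
    counts = []
    for lineno, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue  # ignore blank lines
        counts.append((lineno, len(stripped.split(delimiter))))
    if not counts:
        return None, []
    expected = Counter(n for _, n in counts).most_common(1)[0][0]
    bad = [(lineno, n) for lineno, n in counts if n != expected]
    return expected, bad


# Made-up sample: the second row has one column too many.
sample = StringIO("1 2 3 4\n1 2 3 4 5\n1 2 3 4\n")
expected, bad = find_ragged_rows(sample)
print(expected, bad)
```

If simply dropping the malformed rows is acceptable, genfromtxt also takes invalid_raise=False, which skips them with a warning instead of raising.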
I also hit this error while loading a text dataset with genfromtxt for text classification with Keras.
The data format was: [some_text]\t[class_label].
My understanding was that some characters in the first column confuse the parser, so the two columns cannot be split properly.
data = np.genfromtxt(my_file, delimiter='\t', usecols=(0,1), dtype=str)
This snippet raised the same ValueError as yours, and my first workaround was to read everything as one column:
data = np.genfromtxt(my_file, delimiter='\t', usecols=(0,), dtype=str)
and split the data myself afterwards.
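For what it's worth, the manual split can also be done without genfromtxt at all. The sketch below is not the code I actually used: the helper name split_text_and_label and the sample rows are hypothetical, and it assumes the label is whatever follows the last tab on each line.

```python
from io import StringIO


def split_text_and_label(lines):
    """Split each 'text<TAB>label' line on its last tab.

    Hypothetical helper: lines without a tab are skipped rather than
    guessed at, and '#' gets no special treatment.
    """
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if "\t" not in line:
            continue  # blank or malformed line
        text, label = line.rsplit("\t", 1)
        rows.append((text, label))
    return rows


# Made-up sample in the [some_text]\t[class_label] format.
sample = StringIO("good movie\tpos\nissue #42 is bad\tneg\n")
rows = split_text_and_label(sample)
print(rows)
```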
However, what finally worked properly was to explicitly set the comments parameter of genfromtxt:
data = np.genfromtxt(my_file, delimiter='\t', usecols=(0,1), dtype=str, comments=None)
According to the documentation:
The optional argument comments is used to define a character string that marks the beginning of a comment. By default, genfromtxt assumes comments='#'. The comment marker may occur anywhere on the line. Any character present after the comment marker(s) is simply ignored.
In other words, the default comment character is '#', so if it appears in your text column, everything after it on that line is ignored. That is probably why genfromtxt cannot recognize the two columns.
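Here is a small reproduction of that behaviour, as a sketch rather than my original script. The sample rows are made up, and it reads from an in-memory StringIO instead of a real file.

```python
from io import StringIO

import numpy as np

# Made-up two-column sample; the second row has a '#' in the text column.
sample = "good movie\tpos\nissue #42 is bad\tneg\n"

# Default comments='#': row 2 is cut off at the '#', so only one column
# is left and genfromtxt reports a column-count mismatch.
try:
    np.genfromtxt(StringIO(sample), delimiter="\t", dtype=str)
    default_failed = False
except ValueError:
    default_failed = True

# comments=None turns comment stripping off, so both columns survive.
data = np.genfromtxt(StringIO(sample), delimiter="\t", dtype=str, comments=None)

print(default_failed)
print(data)
```

Keep in mind that with comments=None any real comment lines in the file are parsed as data too, so this only fits files that contain no comments.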