“Got 1 columns instead of …” error in numpy

傲寒 2021-01-17 13:00

I'm working on the following code to perform Random Forest classification on train and test sets:

from sklearn.ensemble import RandomForestClassifier
fr         


        
8 answers
  • 2021-01-17 13:39

    One of your rows has more columns than the others. For example:

    >>> import numpy as np
    >>> from StringIO import StringIO
    >>> s = """
    ... 1 2 3 4
    ... 1 2 3 4 5
    ... """
    >>> np.genfromtxt(StringIO(s),delimiter=" ")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/lib64/python2.6/site-packages/numpy/lib/npyio.py", line 1654, in genfromtxt
        raise ValueError(errmsg)
    ValueError: Some errors were detected !
        Line #2 (got 5 columns instead of 4)
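
The transcript above is from Python 2. A minimal Python 3 sketch of the same situation follows; it assumes only that `io.StringIO` stands in for the file. The helper `find_bad_rows` is a hypothetical name introduced here for illustration, not part of NumPy: it counts the fields on each line so the offending rows can be located before calling `genfromtxt`.

```python
from io import StringIO

import numpy as np

# Second row has one field more than the first.
text = "1 2 3 4\n1 2 3 4 5\n"

# genfromtxt raises ValueError when row lengths are inconsistent.
try:
    np.genfromtxt(StringIO(text), delimiter=" ")
    message = ""
except ValueError as err:
    message = str(err)
print(message)


def find_bad_rows(lines, delimiter=" "):
    """Hypothetical helper: return (line_number, field_count) for every
    non-empty line whose field count differs from the first non-empty line."""
    counts = [
        (number, len(line.split(delimiter)))
        for number, line in enumerate(lines, start=1)
        if line.strip()
    ]
    expected = counts[0][1]
    return [(number, count) for number, count in counts if count != expected]


bad_rows = find_bad_rows(text.splitlines())
print(bad_rows)
```

Running a check like this on the real file should point to the rows that need fixing, assuming the file uses a single-character delimiter and no quoting.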
    
  • 2021-01-17 13:44

    I ran into the same error while loading a text dataset with genfromtxt for text classification with Keras.

    The data format was: [some_text]\t[class_label]. My understanding was that some characters in the first column confused the parser, so the two columns could not be split properly.

    data = np.genfromtxt(my_file, delimiter='\t', usecols=(0, 1), dtype=str)
    

    This snippet raised the same ValueError as yours. My first workaround was to read everything as one column:

    data = np.genfromtxt(my_file, delimiter='\t', usecols=(0,), dtype=str)
    

    and then split the data myself afterwards.

    However, what finally worked properly was to set the comments parameter of genfromtxt explicitly:

    data = np.genfromtxt(my_file, delimiter='\t', usecols=(0, 1), dtype=str, comments=None)
    

    According to the documentation:

    The optional argument comments is used to define a character string that marks the beginning of a comment. By default, genfromtxt assumes comments='#'. The comment marker may occur anywhere on the line. Any character present after the comment marker(s) is simply ignored.

    So the default comment character is '#', and if it appears in your text column, everything after it on that line is ignored, including the delimiter and the second column. That is probably why genfromtxt sees only one column on those lines.
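
A small sketch of this behaviour, assuming a made-up two-row, tab-separated sample in which one text field contains '#'; `io.StringIO` stands in for the real file. With the default `comments='#'` the second row is cut off before the tab, and with `comments=None` both columns are read.

```python
from io import StringIO

import numpy as np

# Made-up sample data: [some_text]\t[class_label]; the second text contains '#'.
sample = "good movie\tpos\ngreat #deal\tneg\n"

# Default comments='#': the second line is truncated to "great ",
# so it has only one column and genfromtxt raises ValueError.
try:
    np.genfromtxt(StringIO(sample), delimiter="\t", usecols=(0, 1), dtype=str)
    default_error = ""
except ValueError as err:
    default_error = str(err)
print(default_error)

# comments=None disables comment handling, so '#' is kept as ordinary text.
data = np.genfromtxt(
    StringIO(sample), delimiter="\t", usecols=(0, 1), dtype=str, comments=None
)
print(data)
```

If the file really does contain comment lines, setting `comments` to a string that never occurs in the text is an alternative to `None`.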
