I'm working on the following code for performing Random Forest Classification on train and test sets:
from sklearn.ensemble import RandomForestClassifier
fr
An exception is raised if an inconsistency is detected in the number of columns. Several causes and solutions are possible.
Add invalid_raise=False to skip the offending lines.
dataset = genfromtxt('data.csv', delimiter=',', invalid_raise=False)
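To see what skipping does in practice, here is a minimal sketch using an in-memory file instead of data.csv. The sample rows are made up for illustration; the third row deliberately has one column too many.

```python
import io
import warnings

import numpy as np

# Hypothetical CSV content: the third row has an extra column.
raw = io.StringIO("1,2,3\n4,5,6\n7,8,9,10\n11,12,13\n")

with warnings.catch_warnings():
    # With invalid_raise=False, genfromtxt warns about the bad line
    # instead of raising; the warning is silenced here for brevity.
    warnings.simplefilter("ignore")
    dataset = np.genfromtxt(raw, delimiter=",", invalid_raise=False)

# Only the three rows with the expected number of columns remain.
print(dataset.shape)
```

Keep in mind that the skipped rows are dropped from the result, so it is worth checking the warning text (it lists the offending line numbers) before training on the data.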
If your data contains field names (for example, a header row read with names=True), make sure that no field name contains a space or an invalid character, and that none matches the name of a standard attribute (like size or shape), which would confuse the interpreter. The following genfromtxt options control how names are cleaned up:
deletechars
Gives a string combining all the characters that must be deleted from the name. By default, invalid characters are ~!@#$%^&*()-=+~\|]}[{';: /?.>,<.
excludelist
Gives a list of the names to exclude, such as return, file, print… If one of the input names is in this list, an underscore character ('_') will be appended to it.
case_sensitive
Whether the names should be case-sensitive (case_sensitive=True), converted to upper case (case_sensitive=False or case_sensitive='upper'), or converted to lower case (case_sensitive='lower').
import numpy as np
data = np.genfromtxt("data.txt", dtype=None, names=True,
                     deletechars=r"~!@#$%^&*()-=+~\|]}[{';: /?.>,<.",
                     case_sensitive=True)
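As a rough illustration of how these options reshape field names, the sketch below reads a small in-memory table. The header and values are invented for the example, and it relies on the default deletechars and excludelist.

```python
import io

import numpy as np

# Hypothetical header containing a space, a reserved name, and parentheses.
raw = io.StringIO("Max Temp,print,Weight(kg)\n21.5,1,70\n19.0,2,65\n")

data = np.genfromtxt(raw, delimiter=",", dtype=None, names=True,
                     case_sensitive="lower")

# Names are lower-cased, the space becomes an underscore, 'print' gets a
# trailing underscore, and the parentheses are deleted.
print(data.dtype.names)

# Columns are then accessed by their cleaned-up names.
print(data["max_temp"])
```

Printing data.dtype.names right after loading is a quick way to confirm what the columns ended up being called before indexing into them.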
Reference: numpy.genfromtxt