Sklearn StratifiedKFold: ValueError: Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead

攒了一身酷 2020-12-17 08:36

I am working with sklearn's StratifiedKFold split, and when I attempt to split using multi-class targets, I receive an error (see below). When I try to split using binary targets, it works n

4 Answers
  • 2020-12-17 08:48

    keras.utils.to_categorical produces a one-hot encoded class vector, i.e. the multilabel-indicator mentioned in the error message. StratifiedKFold is not designed to work with such input; from the split method docs:

    split(X, y, groups=None)

    [...]

    y : array-like, shape (n_samples,)

    The target variable for supervised learning problems. Stratification is done based on the y labels.

    i.e. your y must be a 1-D array of your class labels.

    Essentially, what you have to do is simply invert the order of the operations: split first (using your initial y_train), and convert with to_categorical afterwards.
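
A minimal sketch of that ordering, under stated assumptions: the toy arrays `X` and `y` are hypothetical, and the `one_hot` helper is a NumPy stand-in for `keras.utils.to_categorical` so the example runs without Keras. The split is stratified on the 1-D labels, and each fold is one-hot encoded only afterwards.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Hypothetical toy data: 12 samples, 3 classes, 4 samples per class.
X = np.arange(24).reshape(12, 2)
y = np.array([0, 1, 2] * 4)  # 1-D integer class labels
n_classes = 3


def one_hot(labels, n_classes):
    # Stand-in for keras.utils.to_categorical (an assumption of this sketch):
    # row i of the identity matrix is the one-hot vector for class i.
    return np.eye(n_classes, dtype="float32")[labels]


skf = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
folds = []
for train_index, val_index in skf.split(X, y):  # stratify on the 1-D labels
    X_train, X_val = X[train_index], X[val_index]
    # Encode only after the split has been made.
    y_train = one_hot(y[train_index], n_classes)
    y_val = one_hot(y[val_index], n_classes)
    folds.append((X_train, y_train, X_val, y_val))
```

With a real Keras model, `one_hot` would presumably be replaced by `to_categorical` applied to the same per-fold label slices.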

  • 2020-12-17 08:55

    In my case, X was a 2-D matrix and y was also a 2-D matrix, i.e. a genuine multi-class, multi-output case. I passed a dummy np.zeros(shape=(n, 1)) as y to split and X as usual. Full code example:

    import numpy as np
    from sklearn.model_selection import RepeatedStratifiedKFold
    X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [3, 7], [9, 4]])
    # y = np.array([0, 0, 1, 1, 0, 1]) # <<< works
    y = X # does not work if passed into `.split`
    rskf = RepeatedStratifiedKFold(n_splits=3, n_repeats=3, random_state=36851234)
    for train_index, test_index in rskf.split(X, np.zeros(shape=(X.shape[0], 1))):
        print("TRAIN:", train_index, "TEST:", test_index)
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
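
Because the dummy target above is constant, there is nothing to stratify on, so the folds are not balanced by class. If that is acceptable, a plainly different splitter, `RepeatedKFold`, needs no `y` at all. This is a sketch of that swapped-in alternative on the same toy data, not the approach used in the answer itself.

```python
import numpy as np
from sklearn.model_selection import RepeatedKFold

X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [3, 7], [9, 4]])
y = X  # 2-D multi-output target, as in the example above

# RepeatedKFold ignores the target, so no dummy y is required.
rkf = RepeatedKFold(n_splits=3, n_repeats=3, random_state=36851234)
splits = list(rkf.split(X))
for train_index, test_index in splits:
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
```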
    
  • 2020-12-17 08:58

    Call split() like this, passing the argmax of the one-hot matrix so that it receives 1-D class labels:

    for i, (train_index, val_index) in enumerate(kf.split(x_train, y_train_categorical.argmax(1))):
        x_train_kf, x_val_kf = x_train[train_index], x_train[val_index]
        y_train_kf, y_val_kf = y_train[train_index], y_train[val_index]
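
To see why `argmax(1)` works here, a small NumPy-only sketch, assuming `y_train_categorical` is a one-hot matrix with one row per sample (the label values are hypothetical): the column index of the single 1 in each row is the original class label.

```python
import numpy as np

y_train = np.array([2, 0, 1, 2])          # hypothetical 1-D class labels
y_train_categorical = np.eye(3)[y_train]  # one-hot rows, shape (4, 3)

# argmax along axis 1 returns the column of the 1 in each row,
# which is the original integer label.
labels = y_train_categorical.argmax(1)
print(labels)
```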
    
  • 2020-12-17 09:08

    I bumped into the same problem and found out that you can check the type of the target with this util function:

    from sklearn.utils.multiclass import type_of_target
    type_of_target(y)
    
    'multilabel-indicator'
    

    From its docstring:

    • 'binary': y contains <= 2 discrete values and is 1d or a column vector.
    • 'multiclass': y contains more than two discrete values, is not a sequence of sequences, and is 1d or a column vector.
    • 'multiclass-multioutput': y is a 2d array that contains more than two discrete values, is not a sequence of sequences, and both dimensions are of size > 1.
    • 'multilabel-indicator': y is a label indicator matrix, an array of two dimensions with at least two columns, and at most 2 unique values.
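
The four categories above can be checked on small arrays. The arrays are hypothetical examples chosen to hit each case; only `type_of_target` comes from the answer.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

examples = {
    "binary": np.array([0, 1, 1, 0]),                    # 1-D, two values
    "multiclass": np.array([0, 1, 2, 1]),                # 1-D, more than two values
    "multiclass-multioutput": np.array([[1, 2], [3, 1]]),  # 2-D, more than two values
    "multilabel-indicator": np.array([[0, 1], [1, 0]]),  # 2-D, only 0s and 1s
}

results = {name: type_of_target(arr) for name, arr in examples.items()}
for name, detected in results.items():
    print(name, "->", detected)
```

A one-hot matrix falls into the last case, which is why StratifiedKFold rejects it.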

    With LabelEncoder you can transform your classes into a 1-D array of integers (provided your target labels are in a 1-D array of categoricals/objects):

    from sklearn.preprocessing import LabelEncoder
    
    label_encoder = LabelEncoder()
    y = label_encoder.fit_transform(target_labels)
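
A short runnable version of the same idea, with hypothetical string labels standing in for `target_labels`. LabelEncoder assigns integers in sorted order of the class names, and the result is a target type that StratifiedKFold accepts.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder
from sklearn.utils.multiclass import type_of_target

target_labels = np.array(["cat", "dog", "bird", "cat"])  # hypothetical labels

label_encoder = LabelEncoder()
y = label_encoder.fit_transform(target_labels)

print(y)                   # [1 2 0 1]  (bird=0, cat=1, dog=2)
print(type_of_target(y))   # multiclass

# The original labels can be recovered when needed.
restored = label_encoder.inverse_transform(y)
```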
    