How to use Isolation Forest

梦如初夏 2020-12-29 05:35

I am trying to detect the outliers in my dataset and I found sklearn's Isolation Forest. I can't understand how to work with it. I fit my training data in it and it giv

3 Answers
  •  隐瞒了意图╮
    2020-12-29 06:27

    Let me add something I got stuck on when I read this question.

    Most of the time you are using it for binary classification (I would assume), where you have a majority class 0 and an outlier class 1. For example, if you want to detect fraud, your majority class is non-fraud (0) and fraud is (1).

    Now suppose you have a train/test split:

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

    and you run:

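    from sklearn.ensemble import IsolationForest

    clf = IsolationForest(max_samples=10000, random_state=10)
    clf.fit(X_train)  # unsupervised: the labels are not used for fitting
    y_pred_test = clf.predict(X_test)  # +1 = inlier, -1 = outlier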
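    The snippet above assumes X and y already exist. A self-contained sketch of the same workflow might look like this, using synthetic data from make_classification as a hypothetical stand-in for a real fraud dataset (roughly 95% class 0, 5% class 1). max_samples is left at its default "auto" here: with max_samples=10000 and fewer training rows than that, sklearn warns and falls back to using all samples for each tree.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the real data: ~95% class 0 (normal), ~5% class 1 (outlier)
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=42)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)

# Isolation Forest is unsupervised: only the features are used for fitting
clf = IsolationForest(random_state=10)
clf.fit(X_train)

# predict() labels each test row +1 (inlier) or -1 (outlier)
y_pred_test = clf.predict(X_test)
print(np.unique(y_pred_test))
```

    Note that y_train is never passed to fit(); the labels only become useful later, when comparing predictions against y_test.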
    

    The output can be quite confusing if you are used to scoring a "normal" classifier. As already mentioned, y_pred_test will consist of the values -1 and 1, where 1 corresponds to your majority class 0 and -1 to your minority class 1. So I recommend converting it:

    import numpy as np
    y_pred_test = np.where(y_pred_test == 1, 0, 1)  # 1 (inlier) -> 0, -1 (outlier) -> 1
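    To make the mapping concrete, here is a small sketch on a hand-written array. The helper name to_binary_labels is hypothetical, and the input array is made up rather than real model output:

```python
import numpy as np

def to_binary_labels(pred):
    """Map IsolationForest output (+1 inlier, -1 outlier) to 0/1 labels (1 = outlier)."""
    pred = np.asarray(pred)
    return np.where(pred == 1, 0, 1)

raw = np.array([1, 1, -1, 1, -1])  # hypothetical predict() output
print(to_binary_labels(raw))  # [0 0 1 0 1]

# Equivalent one-liner: mark -1 as 1 and everything else as 0
print((raw == -1).astype(int))  # [0 0 1 0 1]
```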
    

    Then you can use your usual scoring functions, etc.
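    For instance, once the predictions are 0/1 they line up with y_test, so sklearn.metrics can be used directly. The sketch below again uses make_classification as a hypothetical stand-in for real data, and adds stratify=y so the test set is guaranteed to contain both classes. As one extra option, not part of the original answer, roc_auc_score can be given the negated score_samples() output (lower scores mean more abnormal points), which gives a metric that does not depend on the default outlier threshold:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import IsolationForest
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data standing in for the real X, y (1 = outlier/fraud)
X, y = make_classification(n_samples=2000, n_features=10,
                           weights=[0.95, 0.05], random_state=42)
# stratify keeps both classes represented in the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42, stratify=y)

clf = IsolationForest(random_state=10).fit(X_train)

# Convert +1/-1 to 0/1 so that 1 matches the positive (outlier) class in y
y_pred_test = np.where(clf.predict(X_test) == 1, 0, 1)

print(confusion_matrix(y_test, y_pred_test))
print(classification_report(y_test, y_pred_test, zero_division=0))

# Threshold-free metric: score_samples() is lower for more abnormal points,
# so negate it to get a score where higher means "more likely class 1"
auc = roc_auc_score(y_test, -clf.score_samples(X_test))
print(f"ROC AUC: {auc:.3f}")
```

    How well the synthetic class 1 is recovered depends entirely on the generated data, so the printed numbers are only illustrative.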
