How to use Isolation Forest


梦如初夏 2020-12-29 05:35

I am trying to detect the outliers in my dataset and I found sklearn's Isolation Forest. I can't understand how to work with it. I fit my training data in it and it giv

3 Answers

隐瞒了意图╮ (original poster)

2020-12-29 06:27
Let me add something that I got stuck on when I read this question.

Most of the time you are using it for binary classification (I would assume), where you have a majority class 0 and an outlier class 1. For example, if you want to detect fraud, then your majority class is non-fraud (0) and the outlier class is fraud (1).

Now if you have a train and test split: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

and you run:
```
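from sklearn.ensemble import IsolationForest
# Isolation Forest is unsupervised, so fit() takes only the features
clf = IsolationForest(max_samples=10000, random_state=10)
clf.fit(X_train)
y_pred_test = clf.predict(X_test)  # 1 = inlier, -1 = outlier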
```
The output can be quite confusing if you pass it to "normal" classifier scoring functions. As already mentioned, y_pred_test will consist of the values -1 and 1, where 1 corresponds to your majority class 0 and -1 to your minority class 1. So I recommend converting it:
```
import numpy as np
y_pred_test = np.where(y_pred_test == 1, 0, 1)  # 1 -> 0 (normal), -1 -> 1 (outlier)
```
Then you can use your normal scoring functions etc.
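To make the whole flow concrete, here is a minimal end-to-end sketch. The synthetic data, the cluster positions, and the choice of `confusion_matrix` and `f1_score` as the "normal scoring functions" are my own assumptions for illustration, not something from the question. I also left `max_samples` at its default, since the toy dataset is far smaller than 10000 rows.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import train_test_split

# Synthetic data (assumed for illustration): 1000 inliers around the
# origin labelled 0, and 50 outliers far away labelled 1.
rng = np.random.RandomState(42)
X_inliers = rng.normal(loc=0.0, scale=1.0, size=(1000, 2))
X_outliers = rng.uniform(low=6.0, high=10.0, size=(50, 2))
X = np.vstack([X_inliers, X_outliers])
y = np.concatenate([np.zeros(1000, dtype=int), np.ones(50, dtype=int)])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42, stratify=y
)

# Isolation Forest is unsupervised: the labels are not passed to fit().
clf = IsolationForest(random_state=10)
clf.fit(X_train)

# predict() returns 1 for inliers and -1 for outliers.
raw_pred = clf.predict(X_test)

# Map to the 0/1 convention of y_test: 1 -> 0 (normal), -1 -> 1 (outlier).
y_pred_test = np.where(raw_pred == 1, 0, 1)

# Now the usual classification metrics apply.
print(confusion_matrix(y_test, y_pred_test))
print("F1 (outlier class):", f1_score(y_test, y_pred_test))
```

The labels are only used for the split and for scoring; the model itself never sees them, which is why the mapping step is needed before comparing against y_test.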