I am trying to detect the outliers in my dataset and I found sklearn's Isolation Forest. I can't understand how to work with it. I fit my training data in it and it giv
Let me add something that I got stuck on when I read this question.
Most of the time you are using it for binary classification (I would assume), where you have a majority class 0 and an outlier class 1. For example, if you want to detect fraud, then your majority class is non-fraud (0) and your outlier class is fraud (1).
Now if you have a train and test split:
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
and you run:
clf = IsolationForest(max_samples=10000, random_state=10)
clf.fit(X_train)
y_pred_test = clf.predict(X_test)
The output can be quite confusing if you feed it to "normal" classifier scoring. As already mentioned, y_pred_test
will consist of the values -1 and 1, where 1 (inlier) corresponds to your majority class 0 and -1 (outlier) corresponds to your minority class 1. So I recommend converting it:
y_pred_test = np.where(y_pred_test == 1, 0, 1)
Then you can use your normal scoring functions etc.
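
For reference, here is a minimal end-to-end sketch of the steps above. The data is synthetic and made up purely for illustration (950 "normal" points and 50 scattered outliers), and the names rng, X_inliers, X_outliers and raw_pred are my own, not from the question. I also left max_samples at its default because this toy dataset is much smaller than 10000 rows. Treat it as one way to wire things together, not the definitive setup.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for this example):
# class 0 = majority / inliers, class 1 = minority / outliers.
rng = np.random.RandomState(42)
X_inliers = rng.normal(loc=0.0, scale=1.0, size=(950, 2))
X_outliers = rng.uniform(low=-8.0, high=8.0, size=(50, 2))
X = np.vstack([X_inliers, X_outliers])
y = np.concatenate([np.zeros(950, dtype=int), np.ones(50, dtype=int)])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42, stratify=y
)

# IsolationForest is unsupervised, so fit() only sees the features.
clf = IsolationForest(random_state=10)
clf.fit(X_train)

# predict() returns 1 for inliers and -1 for outliers.
raw_pred = clf.predict(X_test)

# Map to the usual label convention: 1 -> 0 (majority), -1 -> 1 (outlier).
y_pred_test = np.where(raw_pred == 1, 0, 1)

# Now the standard metrics line up with y_test.
print(classification_report(y_test, y_pred_test))
```

Note that y_train is never used for fitting here; the true labels only come in when scoring the converted predictions.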