Can I add outlier detection and removal to a scikit-learn Pipeline?

粉色の甜心 2021-02-06 10:55

I want to create a Pipeline in Scikit-Learn with a specific step being outlier detection and removal, allowing the transformed data to be passed to other transformers and estima

1 answer
  • 2021-02-06 11:39

    Yes. Subclass TransformerMixin and build a custom transformer. Here is one that wraps an existing outlier detection method, LocalOutlierFactor:

    import numpy as np
    from sklearn.base import TransformerMixin  # TransformerMixin lives in sklearn.base
    from sklearn.neighbors import LocalOutlierFactor
    from sklearn.pipeline import Pipeline
    
    class OutlierExtractor(TransformerMixin):
        def __init__(self, **kwargs):
            """
            Create a transformer to remove outliers. A threshold is set for selection
            criteria, and further arguments are passed to the LocalOutlierFactor class
    
            Keyword Args:
                neg_conf_val (float): The threshold for excluding samples with a lower
                   negative outlier factor.
    
            Returns:
                object: to be used as a transformer method as part of Pipeline()
            """
    
            self.threshold = kwargs.pop('neg_conf_val', -10.0)
    
            self.kwargs = kwargs
    
        def transform(self, X, y=None):
            """
            Uses LocalOutlierFactor class to subselect data based on some threshold
    
            Returns:
                ndarray: subsampled X, or a tuple (X, y) of subsampled arrays
                    when y is given
    
            Notes:
                X should be of shape (n_samples, n_features). Pipeline calls
                transform(X) without y, so y is optional here.
            """
            X = np.asarray(X)
            lcf = LocalOutlierFactor(**self.kwargs)
            lcf.fit(X)
            # Keep only samples whose negative outlier factor is above the threshold
            mask = lcf.negative_outlier_factor_ > self.threshold
            if y is None:
                return X[mask, :]
            return X[mask, :], np.asarray(y)[mask]
    
        def fit(self, *args, **kwargs):
            return self
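
To pick a value for neg_conf_val it can help to look at the scores directly. The following is a minimal sketch on made-up toy data (the data, the seed and n_neighbors=20 are assumptions for illustration, not part of the answer above). It shows the same masking step the transformer performs: inliers score close to -1, and clear outliers score far lower.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
# Toy data: 100 points clustered near the origin plus one far-away point.
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 2)), [[50.0, 50.0]]])

lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(X)
# Close to -1 for inliers, much more negative for outliers.
scores = lof.negative_outlier_factor_

threshold = -10.0  # same default as neg_conf_val in the transformer
mask = scores > threshold
X_kept = X[mask]

print("lowest score:", scores.min())
print("rows kept:", X_kept.shape[0], "of", X.shape[0])
```

A threshold of -10 is quite permissive; printing or plotting the sorted scores for your own data is probably the easiest way to decide where to cut.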
    

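    Then create a pipeline as below. Pipeline calls transform(X) on each step and passes only the returned X onward, so it never filters y; for a supervised final estimator, X and y need to be filtered together before the pipeline is fit.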

    pipe = Pipeline([('outliers', OutlierExtractor()), ...])
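
For supervised data, one possible workaround is to drop the outliers from X and y with the same mask first, and only then fit an ordinary pipeline. This is a rough sketch under assumptions: drop_outliers is a hypothetical helper written for this example (it is not a scikit-learn function), and the toy data, StandardScaler and LogisticRegression steps are placeholders.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import LocalOutlierFactor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def drop_outliers(X, y, threshold=-10.0, **lof_kwargs):
    """Hypothetical helper: filter X and y with the same LOF mask."""
    X, y = np.asarray(X), np.asarray(y)
    lof = LocalOutlierFactor(**lof_kwargs).fit(X)
    mask = lof.negative_outlier_factor_ > threshold
    return X[mask], y[mask]


rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(-2.0, 1.0, size=(50, 2)),  # class 0
    rng.normal(2.0, 1.0, size=(50, 2)),   # class 1
    [[60.0, -60.0]],                      # one obvious outlier
])
y = np.array([0] * 50 + [1] * 50 + [1])

# Filter once, before the pipeline, so X and y stay aligned.
X_clean, y_clean = drop_outliers(X, y, n_neighbors=20)

pipe = Pipeline([("scale", StandardScaler()), ("clf", LogisticRegression())])
pipe.fit(X_clean, y_clean)
print(X.shape, "->", X_clean.shape)
```

Filtering before the pipeline also means no rows are dropped at predict time, so the number of predictions matches the number of inputs.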
    