Retain feature names after Scikit Feature Selection

感情败类 2021-02-07 13:16

After running a Variance Threshold from Scikit-Learn on a set of data, it removes a couple of features. I feel I'm doing something simple yet stupid, but I'd like to retain th

5 Answers
  •  小蘑菇 (OP)
    2021-02-07 13:52

    I had some problems with the function by Jarad, so I combined it with the solution by pteehan, which I found more reliable. I also added NA replacement by default, since VarianceThreshold does not accept NA values.

    from sklearn.feature_selection import VarianceThreshold

    def variance_threshold_select(df, thresh=0.0, na_replacement=-999):
        df1 = df.copy(deep=True)  # Work on a deep copy so the input dataframe is not modified
        selector = VarianceThreshold(threshold=thresh)
        selector.fit(df1.fillna(na_replacement))  # Fill NA values, as VarianceThreshold cannot deal with those
        # Keep only the columns whose variance exceeds the threshold; column names are retained
        df2 = df.loc[:, selector.get_support(indices=False)]

        return df2
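
The same idea can be sketched on a small example. This is a minimal sketch, not a definitive implementation: the toy dataframe, its column names, and its values are made up for illustration. It fits VarianceThreshold and then uses the boolean mask from get_support() to index the dataframe's columns, so the feature names survive the selection.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy data: column "b" is constant, so its variance is 0.
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [5.0, 5.0, 5.0, 5.0],
    "c": [0.0, 1.0, 0.0, 1.0],
})

selector = VarianceThreshold(threshold=0.0)
selector.fit(df)

# get_support() returns a boolean mask aligned with the input columns.
mask = selector.get_support()
kept_columns = df.columns[mask]   # names of the retained features
reduced = df.loc[:, mask]         # dataframe with the same names, fewer columns

print(list(kept_columns))  # ['a', 'c']
```

Indexing the original dataframe with the mask, rather than calling transform(), is what keeps the column labels: transform() returns a plain NumPy array by default.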
    
