Retain feature names after Scikit Feature Selection


感情败类 2021-02-07 13:16

After running a Variance Threshold from Scikit-Learn on a set of data, it removes a couple of features. I feel I'm doing something simple yet stupid, but I'd like to retain th

5 answers

小蘑菇 (original poster)

2021-02-07 13:52

I had some problems with the function by Jarad, so I combined it with the solution by pteehan, which I found more reliable. I also added NA replacement by default, as VarianceThreshold does not like NA values.

from sklearn.feature_selection import VarianceThreshold

def variance_threshold_select(df, thresh=0.0, na_replacement=-999):
    """Return df restricted to the columns whose variance exceeds thresh."""
    selector = VarianceThreshold(threshold=thresh)
    # Fit on an NA-filled copy; fillna returns a new frame, so df itself is not modified
    selector.fit(df.fillna(na_replacement))
    # Boolean mask of the kept columns; indexing the original df drops the
    # low-variance columns while keeping the column names and the original NA values
    return df.loc[:, selector.get_support(indices=False)]
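
To see why the column names survive, here is a minimal sketch of the same mask-based selection on a made-up three-column DataFrame. The data and the column names `a`, `b`, `c` are assumptions for illustration only, not taken from the question. The point is that `get_support()` returns a boolean mask aligned with `df.columns`, so it can be used to index the original DataFrame.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Hypothetical toy data: column "b" is constant, so its variance is 0
df = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [5.0, 5.0, 5.0, 5.0],
    "c": [0.0, 1.0, 0.0, 1.0],
})

selector = VarianceThreshold(threshold=0.0)
selector.fit(df)

# Boolean mask with one entry per input column, in the same order as df.columns
mask = selector.get_support()

# Apply the mask to the column index to recover the retained names,
# and to the DataFrame itself to get the reduced data with names intact
kept_columns = df.columns[mask].tolist()
reduced = df.loc[:, mask]

print(kept_columns)
```

Indexing the original DataFrame with the mask, rather than wrapping the array returned by `transform`, is what keeps the names attached to the data.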
