After running a Variance Threshold from Scikit-Learn on a set of data, it removes a couple of features. I feel I\'m doing something simple yet stupid, but I\'d like to retain th
As I had some problems with the function by Jarad, I have mixed it up with the solution by pteehan, which I found is more reliable. I also added NA replacement as a standard as VarianceThreshold does not like NA values.
def variance_threshold_select(df, thresh=0.0, na_replacement=-999):
df1 = df.copy(deep=True) # Make a deep copy of the dataframe
selector = VarianceThreshold(thresh)
selector.fit(df1.fillna(na_replacement)) # Fill NA values as VarianceThreshold cannot deal with those
df2 = df.loc[:,selector.get_support(indices=False)] # Get new dataframe with columns deleted that have NA values
return df2