Retain feature names after Scikit Feature Selection

感情败类 2021-02-07 13:16

After running a Variance Threshold from Scikit-Learn on a set of data, it removes a couple of features. I feel I'm doing something simple yet stupid, but I'd like to retain th

5 answers
  •  情书的邮戳
    2021-02-07 13:51

    Would something like this help? If you pass it a pandas DataFrame, it fits the selector and then uses get_support(indices=True), as you mentioned, to index into the DataFrame's column list, so only the columns whose variance exceeds the threshold are returned, with their names intact.

    >>> df
       Survived  Pclass  Sex  Age  SibSp  Parch  Nonsense
    0         0       3    1   22      1      0         0
    1         1       1    2   38      1      0         0
    2         1       3    2   26      0      0         0
    
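    >>> from sklearn.feature_selection import VarianceThreshold
    >>> def variance_threshold_selector(data, threshold=0.5):
    ...     # Fit on the DataFrame, then keep only the columns that pass.
    ...     selector = VarianceThreshold(threshold=threshold)
    ...     selector.fit(data)
    ...     kept = selector.get_support(indices=True)  # integer positions of kept columns
    ...     return data[data.columns[kept]]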
    
    >>> variance_threshold_selector(df, 0.5)
       Pclass  Age
    0       3   22
    1       1   38
    2       3   26
    >>> variance_threshold_selector(df, 0.9)
       Age
    0   22
    1   38
    2   26
    >>> variance_threshold_selector(df, 0.1)
       Survived  Pclass  Sex  Age  SibSp
    0         0       3    1   22      1
    1         1       1    2   38      1
    2         1       3    2   26      0
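
    To see why each column is kept or dropped, it can help to look at the variances the selector computed. The following is a minimal sketch, not part of the original answer: it rebuilds only the three rows printed above (an assumption, since the full data set is not shown), labels the fitted variances_ with the column names, and uses the boolean mask form of get_support() instead of integer indices. The variable names are illustrative.

```python
import pandas as pd
from sklearn.feature_selection import VarianceThreshold

# Assumption: only the three rows printed in the answer are used here.
df = pd.DataFrame({
    "Survived": [0, 1, 1],
    "Pclass": [3, 1, 3],
    "Sex": [1, 2, 2],
    "Age": [22, 38, 26],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
    "Nonsense": [0, 0, 0],
})

selector = VarianceThreshold(threshold=0.5)
selector.fit(df)

# Variances computed during fit, labelled with the original column names.
variances = pd.Series(selector.variances_, index=df.columns)

# With no arguments, get_support() returns a boolean mask, one entry per column.
mask = selector.get_support()
kept_names = df.columns[mask].tolist()

# Select the surviving columns by mask; the result is still a DataFrame.
reduced = df.loc[:, mask]

print(variances)
print(kept_names)
```

    On these three rows only Pclass and Age have a variance above 0.5, which matches the first result above. On scikit-learn 1.0 or later, selector.get_feature_names_out() should return the same names when the selector was fitted on a DataFrame, though that depends on the installed version.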
    
