How do I use AdaBoost for feature selection?

Submitted by 吃可爱长大的小学妹 on 2019-12-11 03:26:25

Question


I want to use AdaBoost to choose a good set of features from a large pool (~100k). AdaBoost works by iterating through the feature set and adding features based on how well they perform. It chooses features that perform well on samples that were misclassified by the existing feature set.
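
To make that selection loop concrete, here is a minimal sketch of discrete AdaBoost over decision stumps, assuming labels in {-1, +1} and a dense feature matrix. The names `Stump`, `trainAdaBoostStumps` and `selectedFeatures` are made up for illustration and are not part of OpenCV; the exhaustive threshold search is only practical for small inputs, not for ~100k features.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// One decision stump: predicts `polarity` if x[feature] > threshold,
// otherwise -polarity. (Hypothetical type, for illustration only.)
struct Stump {
    std::size_t feature;
    double threshold;
    int polarity;  // +1 or -1
    double alpha;  // weight of this stump in the ensemble
};

// Discrete AdaBoost with stumps. X[i] is sample i, y[i] is in {-1, +1}.
std::vector<Stump> trainAdaBoostStumps(const std::vector<std::vector<double>>& X,
                                       const std::vector<int>& y,
                                       int rounds) {
    assert(!X.empty() && X.size() == y.size());
    const std::size_t n = X.size(), d = X[0].size();
    std::vector<double> w(n, 1.0 / n);  // sample weights, uniform at the start
    std::vector<Stump> ensemble;

    for (int r = 0; r < rounds; ++r) {
        Stump best{0, 0.0, 1, 0.0};
        double bestErr = std::numeric_limits<double>::max();
        // Try every feature and every observed value as a threshold.
        for (std::size_t f = 0; f < d; ++f) {
            for (std::size_t t = 0; t < n; ++t) {
                const double thr = X[t][f];
                double err = 0.0;  // weighted error of the polarity +1 stump
                for (std::size_t i = 0; i < n; ++i) {
                    const int pred = X[i][f] > thr ? 1 : -1;
                    if (pred != y[i]) err += w[i];
                }
                // Weights sum to 1, so the flipped stump has error 1 - err.
                if (err < bestErr) { bestErr = err; best = {f, thr, 1, 0.0}; }
                if (1.0 - err < bestErr) { bestErr = 1.0 - err; best = {f, thr, -1, 0.0}; }
            }
        }
        const double eps = std::max(bestErr, 1e-10);  // guard against log of infinity
        if (eps >= 0.5) break;                        // no stump beats chance
        best.alpha = 0.5 * std::log((1.0 - eps) / eps);
        ensemble.push_back(best);

        // Increase the weight of misclassified samples, decrease the rest.
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            const int pred = (X[i][best.feature] > best.threshold ? 1 : -1) * best.polarity;
            w[i] *= std::exp(-best.alpha * y[i] * pred);
            sum += w[i];
        }
        for (double& wi : w) wi /= sum;
    }
    return ensemble;
}

// Distinct feature indexes in the order boosting first picked them.
std::vector<std::size_t> selectedFeatures(const std::vector<Stump>& ensemble) {
    std::vector<std::size_t> out;
    for (const Stump& s : ensemble)
        if (std::find(out.begin(), out.end(), s.feature) == out.end())
            out.push_back(s.feature);
    return out;
}
```

Calling `selectedFeatures(trainAdaBoostStumps(X, y, rounds))` gives the feature subset; because each stump uses exactly one feature, the number of rounds is an upper bound on how many features get selected.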

I'm currently using OpenCV's CvBoost. I got an example working, but it is not clear from the documentation how to pull out the indexes of the features it has used.

Using either CvBoost, a third-party library, or my own implementation, how can I pull out a set of features from a large feature set using AdaBoost?


Answer 1:


Disclaimer: I am not an OpenCV user. According to the documentation, OpenCV's AdaBoost uses a decision tree (either a classification tree or a regression tree) as the underlying weak learner.

It seems to me this is the way to get the underlying weak learners:

CvBoost::get_weak_predictors
Returns the sequence of weak tree classifiers.

C++: CvSeq* CvBoost::get_weak_predictors()
The method returns the sequence of weak classifiers. 
Each element of the sequence is a pointer to the CvBoostTree class or 
to some of its derivatives.

Once you have access to the sequence of CvBoostTree*, you should be able to inspect which features each tree contains, what the split values are, and so on.

If each tree is only a decision stump, each weak learner contains exactly one feature. If deeper trees are allowed, an individual weak learner can combine several features.

I also took a look at the CvBoostTree class; unfortunately the class itself does not provide a public method to check the internal features used. But you might want to create your own subclass inheriting from CvBoostTree and expose whatever functionality you need.
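
As a rough illustration of the deeper-tree case, the sketch below walks a whole tree and collects the feature index at every internal node. `Node` and `Split` are stand-in types invented for this example; they only mimic the left/right/split layout of a decision tree node and are not the OpenCV structs.

```cpp
#include <cassert>
#include <set>

// Stand-in types for illustration only; these are NOT CvDTreeNode / CvDTreeSplit.
struct Split {
    int var_idx;  // index of the feature this split tests
};

struct Node {
    const Split* split;  // null for a leaf
    const Node* left;
    const Node* right;
};

// Collects the feature index of every internal node reachable from `node`.
void collectFeatures(const Node* node, std::set<int>& features) {
    if (node == nullptr || node->split == nullptr) return;  // empty subtree or leaf
    assert(node->split->var_idx >= 0);  // indexes are non-negative in this sketch
    features.insert(node->split->var_idx);
    collectFeatures(node->left, features);
    collectFeatures(node->right, features);
}
```

For a stump the resulting set has a single element; for a tree of depth k it can hold up to 2^k - 1 distinct features, one per internal node.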




Answer 2:


With the help of @greeness's answer, I made a subclass of CvBoost:

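#include <algorithm>
#include <vector>

// Returns the distinct feature indexes used at the root split of each weak
// learner, in the order boosting selected them. Only the root is inspected,
// so the result is complete only when the weak learners are decision stumps.
std::vector<int> RSCvBoost::getFeatureIndexes() {

    // `weak` is the sequence of weak classifiers inherited from CvBoost.
    CvSeqReader reader;
    cvStartReadSeq( weak, &reader );
    cvSetSeqReaderPos( &reader, 0 );

    std::vector<int> featureIndexes;

    int weak_count = weak->total;
    for( int i = 0; i < weak_count; i++ ) {
        CvBoostTree* wtree;
        CV_READ_SEQ_ELEM( wtree, reader );

        const CvDTreeNode* node = wtree->get_root();
        const CvDTreeSplit* split = node->split;
        if (split == 0) continue;  // root is a leaf, so no feature was used

        // var_idx is documented in CvDTreeSplit as the index of the variable
        // on which the split is created.
        const int index = split->var_idx;

        // Only add features that are not already added
        if (std::find(featureIndexes.begin(),
                      featureIndexes.end(),
                      index) == featureIndexes.end()) {

            featureIndexes.push_back(index);
        }

    }

    return featureIndexes;
}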


Source: https://stackoverflow.com/questions/25962349/how-do-i-use-adaboost-for-feature-selection
