Question
I want to use AdaBoost to choose a good set of features from a large number (~100k). AdaBoost works by iterating through the feature set and adding features based on how well they perform. It favors features that perform well on the samples that were misclassified by the existing feature set.
I'm currently using OpenCV's CvBoost. I got an example working, but the documentation does not make it clear how to pull out the indexes of the features it has used.
Using either CvBoost, a third-party library, or my own implementation, how can I pull out a set of features from a large feature set using AdaBoost?
Answer 1:
Disclaimer: I am not an OpenCV user. According to the documentation, OpenCV's AdaBoost uses a decision tree (either a classification tree or a regression tree) as the fundamental weak learner.
It seems to me this is the way to get the underlying weak learners:
CvBoost::get_weak_predictors
Returns the sequence of weak tree classifiers.
C++: CvSeq* CvBoost::get_weak_predictors()
The method returns the sequence of weak classifiers.
Each element of the sequence is a pointer to the CvBoostTree class or
to some of its derivatives.
Once you have access to the sequence of CvBoostTree* pointers, you should be able to inspect which features each tree contains, what the split values are, and so on.
If each tree is only a decision stump, each weak learner contains exactly one feature. If deeper trees are allowed, a single weak learner can combine several features.
I also took a look at the CvBoostTree class; unfortunately, the class itself does not provide a public method for checking which features it uses internally. You might instead create your own subclass inheriting from CvBoostTree and expose whatever functionality you need.
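For the "implementing it myself" route mentioned in the question, the stump case can be sketched without OpenCV. The following is a minimal, illustrative discrete AdaBoost over decision stumps that records which feature each round picks. All names here (Stump, trainAdaBoostStumps, ensemblePredict, selectedFeatures) are hypothetical and are not part of any library. The threshold search is deliberately naive (cost grows with rounds x features x samples squared), so for ~100k features a real implementation would likely need pre-sorted feature values or a sampled set of candidate thresholds.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// One decision stump: predicts +polarity if x[feature] > threshold, else -polarity.
struct Stump {
    std::size_t feature;
    double threshold;
    int polarity;  // +1 or -1
    double alpha;  // weight of this stump in the ensemble
};

int stumpPredict(const Stump& s, const std::vector<double>& x) {
    return x[s.feature] > s.threshold ? s.polarity : -s.polarity;
}

// Discrete AdaBoost. X is samples x features, labels y are in {-1, +1}.
std::vector<Stump> trainAdaBoostStumps(const std::vector<std::vector<double>>& X,
                                       const std::vector<int>& y,
                                       std::size_t rounds) {
    assert(!X.empty() && X.size() == y.size());
    const std::size_t n = X.size(), d = X[0].size();
    const double eps = 1e-10;
    std::vector<double> w(n, 1.0 / n);  // sample weights, kept normalized
    std::vector<Stump> model;

    for (std::size_t t = 0; t < rounds; ++t) {
        Stump best{0, 0.0, 1, 0.0};
        double bestErr = std::numeric_limits<double>::max();
        for (std::size_t f = 0; f < d; ++f) {
            for (std::size_t k = 0; k < n; ++k) {  // candidate thresholds: observed values
                const double thr = X[k][f];
                double err = 0.0;  // weighted error with polarity +1
                for (std::size_t i = 0; i < n; ++i) {
                    const int pred = X[i][f] > thr ? 1 : -1;
                    if (pred != y[i]) err += w[i];
                }
                // Flipping the polarity gives error 1 - err because the weights sum to 1.
                if (err < bestErr) { bestErr = err; best = {f, thr, 1, 0.0}; }
                if (1.0 - err < bestErr) { bestErr = 1.0 - err; best = {f, thr, -1, 0.0}; }
            }
        }
        if (bestErr >= 0.5) break;  // no stump beats chance on the current weights

        best.alpha = 0.5 * std::log((1.0 - bestErr + eps) / (bestErr + eps));
        model.push_back(best);
        if (bestErr < eps) break;  // a single stump already separates the data

        // Increase the weight of misclassified samples, decrease the rest, renormalize.
        double sum = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            w[i] *= std::exp(-best.alpha * y[i] * stumpPredict(best, X[i]));
            sum += w[i];
        }
        for (std::size_t i = 0; i < n; ++i) w[i] /= sum;
    }
    return model;
}

// Sign of the alpha-weighted vote of all stumps.
int ensemblePredict(const std::vector<Stump>& model, const std::vector<double>& x) {
    double score = 0.0;
    for (const Stump& s : model) score += s.alpha * stumpPredict(s, x);
    return score >= 0.0 ? 1 : -1;
}

// Distinct feature indexes, in the order the boosting rounds selected them.
std::vector<std::size_t> selectedFeatures(const std::vector<Stump>& model) {
    std::vector<std::size_t> out;
    for (const Stump& s : model)
        if (std::find(out.begin(), out.end(), s.feature) == out.end())
            out.push_back(s.feature);
    return out;
}
```

Calling selectedFeatures(trainAdaBoostStumps(X, y, rounds)) returns the column indexes of X that the boosting rounds picked. Because every weak learner is a stump, each round contributes at most one new feature, so the number of rounds is an upper bound on the size of the selected set.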
Answer 2:
With the help of @greeness's answer I made a subclass of CvBoost:
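#include <algorithm>  // std::find
#include <vector>

// RSCvBoost derives from CvBoost, so it can read the member `weak`
// (the sequence of weak classifiers).
std::vector<int> RSCvBoost::getFeatureIndexes() {
    CvSeqReader reader;
    cvStartReadSeq( weak, &reader );
    cvSetSeqReaderPos( &reader, 0 );

    std::vector<int> featureIndexes;

    int weak_count = weak->total;
    for( int i = 0; i < weak_count; i++ ) {
        CvBoostTree* wtree;
        CV_READ_SEQ_ELEM( wtree, reader );

        // Only the root split is read, which covers decision stumps
        // (one split per tree); deeper trees would need a full traversal.
        const CvDTreeNode* node = wtree->get_root();
        CvDTreeSplit* split = node->split;
        if (!split) continue;  // root is a leaf: this tree uses no feature

        const int index = split->condensed_idx;

        // Only add features that are not already added
        if (std::find(featureIndexes.begin(),
                      featureIndexes.end(),
                      index) == featureIndexes.end()) {
            featureIndexes.push_back(index);
        }
    }
    return featureIndexes;
}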
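When the weak learners are deeper than stumps, as Answer 1 notes, a single tree can use several features, so every internal node has to be visited rather than just the root. A rough sketch of that traversal follows. It uses hypothetical stand-in structs (SplitSketch, NodeSketch) instead of the OpenCV types so that it compiles on its own; the assumption is that a tree node exposes a split pointer plus left and right children, and that a node without a split is a leaf. Adapting it to CvDTreeNode would need to be checked against the OpenCV headers.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Hypothetical stand-ins for a tree node and its split; NOT the OpenCV types.
struct SplitSketch {
    int featureIndex;  // index of the feature this node splits on
};

struct NodeSketch {
    const SplitSketch* split;  // nullptr for a leaf
    const NodeSketch* left;
    const NodeSketch* right;
};

// Depth-first walk that records each feature index the first time it is seen.
void collectFeatures(const NodeSketch* node,
                     std::set<int>& seen,
                     std::vector<int>& ordered) {
    if (!node || !node->split) return;  // empty subtree or leaf
    if (seen.insert(node->split->featureIndex).second)
        ordered.push_back(node->split->featureIndex);
    collectFeatures(node->left, seen, ordered);
    collectFeatures(node->right, seen, ordered);
}

// Distinct feature indexes used anywhere in a list of weak trees,
// in order of first appearance.
std::vector<int> featuresUsed(const std::vector<const NodeSketch*>& roots) {
    std::set<int> seen;
    std::vector<int> ordered;
    for (const NodeSketch* root : roots)
        collectFeatures(root, seen, ordered);
    return ordered;
}
```

The std::set keeps the duplicate check logarithmic per lookup, which may matter more than the linear std::find when many weak learners are trained over a very large feature set.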
Source: https://stackoverflow.com/questions/25962349/how-do-i-use-adaboost-for-feature-selection