Feature Selection in MATLAB

天涯浪人 2020-12-25 10:17

I have a dataset for text classification ready to be used in MATLAB. Each document is a vector in this dataset and the dimensionality of this vector is extremely high. In th

3 Answers
  • 2020-12-25 10:54

    The right feature selection method depends on the specific task you want to perform on the text data.

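    One of the simplest and crudest methods is to use principal component analysis (PCA) to reduce the dimensionality of the data. Strictly speaking, PCA builds new features as linear combinations of the original ones rather than selecting a subset of them, but the reduced-dimensional data can be used directly as features for classification.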

    See the tutorial on using PCA here:

    http://matlabdatamining.blogspot.com/2010/02/principal-components-analysis.html

    Here is the link to the MATLAB help for the PCA command (princomp, which newer releases supersede with pca):

    http://www.mathworks.com/help/toolbox/stats/princomp.html
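
    As a rough illustration of the PCA step, here is a minimal sketch. It assumes a release where pca (Statistics and Machine Learning Toolbox) is available in place of princomp. The data matrix, its sizes, and the 95% variance cutoff are all made up for the example, not taken from the question.

    ```matlab
    % Minimal PCA sketch on synthetic data (stand-in for a document-term matrix).
    % Assumes the Statistics and Machine Learning Toolbox function pca.
    rng(0);                                      % reproducible random data
    nDocs = 200; nTerms = 50;                    % hypothetical sizes
    X = randn(nDocs, 5) * randn(5, nTerms) ...   % low-rank structure
        + 0.1 * randn(nDocs, nTerms);            % plus a little noise

    % pca centers the columns by default; score holds the projected data
    % and explained holds the percentage of variance per component.
    [coeff, score, ~, ~, explained] = pca(X);

    % Keep the smallest number of components explaining at least 95% of variance
    % (the 95% cutoff is an arbitrary choice for this sketch).
    k = find(cumsum(explained) >= 95, 1);
    Xreduced = score(:, 1:k);                    % nDocs-by-k feature matrix

    fprintf('Kept %d of %d components\n', k, size(coeff, 2));
    ```

    To project unseen documents into the same space, subtract the training column means (the sixth output of pca) and multiply by coeff(:, 1:k).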

    Using the obtained features, a well-known classifier such as a support vector machine (SVM) can be used for classification.

    http://www.mathworks.com/help/toolbox/bioinfo/ref/svmclassify.html
    http://www.autonlab.org/tutorials/svm.html
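
    A minimal sketch of the classification step follows. It uses fitcsvm from the Statistics and Machine Learning Toolbox, which is an assumption on my part: svmclassify from the link above belongs to older releases. The feature matrix and labels are synthetic placeholders, and the linear kernel and 30% hold-out split are illustrative choices only.

    ```matlab
    % Minimal sketch: train a binary linear SVM on (reduced) features.
    % Assumes fitcsvm from the Statistics and Machine Learning Toolbox;
    % the data below are synthetic placeholders.
    rng(1);
    n = 100;
    Xfeat = [randn(n, 3) + 2; randn(n, 3) - 2];   % two well-separated classes
    y = [ones(n, 1); -ones(n, 1)];                % class labels

    % Hold out 30% of the observations for testing
    cv = cvpartition(y, 'HoldOut', 0.3);
    Xtrain = Xfeat(training(cv), :);  ytrain = y(training(cv));
    Xtest  = Xfeat(test(cv), :);      ytest  = y(test(cv));

    model = fitcsvm(Xtrain, ytrain, 'KernelFunction', 'linear', 'Standardize', true);
    ypred = predict(model, Xtest);

    accuracy = mean(ypred == ytest);              % fraction classified correctly
    fprintf('Hold-out accuracy: %.2f\n', accuracy);
    ```

    fitcsvm handles two classes; for text data with more categories, fitcecoc can wrap several binary SVMs into a multiclass model.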

  • 2020-12-25 11:00

    You might consider using the independent features technique of Weiss and Kulikowski to quickly eliminate variables that are obviously uninformative:

    http://matlabdatamining.blogspot.com/2006/12/feature-selection-phase-1-eliminate.html
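
    As I understand the technique, each feature is tested on its own for a two-class problem: the difference between the class means is compared to its standard error, and features with a small ratio are dropped. A sketch in base MATLAB, with synthetic data and an assumed cutoff of 2 (check the linked post for the exact criterion):

    ```matlab
    % Sketch of an independent-features screen for a two-class problem.
    % Each feature is judged alone: is the difference between class means
    % large relative to its standard error? Base MATLAB only.
    rng(2);
    n = 150;
    y = [zeros(n, 1); ones(n, 1)];            % binary class labels
    X = randn(2 * n, 6);                      % six synthetic features
    X(y == 1, 1) = X(y == 1, 1) + 1.5;        % only features 1 and 2 carry signal
    X(y == 1, 2) = X(y == 1, 2) - 1.0;

    A = X(y == 0, :);                         % rows of class 0
    B = X(y == 1, :);                         % rows of class 1
    se  = sqrt(var(A) / size(A, 1) + var(B) / size(B, 1));  % per-feature standard error
    sig = abs(mean(A) - mean(B)) ./ se;                     % per-feature significance score

    threshold = 2;                            % assumed cutoff; tune for your data
    keep = find(sig > threshold);             % indices of features to retain
    disp(keep);
    ```

    Because every feature is scored independently, this is cheap even for very high-dimensional data, but it cannot detect features that are only useful in combination.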

  • 2020-12-25 11:03

    MATLAB (and its toolboxes) include a number of functions that deal with feature selection:

    • RANDFEATURES (Bioinformatics Toolbox): Generate randomized subset of features directed by a classifier
    • RANKFEATURES (Bioinformatics Toolbox): Rank features by class separability criteria
    • SEQUENTIALFS (Statistics Toolbox): Sequential feature selection
    • RELIEFF (Statistics Toolbox): Relief-F algorithm
    • TREEBAGGER.OOBPermutedVarDeltaError, predictorImportance (Statistics Toolbox): Using ensemble methods (bagged decision trees)
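
    To give a feel for two of the functions listed above, here is a short sketch using relieff (a filter method) and sequentialfs (a wrapper method). The data are synthetic, and the choices of 10 neighbors, 5-fold cross-validation, and a linear discriminant (classify) as the scoring classifier are assumptions made for illustration.

    ```matlab
    % Sketch: relieff and sequentialfs on synthetic data.
    % Assumes the Statistics (and Machine Learning) Toolbox is installed.
    rng(3);
    n = 100;
    y = [ones(n, 1); 2 * ones(n, 1)];         % numeric class labels
    X = randn(2 * n, 6);
    X(y == 2, 1) = X(y == 2, 1) + 3;          % features 1 and 2 are informative
    X(y == 2, 2) = X(y == 2, 2) - 2;

    % Filter approach: rank features with ReliefF using 10 nearest neighbors.
    % 'method' is set explicitly because numeric y is treated as regression by default.
    [ranked, weights] = relieff(X, y, 10, 'method', 'classification');

    % Wrapper approach: forward sequential selection, scored by the number of
    % misclassified test points from a linear discriminant (classify).
    critfun = @(Xtr, ytr, Xte, yte) sum(yte ~= classify(Xte, Xtr, ytr));
    cv = cvpartition(y, 'KFold', 5);
    inmodel = sequentialfs(critfun, X, y, 'cv', cv);

    disp(ranked(1:3));                        % top-ranked feature indices
    disp(find(inmodel));                      % features chosen by sequentialfs
    ```

    The wrapper approach retrains the classifier for every candidate feature, so on very high-dimensional text data it is usually worth applying a cheap filter first.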

    You can also find examples that demonstrate their usage on real datasets:

    • Identifying Significant Features and Classifying Protein Profiles
    • Genetic Algorithm Search for Features in Mass Spectrometry Data

    In addition, there exist third-party toolboxes:

    • Matlab Toolbox for Dimensionality Reduction
    • LIBGS: A MATLAB Package for Gene Selection

    Otherwise, you can always call your favorite functions from WEKA directly from MATLAB, since it includes a JVM.
