Scalable or online out-of-core multi-label classifiers

無奈伤痛 2021-02-04 08:39

I have been racking my brains over this problem for the past 2-3 weeks. I have a multi-label (not multi-class) problem where each sample can belong to several of the labels.
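For concreteness, a multi-label target can be pictured as a binary indicator matrix with one column per label, rather than a single class per sample. A minimal sketch (sample ids and label names here are made up for illustration):

```python
samples = ["doc_a", "doc_b", "doc_c"]    # hypothetical sample ids
labels = ["sports", "politics", "tech"]  # hypothetical label set

# each sample is assigned a *set* of labels, not a single class
label_sets = [{"sports", "tech"}, {"politics"}, {"sports", "politics", "tech"}]

# build the n_samples x n_labels binary indicator matrix
Y = [[1 if lab in s else 0 for lab in labels] for s in label_sets]

for sid, row in zip(samples, Y):
    print(sid, row)
```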

4 Answers
  •  [愿得一人]
    2021-02-04 09:10

    My argument for scalability is that instead of OneVsRest, which is just the simplest of baselines, you should use a more advanced ensemble of problem-transformation methods. In my paper I propose a scheme for dividing the label space into subspaces and transforming each subproblem into a multi-class, single-label classification using Label Powerset. To try it, use the following code, which relies on scikit-multilearn, a multi-label library built on top of scikit-learn:

    from skmultilearn.ensemble import LabelSpacePartitioningClassifier
    from skmultilearn.cluster import IGraphLabelCooccurenceClusterer
    from skmultilearn.problem_transform import LabelPowerset
    
    from sklearn.linear_model import SGDClassifier
    
    # base multi-class classifier SGD
    base_classifier = SGDClassifier(loss='log', penalty='l2', n_jobs=-1)
    
    # problem transformation from multi-label to single-label multi-class
    transformation_classifier = LabelPowerset(base_classifier)
    
    # clusterer dividing the label space using fast greedy modularity maximizing scheme
    clusterer = IGraphLabelCooccurenceClusterer('fastgreedy', weighted=True, include_self_edges=True) 
    
    # ensemble
    clf = LabelSpacePartitioningClassifier(transformation_classifier, clusterer)
    
    clf.fit(x_train, y_train)
    prediction = clf.predict(x_test)
    
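    To see what the Label Powerset transformation in the code above actually does, here is a minimal sketch in plain Python (the function name `label_powerset_encode` is illustrative, not the library's API): each distinct *combination* of labels becomes one class of an ordinary multi-class problem, which the base classifier can then learn directly.

```python
def label_powerset_encode(label_rows):
    """Map each binary label row to an integer class id,
    assigning one class per distinct label combination."""
    mapping = {}   # label combination -> class id
    classes = []
    for row in label_rows:
        key = tuple(row)
        if key not in mapping:
            mapping[key] = len(mapping)
        classes.append(mapping[key])
    return classes, mapping

Y = [[1, 0, 1],
     [0, 1, 0],
     [1, 0, 1],
     [1, 1, 0]]
y_multiclass, mapping = label_powerset_encode(Y)
# rows 0 and 2 share the same label set, so they share a class id
```

    The trade-off is that the number of classes can grow up to 2^n_labels, which is exactly why the answer partitions the label space into smaller subspaces first.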
