How to specify the prior probability for scikit-learn's Naive Bayes

盖世英雄少女心 2020-12-08 16:04

I'm using the scikit-learn machine learning library (Python) for a machine learning project. One of the algorithms I'm using is the Gaussian Naive Bayes implementation. On

2 answers
  • 2020-12-08 16:19

    @Jianxun Li: there is in fact a way to set prior probabilities in GaussianNB. The constructor parameter is called 'priors'. See the documentation: "Parameters: priors : array-like, shape (n_classes,) Prior probabilities of the classes. If specified the priors are not adjusted according to the data." Here is an example:

    from sklearn.naive_bayes import GaussianNB
    # minimal dataset
    X = [[1, 0], [1, 0], [0, 1]]
    y = [0, 0, 1]
    # use empirical prior, learned from y
    mn = GaussianNB()
    # predict() expects a 2D array: one row per sample
    print(mn.fit(X, y).predict([[1, 1]]))
    print(mn.class_prior_)
    
    # output:
    # [0]
    # [0.66666667 0.33333333]
    

    If you change the prior probabilities, the same query point gets a different prediction, which I believe is what you are looking for.

    # use custom prior to make class 1 more likely
    mn = GaussianNB(priors=[0.1, 0.9])
    # predict() expects a 2D array: one row per sample
    print(mn.fit(X, y).predict([[1, 1]]))
    
    # output:
    # [1]
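
To make the effect of 'priors' easier to inspect, here is a minimal sketch that fits the same toy data twice and compares the results. It assumes a scikit-learn release whose GaussianNB accepts the 'priors' parameter; the variable names (empirical, custom, query) are only illustrative. It also shows that priors which do not sum to 1 are rejected with a ValueError.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Same toy data as in the answer above
X = np.array([[1, 0], [1, 0], [0, 1]])
y = np.array([0, 0, 1])
query = np.array([[1, 1]])  # one sample, two features

# Prior estimated from the label frequencies in y
empirical = GaussianNB().fit(X, y)
# Prior supplied by the user; it is not adjusted to the data
custom = GaussianNB(priors=[0.1, 0.9]).fit(X, y)

print(empirical.class_prior_)
print(custom.class_prior_)
print(empirical.predict(query), custom.predict(query))

# Priors must sum to 1; otherwise fit() raises ValueError
try:
    GaussianNB(priors=[0.5, 0.9]).fit(X, y)
    rejected = False
except ValueError:
    rejected = True
print(rejected)
```

The query point is equally far from both class means, so the likelihoods tie and the prior alone decides the predicted class.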
    
  • 2020-12-08 16:32

    In older scikit-learn releases, the GaussianNB() implementation does not let you set the class prior (newer releases accept a 'priors' parameter). In those versions the online documentation lists .class_prior_ as an attribute rather than a parameter. Once you fit GaussianNB(), you can access the class_prior_ attribute. It is computed by counting how often each label occurs in your training sample and dividing by the sample size.

    from sklearn.datasets import make_classification
    from sklearn.naive_bayes import GaussianNB
    
    
    # simulate data with unbalanced weights
    X, y = make_classification(n_samples=1000, weights=[0.1, 0.9])
    # your GNB estimator
    gnb = GaussianNB()
    gnb.fit(X, y)
    
    gnb.class_prior_
    Out[168]: array([ 0.105,  0.895])
    
    gnb.get_params()
    Out[169]: {}
    

    As you can see, the estimator picks up the class imbalance from the training data on its own, so you don't have to specify the priors manually unless you want them to differ from the observed label frequencies.
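
As a rough check of how the empirical prior is obtained, the sketch below compares class_prior_ with the label frequencies computed by hand. The random_state=0 seed and the name label_freq are additions for reproducibility and illustration, not part of the original answer. The last line assumes a scikit-learn release that exposes the 'priors' parameter, in which case get_params() is no longer empty.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB

# Simulate imbalanced data; the seed is an assumption added for reproducibility
X, y = make_classification(n_samples=1000, weights=[0.1, 0.9], random_state=0)

gnb = GaussianNB().fit(X, y)

# Relative frequency of each label in the training sample
label_freq = np.bincount(y) / len(y)

print(label_freq)
print(gnb.class_prior_)

# In releases that support it, 'priors' shows up among the estimator parameters
print("priors" in gnb.get_params())
```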
