2. Random Forest

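Today I studied Chapter 2 of 菜菜's course, on random forests, and also reviewed the decision trees I studied yesterday.

What I learned is summarized below as code with comments, mainly to get my own thinking straight.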

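Code for splitting the data into a training set and a test set:

from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
wine = load_wine()
# Hold out 30% of the samples as the test set; the rest is the training set
Xtrain, Xtest, Ytrain, Ytest = train_test_split(wine.data, wine.target, test_size=0.3)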
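
Because `train_test_split` shuffles the data at random, every score further down changes from run to run. The following is a minimal sketch of a reproducible split; the `random_state=42` and `stratify` arguments are my own additions for illustration and are not in the course code.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

wine = load_wine()

# random_state fixes the shuffle; stratify keeps the class proportions
# similar in both parts. Both arguments are assumptions added here.
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=42, stratify=wine.target
)

# The wine data has 178 samples and 13 features
print(Xtrain.shape, Xtest.shape)  # (124, 13) (54, 13)
```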

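Code for building a random forest (three lines, plus the import):

from sklearn.ensemble import RandomForestClassifier

rfc = RandomForestClassifier(random_state=0)
rfc = rfc.fit(Xtrain, Ytrain)
score_r = rfc.score(Xtest, Ytest)
# Instantiate, fit on the training set, then score (mean accuracy) on the test set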

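Code for building a decision tree (three lines, plus the import):

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(random_state=0)
clf = clf.fit(Xtrain, Ytrain)
score_c = clf.score(Xtest, Ytest)
# Same three steps as the forest: instantiate, fit, score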
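
To see the two models side by side, here is a small sketch that fits both on the same split and prints both scores. The fixed `random_state=0` in the split is an assumption I added so the comparison can be repeated; the exact numbers depend on the split.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

wine = load_wine()
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0  # fixed seed is an assumption
)

# Fit one decision tree and one forest on exactly the same training data
clf = DecisionTreeClassifier(random_state=0).fit(Xtrain, Ytrain)
rfc = RandomForestClassifier(random_state=0).fit(Xtrain, Ytrain)

# score() returns the mean accuracy on the given test data
score_c = clf.score(Xtest, Ytest)
score_r = rfc.score(Xtest, Ytest)
print(f"Single Tree: {score_c:.3f}  Random Forest: {score_r:.3f}")
```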

How to use cross-validation (here, k-fold cross-validation):

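from sklearn.model_selection import cross_val_score

# Lists that store the mean score of each round
rfc_l = []
clf_l = []

for i in range(10):
    # n_estimators is the number of trees in the forest, i.e. the number of base estimators
    rfc = RandomForestClassifier(n_estimators=25)
    # cross_val_score performs cross-validation; arguments: (estimator, data, labels, cv),
    # where cv is the number of folds k. It returns one score per fold; .mean() averages them
    rfc_s = cross_val_score(rfc, wine.data, wine.target, cv=10).mean()
    # append adds the value in parentheses to the end of the list
    rfc_l.append(rfc_s)

    # Same evaluation for a single decision tree, so that the plot below can compare the two
    clf = DecisionTreeClassifier()
    clf_s = cross_val_score(clf, wine.data, wine.target, cv=10).mean()
    clf_l.append(clf_s)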
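
It helps to look at what `cross_val_score` returns before the mean is taken. A minimal sketch, assuming the same wine data; `random_state=0` is my addition so the numbers do not change between runs.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

wine = load_wine()
rfc = RandomForestClassifier(n_estimators=25, random_state=0)  # seed is an assumption

# With cv=10 the data is split into 10 folds; each fold is used once as the
# test set while the other 9 are used for training, giving 10 scores
scores = cross_val_score(rfc, wine.data, wine.target, cv=10)

print(scores.shape)   # (10,)
print(scores)         # one accuracy per fold
print(scores.mean())  # the single number stored in the list above
```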

Plot the results of the ten rounds of 10-fold cross-validation:

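import matplotlib.pyplot as plt
# plot is used for drawing; it lives in pyplot, a submodule of matplotlib

plt.plot(range(1, 11), rfc_l, label="Random Forest")
plt.plot(range(1, 11), clf_l, label="Decision Tree")
# Standard plt line-plot calls: x is the round number (1 to 10), y is the mean score of that round

plt.legend()
plt.show()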

Four important attributes and interfaces of a random forest (one attribute and three methods):

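rfc = RandomForestClassifier(n_estimators=25).fit(Xtrain, Ytrain)
# The forest must be fitted before the attribute and methods below can be used

rfc.feature_importances_
# Returns the importance of each feature
# This is the most important of the four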
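
The raw array is easier to read when paired with the feature names. A small sketch, assuming the wine data loaded with `load_wine()`; fitting on the full data set and `random_state=0` are my choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier

wine = load_wine()
rfc = RandomForestClassifier(n_estimators=25, random_state=0).fit(wine.data, wine.target)

# One importance per feature; the values are normalized to sum to 1
importances = rfc.feature_importances_

# Print the features from most to least important
order = np.argsort(importances)[::-1]
for idx in order:
    print(f"{wine.feature_names[idx]:<30} {importances[idx]:.3f}")
```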

rfc.apply(Xtest)
# Takes the test set and returns, for each test sample, the index of the leaf node it ends up in, for every tree in the forest
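
Since a forest contains many trees, the result of `apply` is a 2-D array with one row per sample and one column per tree. A quick sketch to check the shape; the fixed seeds are assumptions added for repeatability.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

wine = load_wine()
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0  # fixed seed is an assumption
)
rfc = RandomForestClassifier(n_estimators=25, random_state=0).fit(Xtrain, Ytrain)

# leaves[i, j] is the index of the leaf that test sample i reaches in tree j
leaves = rfc.apply(Xtest)
print(leaves.shape)  # (54, 25)
```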

rfc.predict(Xtest)
# Takes the test set and returns the predicted label of each test sample

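rfc.predict_proba(Xtest)
# Returns, for each test sample, the probability of it being assigned to each class label;
# there are as many probabilities per sample as there are classes.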
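
`predict` and `predict_proba` are closely related: as far as I understand, the predicted label is the class with the highest probability. A minimal sketch that checks this on the wine data (3 classes); the fixed seeds are assumptions.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

wine = load_wine()
Xtrain, Xtest, Ytrain, Ytest = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=0  # fixed seed is an assumption
)
rfc = RandomForestClassifier(n_estimators=25, random_state=0).fit(Xtrain, Ytrain)

proba = rfc.predict_proba(Xtest)  # one row per sample, one column per class
pred = rfc.predict(Xtest)         # one label per sample

print(proba.shape)  # (54, 3)

# Each row of probabilities sums to 1
rows_sum_to_one = np.allclose(proba.sum(axis=1), 1.0)

# The predicted label is the class whose column has the largest probability
labels_from_proba = rfc.classes_[proba.argmax(axis=1)]
same_labels = np.array_equal(labels_from_proba, pred)
print(rows_sum_to_one, same_labels)
```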

Source: CSDN

Author: 海边的渡边彻

Link: https://blog.csdn.net/weixin_37856444/article/details/103749239
