Question
I need to be able to choose the features (in the machine-learning sense) that are used to build a decision tree. Taking the Iris dataset as an example, I want to select sepal length as the feature used at the root node, petal length as the feature used at the nodes of the first level, and so on.
To be clear, my aim is not to change hyperparameters such as the minimum samples per split or the random state of the decision tree, but rather to select the features (the characteristics of the elements being classified) and assign them to specific nodes of the tree.
The code should then find the best threshold for each node to produce the best split.
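To my knowledge, scikit-learn's DecisionTreeClassifier has no parameter that pins a given feature to a given node; at every split it searches the candidate features itself. A minimal sketch of the requested behaviour, assuming a small hand-rolled tree is acceptable: the hypothetical helper `build_forced_tree` takes a per-depth feature list (`features_by_depth`, also hypothetical) and learns only the threshold at each node, by fitting a `max_depth=1` DecisionTreeClassifier (a "stump") on that single column. The dict-based node format and `predict_one` are illustrative assumptions, not a library API.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier


def build_forced_tree(X, y, features_by_depth, depth=0, min_samples_split=2):
    """Hypothetical helper: the split feature at each depth is fixed by the
    caller; only the threshold is learned, via a depth-1 stump."""
    majority = int(np.bincount(y).argmax())
    # Stop when the feature list is exhausted, the node is small, or it is pure.
    if (depth >= len(features_by_depth) or len(y) < min_samples_split
            or len(np.unique(y)) == 1):
        return {"leaf": majority}
    f = features_by_depth[depth]
    # A max_depth=1 tree fitted on one column finds the best threshold
    # for that column under the default Gini criterion.
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X[:, [f]], y)
    if stump.tree_.node_count == 1:  # the stump found no split on this feature
        return {"leaf": majority}
    threshold = float(stump.tree_.threshold[0])
    left = X[:, f] <= threshold
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_forced_tree(X[left], y[left], features_by_depth,
                                  depth + 1, min_samples_split),
        "right": build_forced_tree(X[~left], y[~left], features_by_depth,
                                   depth + 1, min_samples_split),
    }


def predict_one(node, x):
    """Walk the dict-based tree down to a leaf and return its class."""
    while "leaf" not in node:
        node = node["left"] if x[node["feature"]] <= node["threshold"] else node["right"]
    return node["leaf"]


iris = load_iris()
X, y = iris.data, iris.target
# Root splits on sepal length (column 0); first level on petal length (column 2).
tree = build_forced_tree(X, y, features_by_depth=[0, 2])
preds = np.array([predict_one(tree, x) for x in X])
print("root:", iris.feature_names[tree["feature"]], "<=", round(tree["threshold"], 2))
print("training accuracy:", (preds == y).mean())
```

Fixing the feature per depth (rather than per individual node) keeps the example short; a per-node mapping would work the same way, just keyed by the path taken to reach the node.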
Here is some general code for generating the tree:
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
clf = DecisionTreeClassifier(random_state=0)
iris = load_iris()
clf.fit(iris.data,iris.target)
Has any of you ever done this?
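As a starting point, it can help to see which features the unconstrained tree from the code above already picks. In scikit-learn, `clf.tree_.feature` stores the index of the split feature for every node (leaves hold -2), and `sklearn.tree.export_text` prints the learned rules together with their thresholds. This is a small sketch; the exact printed tree is not reproduced here.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(random_state=0)
clf.fit(iris.data, iris.target)

# Node 0 is the root; tree_.feature[0] is the column it splits on.
root_feature = iris.feature_names[clf.tree_.feature[0]]
print("root split on:", root_feature)

# Print every rule the tree learned, with feature names and thresholds.
print(export_text(clf, feature_names=list(iris.feature_names)))
```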
Answer 1:
Has any of you ever done this?
No, you are probably the first one!
Haha. Seriously, though, you can select the features in several ways; one is shown in the official documentation: https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html
from sklearn import datasets

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features (sepal length, sepal width)
y = iris.target
Then fit the classifier on this reduced feature set: clf.fit(X, y)
Other ways to do it are explained here: Selecting multiple columns in a pandas dataframe
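Note that selecting columns, whether by NumPy slicing or by name in a pandas DataFrame, only restricts which features the tree may use; within that subset the tree still decides for itself which feature goes at the root and at each lower node. A sketch of the pandas variant, assuming scikit-learn >= 0.23 (for `load_iris(as_frame=True)`) and pandas installed:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# as_frame=True returns the features as a pandas DataFrame.
iris = load_iris(as_frame=True)
selected = ["sepal length (cm)", "petal length (cm)"]
X = iris.data[selected]  # select columns by name
y = iris.target

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# Only the selected columns can appear in the tree, but their placement
# (root vs. lower nodes) is still chosen by the algorithm.
used = {selected[i] for i in clf.tree_.feature if i >= 0}
print("features used:", sorted(used))
print("root split on:", selected[clf.tree_.feature[0]])
```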
Source: https://stackoverflow.com/questions/58233148/how-to-manually-select-the-features-of-the-decision-tree