Question
I'm trying to visualize my DecisionTree, but I'm getting an error. The code is:
X = [i[1:] for i in dataset]  # attributes (all columns after the first)
y = [i[0] for i in dataset]  # target (first column)
clf = tree.DecisionTreeClassifier()
dot_data = StringIO()
tree.export_graphviz(clf.fit(train_X, train_y), out_file=dot_data)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("tree.pdf")
And the error is:
Traceback (most recent call last):
if data.startswith(codecs.BOM_UTF8):
TypeError: startswith first arg must be str or a tuple of str, not bytes
Can anyone explain what the problem is? Thanks a lot!
Answer 1:
I had the same exact problem and just spent a couple hours trying to figure this out. I can't guarantee what I share here will work for others but it may be worth a shot.
- I tried installing the official pydot packages, but I have Python 3 and they simply did not work. After finding a note in a thread on one of the many websites I scoured, I ended up installing this forked repository of pydot.
- I went to graphviz.org and installed their software on my Windows 7 machine. If you don't have Windows, look under their Download section for your system.
- After a successful install, I opened Environment Variables (Control Panel\All Control Panel Items\System\Advanced system settings > click the Environment Variables button > under System variables, find the variable path > click Edit...) and added ;C:\Program Files (x86)\Graphviz2.38\bin to the end of the Variable value: field.
- To confirm I can now use dot commands in the Command Line (Windows Command Processor), I typed dot -V, which returned dot - graphviz version 2.38.0 (20140413.2041).
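The PATH check in the last step can also be done from Python. This is only a minimal sketch using the standard library; find_dot and dot_version are hypothetical helper names, not part of pydot or Graphviz, and it assumes Python 3.7 or newer.

```python
import shutil
import subprocess


def find_dot():
    """Return the full path of the `dot` executable, or None if it is not on PATH."""
    return shutil.which("dot")


def dot_version(dot_path):
    """Run `dot -V` and return its version banner as a string."""
    result = subprocess.run([dot_path, "-V"], capture_output=True, text=True)
    # Graphviz typically writes the banner to stderr; fall back to stdout.
    return (result.stderr or result.stdout).strip()


dot_path = find_dot()
if dot_path is None:
    print("dot was not found on PATH; check the Graphviz bin directory entry")
else:
    print(dot_version(dot_path))
```

If this reports that dot was not found, a terminal or notebook that was already open before the PATH change may need to be restarted.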
In the code below, keep in mind that I'm reading a dataframe from my clipboard. You might be reading it from a file or some other source.
In IPython Notebook:
import pandas as pd
import numpy as np
from sklearn import tree
import pydot
from IPython.display import Image
from io import StringIO  # sklearn.externals.six is no longer available in newer scikit-learn releases
df = pd.read_clipboard()
X = df[df.columns[:-1]]
y = df[df.columns[-1]]
dtr = tree.DecisionTreeRegressor(max_depth=3)
dtr.fit(X, y)
dot_data = StringIO()
tree.export_graphviz(dtr, out_file=dot_data, feature_names=X.columns)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())
Alternatively, if you're not using IPython, you can generate your own image from the command line as long as you have graphviz installed (step 2 above). Using the same example code as above, add this line after fitting the model:
tree.export_graphviz(dtr, out_file='treepic.dot', feature_names=X.columns)
Then open a command prompt in the folder that contains the treepic.dot file and enter this command:
dot -T png treepic.dot -o treepic.png
A .png file should be created with your decision tree.
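If you would rather stay in Python than switch to the command prompt, the same dot call can be issued with subprocess. This is a sketch under the assumption that dot is already on your PATH; build_dot_command and render_png are hypothetical helper names introduced for illustration.

```python
import shutil
import subprocess


def build_dot_command(dot_file, png_file):
    """Build the argument list equivalent to: dot -T png <dot_file> -o <png_file>."""
    return ["dot", "-T", "png", dot_file, "-o", png_file]


def render_png(dot_file, png_file):
    """Render dot_file to png_file; return False if `dot` is not on PATH."""
    if shutil.which("dot") is None:
        return False
    # check=True raises CalledProcessError if dot exits with an error.
    subprocess.run(build_dot_command(dot_file, png_file), check=True)
    return True
```

After export_graphviz has written treepic.dot, calling render_png("treepic.dot", "treepic.png") should produce the same file as the command above.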
Answer 2:
If you're using Python 3, just use pydotplus instead of pydot. It also installs easily with pip.
import pydotplus
<your code>
dot_data = StringIO()
tree.export_graphviz(clf, out_file=dot_data)
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("iris.pdf")
Answer 3:
The line in question checks whether the stream/file is encoded as UTF-8.
Instead of:
if data.startswith(codecs.BOM_UTF8):
use:
if codecs.BOM_UTF8 in data:
You will likely have more success...
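One caveat worth checking before patching the library: the traceback says data is a str while codecs.BOM_UTF8 is bytes, and on Python 3 the in operator also raises a TypeError when a bytes value is tested against a str. The sketch below shows a type-aware check that should work for both cases; starts_with_utf8_bom is a hypothetical helper, not something pydot provides.

```python
import codecs


def starts_with_utf8_bom(data):
    """Return True if data (str or bytes) begins with a UTF-8 byte order mark."""
    if isinstance(data, bytes):
        return data.startswith(codecs.BOM_UTF8)
    # For str, compare against the decoded BOM, which is the character U+FEFF.
    return data.startswith(codecs.BOM_UTF8.decode("utf-8"))


# Reproduce the mismatch from the question: a str compared against a bytes prefix.
try:
    "digraph Tree {}".startswith(codecs.BOM_UTF8)
except TypeError as exc:
    print("TypeError:", exc)

print(starts_with_utf8_bom("digraph Tree {}"))
print(starts_with_utf8_bom(codecs.BOM_UTF8 + b"digraph Tree {}"))
```

Dispatching on the type of data keeps the check valid whether the DOT source arrives as text (as it does from StringIO) or as raw bytes read from a file.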
Source: https://stackoverflow.com/questions/31209016/python-pydot-and-decisiontree