Export python scikit learn models into pmml

♀尐吖头ヾ 提交于 2019-12-02 16:48:10

SkLearn2PMML is

a thin wrapper around the JPMML-SkLearn command-line application. For a list of supported Scikit-Learn Estimator and Transformer types, please refer to the documentation of the JPMML-SkLearn project.

As @user1808924 notes, it supports Python 2.7 or 3.4+. It also requires Java 1.7+

Installed via: (requires git)

pip install git+https://github.com/jpmml/sklearn2pmml.git

Example of how export a classifier tree to PMML. First grow the tree:

# example tree & viz from http://scikit-learn.org/stable/modules/tree.html
from sklearn import datasets, tree
iris = datasets.load_iris()
clf = tree.DecisionTreeClassifier() 
clf = clf.fit(iris.data, iris.target)

There are two parts to an SkLearn2PMML conversion, an estimator (our clf) and a mapper (for preprocessing steps such as discretization or PCA). Our mapper is pretty basic, since we are not doing any transformations.

from sklearn_pandas import DataFrameMapper
default_mapper = DataFrameMapper([(i, None) for i in iris.feature_names + ['Species']])

from sklearn2pmml import sklearn2pmml
sklearn2pmml(estimator=clf, 
             mapper=default_mapper, 
             pmml="D:/workspace/IrisClassificationTree.pmml")

It is possible (though not documented) to pass mapper=None, but you will see that the predictor names get lost (returning x1 not sepal length etc.).

Let's look at the .pmml file:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<PMML xmlns="http://www.dmg.org/PMML-4_3" version="4.3">
    <Header>
        <Application name="JPMML-SkLearn" version="1.1.1"/>
        <Timestamp>2016-09-26T19:21:43Z</Timestamp>
    </Header>
    <DataDictionary>
        <DataField name="sepal length (cm)" optype="continuous" dataType="float"/>
        <DataField name="sepal width (cm)" optype="continuous" dataType="float"/>
        <DataField name="petal length (cm)" optype="continuous" dataType="float"/>
        <DataField name="petal width (cm)" optype="continuous" dataType="float"/>
        <DataField name="Species" optype="categorical" dataType="string">
            <Value value="setosa"/>
            <Value value="versicolor"/>
            <Value value="virginica"/>
        </DataField>
    </DataDictionary>
    <TreeModel functionName="classification" splitCharacteristic="binarySplit">
        <MiningSchema>
            <MiningField name="Species" usageType="target"/>
            <MiningField name="sepal length (cm)"/>
            <MiningField name="sepal width (cm)"/>
            <MiningField name="petal length (cm)"/>
            <MiningField name="petal width (cm)"/>
        </MiningSchema>
        <Output>
            <OutputField name="probability_setosa" dataType="double" feature="probability" value="setosa"/>
            <OutputField name="probability_versicolor" dataType="double" feature="probability" value="versicolor"/>
            <OutputField name="probability_virginica" dataType="double" feature="probability" value="virginica"/>
        </Output>
        <Node id="1">
            <True/>
            <Node id="2" score="setosa" recordCount="50.0">
                <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="0.8"/>
                <ScoreDistribution value="setosa" recordCount="50.0"/>
                <ScoreDistribution value="versicolor" recordCount="0.0"/>
                <ScoreDistribution value="virginica" recordCount="0.0"/>
            </Node>
            <Node id="3">
                <SimplePredicate field="petal width (cm)" operator="greaterThan" value="0.8"/>
                <Node id="4">
                    <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="1.75"/>
                    <Node id="5">
                        <SimplePredicate field="petal length (cm)" operator="lessOrEqual" value="4.95"/>
                        <Node id="6" score="versicolor" recordCount="47.0">
                            <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="1.6500001"/>
                            <ScoreDistribution value="setosa" recordCount="0.0"/>
                            <ScoreDistribution value="versicolor" recordCount="47.0"/>
                            <ScoreDistribution value="virginica" recordCount="0.0"/>
                        </Node>
                        <Node id="7" score="virginica" recordCount="1.0">
                            <SimplePredicate field="petal width (cm)" operator="greaterThan" value="1.6500001"/>
                            <ScoreDistribution value="setosa" recordCount="0.0"/>
                            <ScoreDistribution value="versicolor" recordCount="0.0"/>
                            <ScoreDistribution value="virginica" recordCount="1.0"/>
                        </Node>
                    </Node>
                    <Node id="8">
                        <SimplePredicate field="petal length (cm)" operator="greaterThan" value="4.95"/>
                        <Node id="9" score="virginica" recordCount="3.0">
                            <SimplePredicate field="petal width (cm)" operator="lessOrEqual" value="1.55"/>
                            <ScoreDistribution value="setosa" recordCount="0.0"/>
                            <ScoreDistribution value="versicolor" recordCount="0.0"/>
                            <ScoreDistribution value="virginica" recordCount="3.0"/>
                        </Node>
                        <Node id="10">
                            <SimplePredicate field="petal width (cm)" operator="greaterThan" value="1.55"/>
                            <Node id="11" score="versicolor" recordCount="2.0">
                                <SimplePredicate field="sepal length (cm)" operator="lessOrEqual" value="6.95"/>
                                <ScoreDistribution value="setosa" recordCount="0.0"/>
                                <ScoreDistribution value="versicolor" recordCount="2.0"/>
                                <ScoreDistribution value="virginica" recordCount="0.0"/>
                            </Node>
                            <Node id="12" score="virginica" recordCount="1.0">
                                <SimplePredicate field="sepal length (cm)" operator="greaterThan" value="6.95"/>
                                <ScoreDistribution value="setosa" recordCount="0.0"/>
                                <ScoreDistribution value="versicolor" recordCount="0.0"/>
                                <ScoreDistribution value="virginica" recordCount="1.0"/>
                            </Node>
                        </Node>
                    </Node>
                </Node>
                <Node id="13">
                    <SimplePredicate field="petal width (cm)" operator="greaterThan" value="1.75"/>
                    <Node id="14">
                        <SimplePredicate field="petal length (cm)" operator="lessOrEqual" value="4.8500004"/>
                        <Node id="15" score="virginica" recordCount="2.0">
                            <SimplePredicate field="sepal width (cm)" operator="lessOrEqual" value="3.1"/>
                            <ScoreDistribution value="setosa" recordCount="0.0"/>
                            <ScoreDistribution value="versicolor" recordCount="0.0"/>
                            <ScoreDistribution value="virginica" recordCount="2.0"/>
                        </Node>
                        <Node id="16" score="versicolor" recordCount="1.0">
                            <SimplePredicate field="sepal width (cm)" operator="greaterThan" value="3.1"/>
                            <ScoreDistribution value="setosa" recordCount="0.0"/>
                            <ScoreDistribution value="versicolor" recordCount="1.0"/>
                            <ScoreDistribution value="virginica" recordCount="0.0"/>
                        </Node>
                    </Node>
                    <Node id="17" score="virginica" recordCount="43.0">
                        <SimplePredicate field="petal length (cm)" operator="greaterThan" value="4.8500004"/>
                        <ScoreDistribution value="setosa" recordCount="0.0"/>
                        <ScoreDistribution value="versicolor" recordCount="0.0"/>
                        <ScoreDistribution value="virginica" recordCount="43.0"/>
                    </Node>
                </Node>
            </Node>
        </Node>
    </TreeModel>
</PMML>

The first split (Node 1) is on petal width at 0.8. Node 2 (petal width <= 0.8) captures all of the setosa, with nothing else.

You can compare the pmml output to the graphviz output:

from sklearn.externals.six import StringIO
import pydotplus # this might be pydot for python 2.7
dot_data = StringIO() 
tree.export_graphviz(clf, 
                     out_file=dot_data,  
                     feature_names=iris.feature_names,  
                     class_names=iris.target_names,  
                     filled=True, rounded=True,  
                     special_characters=True) 
graph = pydotplus.graph_from_dot_data(dot_data.getvalue())
graph.write_pdf("D:/workspace/iris.pdf") 
# for in-line display, you can also do:
# from IPython.display import Image  
# Image(graph.create_png())  

Feel free to try Nyoka. Exports SKL models and then some.

Nyoka is a python library having support for Scikit-learn, XGBoost, LightGBM, Keras and Statsmodels.

Besides about 500 Python classes which each cover a PMML tag and all constructor parameters/attributes as defined in the standard, Nyoka also provides an increasing number of convenience classes and functions that make the Data Scientist’s life easier for example by reading or writing any PMML file in one line of code from within your favorite Python environment.

It can be installed from PyPi using :

pip install nyoka

Example code

Example 1

import pandas as pd
from sklearn import datasets
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler, Imputer
from sklearn_pandas import DataFrameMapper
from sklearn.ensemble import RandomForestClassifier

iris = datasets.load_iris()
irisd = pd.DataFrame(iris.data, columns=iris.feature_names)
irisd['Species'] = iris.target

features = irisd.columns.drop('Species')
target = 'Species'

pipeline_obj = Pipeline([
    ("mapping", DataFrameMapper([
    (['sepal length (cm)', 'sepal width (cm)'], StandardScaler()) , 
    (['petal length (cm)', 'petal width (cm)'], Imputer())
    ])),
    ("rfc", RandomForestClassifier(n_estimators = 100))
])

pipeline_obj.fit(irisd[features], irisd[target])

from nyoka import skl_to_pmml

skl_to_pmml(pipeline_obj, features, target, "rf_pmml.pmml")

Example 2

from keras import applications
from keras.layers import Flatten, Dense
from keras.models import Model

model = applications.MobileNet(weights='imagenet', include_top=False,input_shape = (224, 224,3))

activType='sigmoid'
x = model.output
x = Flatten()(x)
x = Dense(1024, activation="relu")(x)
predictions = Dense(2, activation=activType)(x)
model_final = Model(inputs =model.input, outputs = predictions,name='predictions')

from nyoka import KerasToPmml
cnn_pmml = KerasToPmml(model_final,dataSet='image',predictedClasses=['cats','dogs'])

cnn_pmml.export(open('2classMBNet.pmml', "w"), 0)

More examples can be found in Nyoka's Github Page .

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!