I am new to Python, PMML, and Augustus, so this is something of a newbie question. I have a PMML file that I want to use for scoring each time a new batch of data arrives. I have to use Python with Augustus only to complete this exercise. I have read various articles; these two are worth mentioning:
(http://augustusdocs.appspot.com/docs/v06/model_abstraction/augustus_and_pmml.html , http://augustus.googlecode.com/svn-history/r191/trunk/augustus/modellib/regression/producer/Producer.py)
I have also read the Augustus documentation relevant to scoring to understand how it works, but I am unable to solve this problem.
The sample PMML file was generated in R from the cars data, where "dist" is the dependent variable and "speed" is the independent variable. I want to predict dist whenever I receive a new value for speed, using the equation dist = -17.5790948905109 + speed*3.93240875912408. I know this is easy in R with the predict function, but I don't have R on the backend; only Python with Augustus is available for scoring. Any help is much appreciated, and thanks in advance.
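For reference, the equation above amounts to the following in plain Python. This is only a minimal sketch with the two coefficients hard-coded from the equation; `predict_dist` is a hypothetical helper name and not part of Augustus or any PMML library.

```python
# Coefficients copied from the regression equation in the question.
INTERCEPT = -17.5790948905109
SPEED_COEFFICIENT = 3.93240875912408


def predict_dist(speed):
    """Return the predicted dist for a single speed value (hypothetical helper)."""
    return INTERCEPT + SPEED_COEFFICIENT * speed


# Score each new speed value as it arrives; the values here are made up.
for speed in (4.0, 10.0, 25.0):
    print(speed, predict_dist(speed))
```

Hard-coding the coefficients defeats the purpose of having a PMML file, so this only illustrates what the scorer is expected to compute.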
Sample PMML file:
<?xml version="1.0"?>
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-4_1 http://www.dmg.org/v4-1/pmml-4-1.xsd">
<Header copyright="Copyright (c) 2013 user" description="Linear Regression Model">
<Extension name="user" value="user" extender="Rattle/PMML"/>
<Application name="Rattle/PMML" version="1.4"/>
<Timestamp>2013-11-07 09:24:06</Timestamp>
</Header>
<DataDictionary numberOfFields="2">
<DataField name="dist" optype="continuous" dataType="double"/>
<DataField name="speed" optype="continuous" dataType="double"/>
</DataDictionary>
<RegressionModel modelName="Linear_Regression_Model" functionName="regression" algorithmName="least squares">
<MiningSchema>
<MiningField name="dist" usageType="predicted"/>
<MiningField name="speed" usageType="active"/>
</MiningSchema>
<Output>
<OutputField name="Predicted_dist" feature="predictedValue"/>
</Output>
<RegressionTable intercept="-17.5790948905109">
<NumericPredictor name="speed" exponent="1" coefficient="3.93240875912408"/>
</RegressionTable>
</RegressionModel>
</PMML>
You could use PyPMML to score the PMML model in Python, for example:
from pypmml import Model
model = Model.fromString('''<?xml version="1.0"?>
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.dmg.org/PMML-4_1 http://www.dmg.org/v4-1/pmml-4-1.xsd">
<Header copyright="Copyright (c) 2013 user" description="Linear Regression Model">
<Extension name="user" value="user" extender="Rattle/PMML"/>
<Application name="Rattle/PMML" version="1.4"/>
<Timestamp>2013-11-07 09:24:06</Timestamp>
</Header>
<DataDictionary numberOfFields="2">
<DataField name="dist" optype="continuous" dataType="double"/>
<DataField name="speed" optype="continuous" dataType="double"/>
</DataDictionary>
<RegressionModel modelName="Linear_Regression_Model" functionName="regression" algorithmName="least squares">
<MiningSchema>
<MiningField name="dist" usageType="predicted"/>
<MiningField name="speed" usageType="active"/>
</MiningSchema>
<Output>
<OutputField name="Predicted_dist" feature="predictedValue"/>
</Output>
<RegressionTable intercept="-17.5790948905109">
<NumericPredictor name="speed" exponent="1" coefficient="3.93240875912408"/>
</RegressionTable>
</RegressionModel>
</PMML>''')
result = model.predict({'speed': 1.0})
The result is a dict keyed by the output field name, Predicted_dist:
{'Predicted_dist': -13.646686131386819}
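If installing an extra library is not an option, a model this simple can also be scored with the Python standard library alone. The following is a rough sketch, not a general PMML evaluator and not the Augustus API: it assumes a single RegressionTable containing only NumericPredictor terms, uses a trimmed copy of the PMML above, and the names `load_regression` and `score` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <PMML> root element of the sample file.
NS = {"pmml": "http://www.dmg.org/PMML-4_1"}

# Trimmed copy of the sample model (header and data dictionary omitted).
PMML_TEXT = """<?xml version="1.0"?>
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
<RegressionModel modelName="Linear_Regression_Model" functionName="regression">
<MiningSchema>
<MiningField name="dist" usageType="predicted"/>
<MiningField name="speed" usageType="active"/>
</MiningSchema>
<RegressionTable intercept="-17.5790948905109">
<NumericPredictor name="speed" exponent="1" coefficient="3.93240875912408"/>
</RegressionTable>
</RegressionModel>
</PMML>"""


def load_regression(pmml_text):
    """Extract the intercept and (name, coefficient, exponent) terms."""
    root = ET.fromstring(pmml_text)
    table = root.find(".//pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    terms = [
        (p.get("name"), float(p.get("coefficient")), int(p.get("exponent", "1")))
        for p in table.findall("pmml:NumericPredictor", NS)
    ]
    return intercept, terms


def score(intercept, terms, record):
    """Evaluate intercept + sum(coefficient * value ** exponent) for one record."""
    return intercept + sum(coef * record[name] ** exp for name, coef, exp in terms)


intercept, terms = load_regression(PMML_TEXT)
print(score(intercept, terms, {"speed": 1.0}))
```

For speed = 1.0 this evaluates -17.5790948905109 + 3.93240875912408, which is about -13.6467 and agrees with the PyPMML result above. Parsing the file once and calling `score` for every new record keeps the per-record cost to a few arithmetic operations.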
Source: https://stackoverflow.com/questions/19997433/how-to-score-a-linear-model-using-pmml-file-and-augustus-on-python