## Learning about the Predictive Model Markup Language (PMML)

Predictive modelling is the branch of data science concerned with predicting future probabilities and trends. The use of these techniques is growing rapidly because of their potential in many different fields, such as business intelligence, health care and Industry 4.0. Indeed, many organizations are adopting predictive modelling in their daily operations to improve decision making.

Many tools are available in the predictive analytics market, each suited to different purposes. However, many of these applications use their own model representations and do not allow models to be exported to other software tools. This is one of the problems that the data science community is working to solve.

The Predictive Model Markup Language (PMML) is an XML-based format for interchanging predictive models among different applications. It is developed by the Data Mining Group (DMG), a consortium of commercial and open-source data mining companies. PMML provides an easy way to deploy predictive models that were created with a different application.

Currently, the most popular software tools for predictive analytics and data mining include utilities to import and/or export PMML models. This allows data scientists to exchange their predictive models among different software tools.

To illustrate how a PMML document works, we will consider a simple data set to be approximated with a neural network. The following table contains the data used in this example. There are two variables, x and y, and the relationship between them is quadratic.

| x | y |
|---|---|
| 0 | 0 |
| 1 | 1 |
| 2 | 4 |
| 3 | 9 |
| 4 | 16 |
| 5 | 25 |

The next figure illustrates the predictive model to be used here: a neural network that has been trained with the quadratic data set. It has one input, three hidden neurons and one output neuron. This neural network also includes a scaling layer and an unscaling layer.

The mathematical expression represented by this particular model is written below. It takes the input x to produce the output y. The information is propagated in a feed-forward fashion through the scaling layer, the learning layers and the unscaling layer.

```
scaled_x=2*(x-0)/(5-0)-1;
y_1_1=tanh(-0.466+0.934*scaled_x);
y_1_2=tanh(-1.084+1.116*scaled_x);
y_1_3=tanh(-0.169+0.143*scaled_x);
scaled_y=0.713+0.583*y_1_1+1.162*y_1_2+0.261*y_1_3;
y = 0.5*(scaled_y+1)*(25-0)+0;
```
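The expression above can be transcribed almost literally into a short Python function, which makes it easy to check how closely the network reproduces the quadratic data. This is only a minimal sketch: the function name `quadratic_network` is an illustrative choice, and the constants are copied from the expression.

```python
import math


def quadratic_network(x: float) -> float:
    """Evaluate the trained network from the expression above for one input."""
    # Scaling layer: map x from [0, 5] to [-1, 1].
    scaled_x = 2 * (x - 0) / (5 - 0) - 1
    # Hidden layer: three neurons with hyperbolic tangent activation.
    y_1_1 = math.tanh(-0.466 + 0.934 * scaled_x)
    y_1_2 = math.tanh(-1.084 + 1.116 * scaled_x)
    y_1_3 = math.tanh(-0.169 + 0.143 * scaled_x)
    # Output layer: one neuron with identity activation.
    scaled_y = 0.713 + 0.583 * y_1_1 + 1.162 * y_1_2 + 0.261 * y_1_3
    # Unscaling layer: map scaled_y from [-1, 1] back to [0, 25].
    return 0.5 * (scaled_y + 1) * (25 - 0) + 0


if __name__ == "__main__":
    # Print the target value next to the network output for each table row.
    for x in range(6):
        print(x, x ** 2, round(quadratic_network(x), 3))
```

Running the script prints each x from the table next to its square and the network output, so the quality of the approximation can be inspected directly.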

Once we have the predictive model, we want to create the PMML document for it. The first step is to understand the structure of a PMML file, which is shown next.

```
<?xml version="1.0" encoding="UTF-8"?>
<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Header/>
  <DataDictionary/>
  <TransformationDictionary/>
  <NeuralNetwork/>
</PMML>
```

The first line is the XML declaration, which identifies the document as XML. The PMML tree starts with the root element, called PMML. This element has some attributes, such as the PMML version. The main branch elements are the header, the data dictionary, the transformation dictionary and the predictive model. All of them are explained next.
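As a quick illustration of this structure, the following sketch loads a PMML skeleton with Python's standard `xml.etree.ElementTree` module and lists the root element, its version and its main branches. The skeleton string and the helper `local_name` are assumptions made for this example; they are not part of PMML itself.

```python
import xml.etree.ElementTree as ET

# Skeleton with the four main branches named in the text (all left empty).
PMML_SKELETON = """<?xml version="1.0" encoding="UTF-8"?>
<PMML version="4.2"
      xmlns="http://www.dmg.org/PMML-4_2"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <Header/>
  <DataDictionary/>
  <TransformationDictionary/>
  <NeuralNetwork/>
</PMML>"""


def local_name(tag: str) -> str:
    """Strip the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


# Parse from bytes so that the encoding declaration is honoured.
root = ET.fromstring(PMML_SKELETON.encode("utf-8"))
children = [local_name(child.tag) for child in root]

print(local_name(root.tag), root.get("version"))
print(children)
```

Because the root element declares a default namespace, ElementTree reports every tag as `{namespace}name`, which is why the helper strips that prefix before printing.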

##### Header

The header gives information about who created the PMML document and which software was used. It has the following format.

```
<Header copyright="MyCompany">
  <Application name="MyApplication"/>
</Header>
```

In this case, the copyright owner of this predictive model is MyCompany, which has used the tool MyApplication to create it.

##### Data dictionary

The data dictionary gives information about the variables in the dataset: their name, type and range. For the quadratic example, the data dictionary tag is listed below.

```
<DataDictionary numberOfFields="2">
  <DataField dataType="double" name="x" optype="continuous">
    <Interval closure="closedClosed" leftMargin="0" rightMargin="5"/>
  </DataField>
  <DataField dataType="double" name="y" optype="continuous">
    <Interval closure="closedClosed" leftMargin="0" rightMargin="25"/>
  </DataField>
</DataDictionary>
```

In this case, we have two fields, "x" and "y". Both of them are continuous values. "x" ranges from 0 to 5 and "y" ranges from 0 to 25.
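A consumer application might read these ranges as in the following sketch, which uses Python's standard `xml.etree.ElementTree` module. The function name `field_ranges` is hypothetical, and the fragment is parsed on its own, without the PMML namespace, to keep the example short.

```python
import xml.etree.ElementTree as ET

# The data dictionary of the quadratic example, as a stand-alone fragment.
DATA_DICTIONARY = """
<DataDictionary numberOfFields="2">
  <DataField dataType="double" name="x" optype="continuous">
    <Interval closure="closedClosed" leftMargin="0" rightMargin="5"/>
  </DataField>
  <DataField dataType="double" name="y" optype="continuous">
    <Interval closure="closedClosed" leftMargin="0" rightMargin="25"/>
  </DataField>
</DataDictionary>
"""


def field_ranges(xml_text: str) -> dict:
    """Return {field name: (left margin, right margin)} for every DataField."""
    dictionary = ET.fromstring(xml_text)
    ranges = {}
    for field in dictionary.findall("DataField"):
        interval = field.find("Interval")
        ranges[field.get("name")] = (
            float(interval.get("leftMargin")),
            float(interval.get("rightMargin")),
        )
    return ranges


if __name__ == "__main__":
    print(field_ranges(DATA_DICTIONARY))
```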

##### Transformation dictionary

When building a predictive model it is generally necessary to perform some transformations on the data. Data pre-processing describes any computation performed on the input data to prepare it for another procedure. For example, we can scale the data to have minimum -1 and maximum +1. Data post-processing, on the other hand, refers to computations performed to obtain the final outputs. For instance, scaled outputs with minimum -1 and maximum +1 can be transformed back into the original values using an unscaling procedure. These kinds of transformations are illustrated in the next figure.

The transformation dictionary tag of PMML includes the pre- and post-processing actions to be applied within the predictive model. PMML defines various kinds of transformations: normalization, discretization, value mapping, text indexing, functions and aggregation. The transformation dictionary element for our example is listed below.

```
<TransformationDictionary>
  <DerivedField displayName="x" dataType="double" optype="continuous">
    <NormContinuous field="x">
      <LinearNorm orig="0.0" norm="-1"/>
      <LinearNorm orig="1.0" norm="-0.6"/>
    </NormContinuous>
  </DerivedField>
  <DerivedField displayName="y" dataType="double" optype="continuous">
    <NormContinuous field="y">
      <LinearNorm norm="0.0" orig="12.5"/>
      <LinearNorm norm="1.0" orig="25"/>
    </NormContinuous>
  </DerivedField>
</TransformationDictionary>
```

As we can see, the scaling and unscaling layers are represented by means of normalization. More specifically, we have used the NormContinuous element, which normalizes a given field by linear interpolation. For the variable "x", the two (original, normalized) points used for the interpolation are (0,-1) and (1,-0.6). This transformation scales "x" values in the range [0,5] to fall in [-1,1]. For the variable "y", the two (normalized, original) points are (0,12.5) and (1,25). Applied in reverse, this transformation unscales "y" values in the range [-1,1] to fall in [0,25].
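The interpolation can be sketched in a few lines of Python. The function names and the `(orig, norm)` tuples below are illustrative, and the sketch assumes that values outside the two given points are extrapolated along the same straight line.

```python
def norm_continuous(value, points):
    """Map an original value to its normalized value.

    `points` holds two (orig, norm) pairs, one per LinearNorm element.
    """
    (orig_1, norm_1), (orig_2, norm_2) = points
    slope = (norm_2 - norm_1) / (orig_2 - orig_1)
    return norm_1 + (value - orig_1) * slope


def denorm_continuous(value, points):
    """Inverse mapping: recover the original value from a normalized one."""
    (orig_1, norm_1), (orig_2, norm_2) = points
    slope = (orig_2 - orig_1) / (norm_2 - norm_1)
    return orig_1 + (value - norm_1) * slope


# (orig, norm) pairs taken from the two NormContinuous elements.
X_POINTS = [(0.0, -1.0), (1.0, -0.6)]
Y_POINTS = [(12.5, 0.0), (25.0, 1.0)]

if __name__ == "__main__":
    # Scaling x: both ends of [0, 5].
    print(norm_continuous(0.0, X_POINTS), norm_continuous(5.0, X_POINTS))
    # Unscaling y: both ends of [-1, 1].
    print(denorm_continuous(-1.0, Y_POINTS), denorm_continuous(1.0, Y_POINTS))
```

Both mappings reduce to the scaling and unscaling expressions of the model: the slope for "x" is 0.4 = 2/5, and the slope for "y" is 12.5 = 25/2.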

##### Predictive model

The predictive model tag represents the predictive model itself. PMML supports many different model types. Some examples are decision trees, neural networks, support vector machines and random forests. Note that not every software tool supports every model type, so we need to choose PMML producers and consumers that are compatible.

In our example we are studying a neural network model for regression. The neural network tag shows information such as the function name, the number of layers and the activation function for each layer. It also includes all the parameter values (biases and synaptic weights). It is composed of other tags such as “MiningSchema”, “NeuralInputs”, “NeuralLayer” and “NeuralOutputs”. The neural network tag for our model is listed below.

```
<NeuralNetwork functionName="regression" numberOfLayers="2" activationFunction="tanh">
  <MiningSchema>
    <MiningField name="x"/>
    <MiningField name="y" usageType="predicted"/>
  </MiningSchema>
  <NeuralInputs numberOfInputs="1">
    <NeuralInput id="0,0">
      <DerivedField optype="continuous" dataType="double">
        <FieldRef field="x*"/>
      </DerivedField>
    </NeuralInput>
  </NeuralInputs>
  <NeuralLayer numberOfNeurons="3" activationFunction="tanh">
    <Neuron bias="-0.466" id="1,0">
      <Con from="0,0" weight="0.934"/>
    </Neuron>
    <Neuron bias="-1.084" id="1,1">
      <Con from="0,0" weight="1.116"/>
    </Neuron>
    <Neuron bias="-0.169" id="1,2">
      <Con from="0,0" weight="0.143"/>
    </Neuron>
  </NeuralLayer>
  <NeuralLayer numberOfNeurons="1" activationFunction="identity">
    <Neuron bias="0.713" id="2,0">
      <Con from="1,0" weight="0.583"/>
      <Con from="1,1" weight="1.162"/>
      <Con from="1,2" weight="0.261"/>
    </Neuron>
  </NeuralLayer>
  <NeuralOutputs numberOfOutputs="1">
    <NeuralOutput outputNeuron="2,0">
      <DerivedField optype="continuous" dataType="double">
        <FieldRef field="y*"/>
      </DerivedField>
    </NeuralOutput>
  </NeuralOutputs>
</NeuralNetwork>
```

As we can see, the number of layers in this neural network is 2. This model predicts "y" values as a function of "x" values. The neural network has one input, three hidden neurons with hyperbolic tangent activation function and one output neuron with identity activation function. Both the input and the output are continuous variables.
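To show how a consumer could evaluate such a network, the sketch below reads the neural layers with Python's standard `xml.etree.ElementTree` module and propagates one input through them. It is a simplified illustration, not a full PMML consumer: the names `forward` and `predict` are hypothetical, only the "tanh" and "identity" activations are handled, and the scaling and unscaling steps are written out by hand instead of being read from the transformation dictionary.

```python
import math
import xml.etree.ElementTree as ET

# Layers copied from the neural network tag; MiningSchema, NeuralInputs and
# NeuralOutputs are left out here to keep the sketch short.
NETWORK = """
<NeuralNetwork functionName="regression" numberOfLayers="2" activationFunction="tanh">
  <NeuralLayer numberOfNeurons="3" activationFunction="tanh">
    <Neuron bias="-0.466" id="1,0"><Con from="0,0" weight="0.934"/></Neuron>
    <Neuron bias="-1.084" id="1,1"><Con from="0,0" weight="1.116"/></Neuron>
    <Neuron bias="-0.169" id="1,2"><Con from="0,0" weight="0.143"/></Neuron>
  </NeuralLayer>
  <NeuralLayer numberOfNeurons="1" activationFunction="identity">
    <Neuron bias="0.713" id="2,0">
      <Con from="1,0" weight="0.583"/>
      <Con from="1,1" weight="1.162"/>
      <Con from="1,2" weight="0.261"/>
    </Neuron>
  </NeuralLayer>
</NeuralNetwork>
"""

# Only the two activation functions used in this example are supported.
ACTIVATIONS = {"tanh": math.tanh, "identity": lambda z: z}


def forward(xml_text: str, inputs: dict) -> dict:
    """Propagate `inputs` (neuron id -> value) through every NeuralLayer."""
    network = ET.fromstring(xml_text)
    default_activation = network.get("activationFunction")
    values = dict(inputs)
    for layer in network.findall("NeuralLayer"):
        # A layer falls back to the network-level activation if it has none.
        activation = ACTIVATIONS[layer.get("activationFunction", default_activation)]
        for neuron in layer.findall("Neuron"):
            total = float(neuron.get("bias", "0"))
            for con in neuron.findall("Con"):
                total += float(con.get("weight")) * values[con.get("from")]
            values[neuron.get("id")] = activation(total)
    return values


def predict(x: float) -> float:
    """Scale x, run the network, and unscale the output neuron "2,0"."""
    scaled_x = 2 * x / 5 - 1
    scaled_y = forward(NETWORK, {"0,0": scaled_x})["2,0"]
    return 0.5 * (scaled_y + 1) * 25


if __name__ == "__main__":
    for x in range(6):
        print(x, round(predict(x), 3))
```

Keeping neuron values in a dictionary keyed by the PMML ids ("0,0", "1,0", and so on) means each `Con` element can look up its source neuron directly through its `from` attribute.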

All in all, PMML works as a common denominator among different predictive analytics applications. It gives data scientists great flexibility, since it connects the developers of predictive models with their final users.

The official PMML documentation is published by the Data Mining Group.