Diagnose urinary inflammations using Neural Designer

This example aims to design a model that can diagnose acute inflammation of the urinary bladder or acute nephritis.

This is a medical diagnosis application.

The main idea of this algorithm is to perform a presumptive diagnosis of two diseases of the urinary system. Afterward, an expert will make a confirmatory diagnosis to verify the results.

With proper treatment, the symptoms of inflammation usually subside within a few days. However, there is a tendency to relapse, and in people with acute inflammation of the urinary bladder we should expect the illness to turn into a protracted form.

Contents:

  1. Application type.
  2. Data set.
  3. Neural network.
  4. Training strategy.
  5. Model selection.
  6. Testing analysis.
  7. Model deployment.

This example is solved with Neural Designer. To follow it step by step, you can use the free trial.

1. Application type

This is a classification project since the variables to be predicted are binary.

The goal here is to model the probability of nephritis and of inflammation of the urinary bladder, conditioned on the patient's symptoms.

Since both variables are binary and independent, we will use the model to predict the diseases one at a time. In this example, the model will be built to predict acute inflammation of the urinary bladder.

2. Data set

The data file urinary_inflammation.csv contains 120 rows and 8 columns.

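This data set contains the following variables: the patient's temperature and five symptoms (occurrence of nausea, lumbar pain, urine pushing, micturition pains, and burning of the urethra) as inputs, and the two diagnoses (inflammation of the urinary bladder and acute nephritis) as targets.

As the goal is to get a model that can diagnose one of the diseases, the variable for the acute nephritis diagnosis will be set as unused.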

The instances are divided into training, selection, and testing subsets. They represent 60% (72), 20% (24), and 20% (24) of the original instances, respectively, and are split at random.
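The random 60/20/20 split described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_instances` and the fixed `seed` are assumptions made here for reproducibility, and Neural Designer's own splitting procedure may differ.

```python
import random


def split_instances(n_instances, fractions=(0.6, 0.2, 0.2), seed=0):
    """Randomly assign instance indices to training, selection, and testing subsets."""
    indices = list(range(n_instances))
    # Shuffle with a private generator so the split is reproducible (seed is an assumption).
    random.Random(seed).shuffle(indices)

    n_training = round(fractions[0] * n_instances)
    n_selection = round(fractions[1] * n_instances)

    training = indices[:n_training]
    selection = indices[n_training:n_training + n_selection]
    testing = indices[n_training + n_selection:]  # whatever remains
    return training, selection, testing


training, selection, testing = split_instances(120)
print(len(training), len(selection), len(testing))  # 72 24 24
```

Each instance ends up in exactly one subset, so the testing instances are never seen during training or model selection.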

Before configuring the model, it is advisable to analyze the available data. For classification projects, it is important to know the distribution of the target variable in the data set. The following picture shows a pie chart for the inflammation_of_urinary_bladder variable.

As we can observe, the data is quite well balanced. This information will later be used to define the parameters of the neural network.

Another relevant piece of information to keep in mind is the correlation of each input with the target variable. The chart below displays this information.

From the picture above, we can conclude that the variables with a considerable influence on the target variable are urine pushing and micturition pain.

3. Neural network

The third step is to choose a neural network to represent the classification function. For classification problems, it is composed of a scaling layer, perceptron layers, and a probabilistic layer.

For the scaling layer, the mean and standard deviation scaling method is set.

We set 2 perceptron layers: a hidden layer with 3 neurons, as a first guess, and an output layer with 1 neuron. Both layers use the logistic activation function.

Finally, we will set the binary probabilistic method for the probabilistic layer, as we want the predicted target variable to be binary.

The following figure is a diagram for the neural network used in this example.

4. Training strategy

The fourth step is to set the training strategy, which is composed of a loss index and an optimization algorithm.

The loss index chosen for this application is the normalized squared error with L2 regularization.

The learning problem can be stated as finding a neural network that minimizes the loss index, i.e., a neural network that fits the data set (error term) and does not oscillate (regularization term).
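As a rough illustration of how an error term and a regularization term combine into a single loss index, the sketch below uses a common definition of the normalized squared error (sum of squared errors divided by the squared deviation of the targets from their mean) plus an L2 penalty on the parameters. Both the exact normalization and the `regularization_weight` value are assumptions made for this example, not values taken from the Neural Designer project.

```python
def normalized_squared_error(outputs, targets):
    """Sum of squared errors divided by the targets' squared deviation from their mean."""
    mean_target = sum(targets) / len(targets)
    squared_error = sum((o - t) ** 2 for o, t in zip(outputs, targets))
    normalization = sum((t - mean_target) ** 2 for t in targets)
    return squared_error / normalization


def loss_index(outputs, targets, parameters, regularization_weight=0.01):
    """Error term plus L2 regularization term (the weight is a hypothetical value)."""
    l2_term = sum(p ** 2 for p in parameters)
    return normalized_squared_error(outputs, targets) + regularization_weight * l2_term


# A model that always outputs 0.5 on balanced binary targets has an error of 1.0,
# i.e. it is no better than predicting the mean of the targets.
print(normalized_squared_error([0.5, 0.5, 0.5, 0.5], [0, 1, 1, 0]))  # 1.0
```

With this normalization, an error of 0 means a perfect fit, and the regularization term grows with the size of the parameters, which discourages oscillating solutions.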

The optimization algorithm set for the model is the quasi-Newton method.

The final training and selection errors are 0.0009 WSE and 0.0008 WSE, respectively. In the next section, we will try to improve the generalization performance by reducing the selection error.

5. Model selection

The objective of model selection is to improve the generalization capabilities of the neural network or, in other words, to reduce the selection error.

Since the selection error achieved so far is already minimal (0.0008 WSE), we need neither order selection nor input selection here.

6. Testing analysis

An exhaustive testing analysis is performed to validate the generalization performance of the trained neural network. To validate a classification model, we need to compare the values predicted by the model with the observed values.

The following table contains the elements of the confusion matrix. It contains the true positives, false positives, true negatives, and false negatives for the variable diagnosis.

                 Predicted positive    Predicted negative
Real positive    12                    0
Real negative    0                     12

The number of correctly classified instances is 24, and the number of misclassified instances is 0. From this table, we can calculate the binary classification tests.

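The binary classification tests are parameters for measuring the performance of a classification problem with two classes. From the confusion matrix, the classification accuracy is 100% (24 of 24 instances), the error rate is 0%, the sensitivity is 100% (12 of 12 real positives), and the specificity is 100% (12 of 12 real negatives).

From these results, we can say that the model classifies every instance of the testing subset correctly.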
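For reference, the standard binary classification tests can be computed directly from the four cells of the confusion matrix. The short Python sketch below uses the textbook definitions; the function name `binary_classification_tests` is chosen here for illustration.

```python
def binary_classification_tests(tp, fn, fp, tn):
    """Standard tests derived from the cells of a 2x2 confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,     # ratio of correctly classified instances
        "error_rate": (fp + fn) / total,   # ratio of misclassified instances
        "sensitivity": tp / (tp + fn),     # ratio of real positives predicted positive
        "specificity": tn / (tn + fp),     # ratio of real negatives predicted negative
    }


# Cells taken from the confusion matrix of the testing subset.
tests = binary_classification_tests(tp=12, fn=0, fp=0, tn=12)
print(tests)
# {'accuracy': 1.0, 'error_rate': 0.0, 'sensitivity': 1.0, 'specificity': 1.0}
```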

7. Model deployment

The neural network is now ready to predict outputs for inputs that it has never seen.

We calculate the neural network outputs to diagnose inflammation of the urinary bladder from the characteristics of a new patient. The next list shows some values for the inputs and the corresponding output for that patient.

We can export the mathematical expression of the neural network to any clinical software used for diagnosing these diseases. The expression is listed below.

scaled_temperature = (temperature-38.7242)/1.81913;
scaled_occurrence_of_nausea = (occurrence_of_nausea-0.241667)/0.429888;
scaled_lumbar_pain = (lumbar_pain-0.583333)/0.495074;
scaled_urine_pushing = (urine_pushing-0.666667)/0.473381;
scaled_micturition_pains = (micturition_pains-0.491667)/0.502027;
scaled_burning_of_urethra = (burning_of_urethra-0.416667)/0.495074;
y_1_1 = logistic(0.202475+ (scaled_temperature*0.456805)+ (scaled_occurrence_of_nausea*-0.793165)+ (scaled_lumbar_pain*1.47038)+ (scaled_urine_pushing*-1.76085)+ (scaled_micturition_pains*-0.81808)+ (scaled_burning_of_urethra*0.145468));
y_1_2 = logistic(-0.254725+ (scaled_temperature*-0.224772)+ (scaled_occurrence_of_nausea*0.605717)+ (scaled_lumbar_pain*-1.1313)+ (scaled_urine_pushing*1.62554)+ (scaled_micturition_pains*0.801991)+ (scaled_burning_of_urethra*-0.267477));
y_1_3 = logistic(-0.307314+ (scaled_temperature*-0.245913)+ (scaled_occurrence_of_nausea*0.733341)+ (scaled_lumbar_pain*-1.26759)+ (scaled_urine_pushing*1.63105)+ (scaled_micturition_pains*0.83814)+ (scaled_burning_of_urethra*-0.159719));
non_probabilistic_inflammation_of_urinary_bladder = logistic(-1.16282+ (y_1_1*-3.39269)+ (y_1_2*2.99384)+ (y_1_3*3.00634));
inflammation_of_urinary_bladder = binary(non_probabilistic_inflammation_of_urinary_bladder);

logistic(x){
   return 1/(1+exp(-x))
}

binary(x){
   if x < decision_threshold
       return 0
   else
       return 1
}
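
The expression above can be ported to a general-purpose language with little effort. The following Python sketch copies the scaling constants, biases, and weights verbatim from the expression. Two things are assumptions made here and not stated in the expression: the binary symptoms are encoded as 0 (no) and 1 (yes), and the decision threshold defaults to 0.5. The example patient is hypothetical.

```python
import math


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


# (mean, standard deviation) per input, in the same order as the weights below.
SCALING = {
    "temperature": (38.7242, 1.81913),
    "occurrence_of_nausea": (0.241667, 0.429888),
    "lumbar_pain": (0.583333, 0.495074),
    "urine_pushing": (0.666667, 0.473381),
    "micturition_pains": (0.491667, 0.502027),
    "burning_of_urethra": (0.416667, 0.495074),
}

# (bias, weights) for each of the 3 hidden neurons.
HIDDEN = [
    (0.202475, [0.456805, -0.793165, 1.47038, -1.76085, -0.81808, 0.145468]),
    (-0.254725, [-0.224772, 0.605717, -1.1313, 1.62554, 0.801991, -0.267477]),
    (-0.307314, [-0.245913, 0.733341, -1.26759, 1.63105, 0.83814, -0.159719]),
]

# (bias, weights) for the single output neuron.
OUTPUT = (-1.16282, [-3.39269, 2.99384, 3.00634])


def inflammation_probability(patient):
    """Forward pass: scaling layer, hidden layer, output layer."""
    scaled = [(patient[name] - mean) / std for name, (mean, std) in SCALING.items()]
    hidden = [logistic(b + sum(w * s for w, s in zip(ws, scaled))) for b, ws in HIDDEN]
    bias, weights = OUTPUT
    return logistic(bias + sum(w * h for w, h in zip(weights, hidden)))


def diagnose(patient, decision_threshold=0.5):
    """Binary probabilistic layer; the 0.5 default threshold is an assumption."""
    return 1 if inflammation_probability(patient) >= decision_threshold else 0


# Hypothetical patient: normal temperature with urinary symptoms.
patient = {
    "temperature": 37.0,
    "occurrence_of_nausea": 0,
    "lumbar_pain": 0,
    "urine_pushing": 1,
    "micturition_pains": 1,
    "burning_of_urethra": 1,
}
print(inflammation_probability(patient), diagnose(patient))
```

Keeping the parameters in plain data structures, separate from the forward pass, makes it straightforward to replace them if the model is retrained.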